I really enjoyed listening to this audiobook; there was so much history that I was not aware of. Though it is framed as a history of "data," it is really a mixture of computer science history and the history of statistics (data science being a mix of those fields, among others, and generally being a term that was invented this century). The authors, who are professors, had trouble writing for a general audience rather than for a textbook. Some chapters were dense with jargon and difficult to listen to, especially the chapters about alternative approaches to AI, a kind of dark winter in AI stretching from the 60s to the 80s or 90s (not sure). But the information was fascinating. Here are some random facts that I'm probably remembering with partial accuracy:
- The history of statistics is intertwined with the eugenics movement. Based on my recollection of that section of the book, it seems modern statistics wouldn't exist if it weren't for eugenics and racialized thinking from the early 20th century. Very cringeworthy stuff there!
- Statistics started in astronomy, growing out of the need to aggregate many observations of celestial bodies, each with inherent errors caused by subpar instruments, to get closer to the "real" data. Later, similar techniques were used to measure abstractions. If you average out dozens of measurements of a star's movement, you are getting closer to something real, some phenomenon that objectively exists. If you average out the heights of all the males in your state, you are measuring something that doesn't really exist, an abstract concept called "average height." This is so basic that I'm embarrassed to say I hadn't really thought about it much, probably not since I took graduate-level statistics.
- Artificial Intelligence, or AI, especially the flavor of it found in data science today, is very old, and many of the techniques, or at least early conceptions of them, were developed in the 40s and 50s. In some ways the groundwork for ChatGPT and such was laid at the dawn of computing. The thing that changed, in my simplified recollection of the book, is that computing power increased and humans can now process a lot more data, making those techniques more powerful. But it is not like humans have invented new techniques.
The book had very good sections on the ethics of data, including how the US punted in the 70s on opportunities to grapple with questions of who owns data about citizens, and really never looked back. The authors make the point that today's world, where huge tech companies use our data to make money via advertisers and deliver us "free" services, was not inevitable but rather a policy choice, or numerous choices and non-choices over decades.
The last point I want to mention, a political and cultural observation, is the ridiculousness of the myth of Silicon Valley - the all-powerful tech demigods who, in their genius, have created these amazing companies from whole cloth, the great inventors of Facebook, Apple, and Google. What isn't typically included in that myth is that the US government, mostly through the defense and intelligence agencies, poured gargantuan amounts of money, year after year, decade after decade, from WWII up until the present day, into developing the computing power, the statistical approaches used today, and many, many approaches that didn't succeed (paths that our Silicon Valley friends didn't have to stumble down). How rich, pun intended, it is to hear the likes of Elon Musk, Peter Thiel, and others of their ilk complain about taxes, government regulation, deficits, etc. when all their riches are deeply indebted to government largess. Rich indeed.