How Data Happened: A History from the Age of Reason to the Age of Algorithms

A sweeping history of data and its technical, political, and ethical impact on our world.

From facial recognition—capable of checking us onto flights or identifying undocumented residents—to automated decision systems that inform everything from who gets loans to who receives bail, each of us moves through a world determined by data-empowered algorithms. But these technologies didn’t just appear: they are part of a history that goes back centuries, from the census enshrined in the US Constitution to the birth of eugenics in Victorian Britain to the development of Google search.

Expanding on the popular course they created at Columbia University, Chris Wiggins and Matthew L. Jones illuminate the ways in which data has long been used as a tool and a weapon in arguing for what is true, as well as a means of rearranging or defending power. By understanding the trajectory of data—where it has been and where it might yet go—Wiggins and Jones argue that we can understand how to bend it to ends that we collectively choose, with intentionality and purpose.

384 pages, Hardcover

First published March 21, 2023

223 people are currently reading
3558 people want to read

About the author

Chris Wiggins

15 books, 2 followers

Community Reviews

5 stars: 61 (14%)
4 stars: 171 (39%)
3 stars: 146 (33%)
2 stars: 46 (10%)
1 star: 6 (1%)
Displaying 1 - 30 of 71 reviews
Profile Image for Xtina.
66 reviews, 1 follower
April 1, 2023
Poorly written - repeats itself multiple times a few paragraphs apart, has random citations from people who are never introduced and who you've never heard of, and no flow within chapters.

Bought this because of the article in the New Yorker. Was hoping the book would have the content the article lacked. I should have known better.
Profile Image for CatReader.
979 reviews, 162 followers
August 22, 2023
3 stars. Interesting content and very ambitious in scope (this book aims to summarize the entire field of data science!!), but hard to digest. The audiobook version clocks in at 10 hours but it took me significantly longer than usual to finish given how dense and often repetitive the writing is. I found myself repeatedly stopping after a few minutes due to tediousness or boredom.

Further reading (many of these topics are covered in blurb-like fashion in this work, so I would recommend supplemental reading if any topics in this book piqued your interest):

Historical context:
The Secret Lives of Codebreakers: The Men and Women who Cracked the Enigma Code at Bletchley Park by Sinclair McKay (2010)
Proving Ground: The Untold Story of the Six Women Who Programmed the World’s First Modern Computer by Kathy Kleiman (2022)
The Idea Factory: Bell Labs and the Great Age of American Innovation by Jon Gertner (2012)

Current topics:
The Data Detective: Ten Easy Rules to Make Sense of Statistics by Tim Harford (2020)
Invisible Women: Data Bias in a World Designed for Men by Caroline Criado Perez (2019)
The Myth of Artificial Intelligence: Why Computers Can't Think the Way We Do by Erik Larson (2021)
The Big Nine: How the Tech Titans and Their Thinking Machines Could Warp Humanity by Amy Webb (2019)
Profile Image for Iain Bertram.
30 reviews, 9 followers
June 27, 2023
Get the print version. The functions do not translate well to audio and the narrator doesn't know how to speak maths.
Profile Image for Rachel.
136 reviews, 4 followers
September 5, 2024
While I applaud Wiggins and Jones for undertaking the writing of this book, the acclamation halts there. To be sure, overviewing the history of the collection, analysis, and implementation of data certainly requires immense dedication. However, effectively doing so necessitates logical structure, which this book severely lacked. Vacillating between a chronological and topical order without demarcation or meaningfully tying the ideas together degrades both information retention and the raison d’écrire. Tiresome and unsatisfactory.

Great quote: "Few scientific claims should be viewed with more suspicion than claims to innate difference that just happen to reflect our current social arrangements. History teaches that such claims demand the highest level of vigilance about the data used, how that data is manipulated, and the inferences drawn from it. Often the lesson is simple: we know far less with certainty than many people proffering statistics and statistical inferences claim."
Profile Image for Amanda.
57 reviews, 17 followers
June 18, 2023
DNF. The writing and organization really interfered with what could be a fascinating subject.
149 reviews
July 15, 2023
Extremely interesting history of data, from the origins of state-istic (science of the state) to generative AI. Great research work, with the usual milestones: IQ & Indian castes, p-value & Guinness beer, Tuskegee & consent, ...
Also a great reminder that AI was rule based (and so explainable) for decades, before being overruled by statistical AI.
A 4-star and not a 5-star because the last chapter is, according to me, a bit weak and boring to read.
Profile Image for Levyj93.
1 review, 1 follower
February 26, 2024
A decent overview, albeit written from the NYT “view from nowhere” … i.e., mainstream liberalism.

The footnotes are the highlight for me, as they point to many foundational primary sources of statistics, data science, computing etc. A great number of secondary sources/commentaries on the subject matter as well.

Use the footnotes!
Profile Image for Hail Quigley.
173 reviews, 1 follower
April 21, 2025
Really fascinating read. I loved the discussion on big data but found the book overall a bit repetitive. The history of data was probably my favourite part; however, a lot of the algorithms did not translate well to audio. Overall, definitely recommend if you’re looking for something outside your normal reads.
13 reviews
November 11, 2024
Writing was too scattered, unfortunately. Probably 2.5 stars; I’m glad I at least made it to the end, with some good chapters.
Profile Image for Jeremy.
21 reviews, 1 follower
Read
March 2, 2024
Read (most of) chapters 1-4, 11-12. Kind of long winded, but some interesting historical points on how mathematical statistics is rooted in eugenics. Would be interested in coming back to this book.
33 reviews
December 26, 2022
Reflecting on the origins of artificial intelligence, Chris Wiggins and Matthew L. Jones make an interesting observation about the field's nomenclature: "Some fields, like biology, are named after the object of study; others like calculus are named after a methodology. Artificial intelligence and machine learning, however, are named after an aspiration: the fields are defined by the goal, not the method used to get there." For most of the history of artificial intelligence, it was not clear at all which of these methods would emerge from the fray. But in the past few decades, predictive models using simple statistical techniques have largely triumphed over more complex cognitive activities like problem solving and reasoning.

In this smart and authoritative examination on the origins of data science, the authors tell the story of how this technology came to dominate our daily lives. Like anything else in human society, the story largely revolves around the profit motive. The authors argue that our current world, in which personal data is collected and analyzed almost without our consent and digital privacy seems like some quaint notion of the past, is not some historical inevitability or the natural end point of the technology, but the conscious goal of the people who created it.

The first half of the book chronicles the history of early pioneers in data science like Francis Galton and Karl Pearson. Their statistical analysis of individual attributes like intelligence, height, and criminality threatened to flatten out and elide the complexity of human differences under the supposed rubric of scientific progress. The authors call this the reification of class and race—literally making an abstract idea real. They intended here to convey the ways in which data can be abused for baleful and malign purposes, but I'm not convinced most of this section is strictly necessary to understand the primary thesis of the book. The first part could have easily been cut in half and lost none of its relevance and potency.

The story doesn't truly become compelling until the second part of the book when it covers the widespread collection and analysis of electronic data after the 1950s. Here the authors chronicle the emergence of neural networks and algorithms that feed on vast quantities of data to catalogue our habits and predilections and categorize people together. As the authors write, this system frequently serves the interests of just a few major companies rather than the very people on whom it relies for the information.

The reader may naturally wonder whether modern data science is really an effective manipulation tool or just another marketing gimmick to fool all the potential clients of this technology. The authors do conclude that "researchers have reached no consensus on the ultimate effects of this attempted manipulation," but "while adtech, either for commerce or politics, surely doesn't work in the ways those hawking it suggest, it has dramatically transformed our media landscape and consolidated a landscape of digital advertisers into a near duopoly (Facebook and Google), with unpredictable effects." Seen from this perspective, the recent debate over how to moderate these platforms is largely a misnomer; the problem is the nature of the platform itself, the structure of the attention economy, not just who's policing it.

By the end of the book, the authors provide a familiar set of solutions for anyone who has followed this debate: they suggest far more stringent antitrust enforcement, more individual control over personal data, and organizational incentives to prioritize the primacy of individual rights over sheer monetization. The authors conclude that this will have to be facilitated by the passage of better digital privacy laws, which were last updated well before the internet even existed.

Overall, this book is an illuminating treatment of an important topic with profound consequences: how data science originated as a discrete profession and now shapes our lives. It's highly recommended both for people who are unfamiliar with this subject as well as those who have some prior knowledge and want to deepen their understanding. Finally, it's important to note that I won this book from a Goodreads giveaway, for which I'm very grateful because I have a strong interest in this subject.
Profile Image for Heather Ness-Maddox.
84 reviews, 2 followers
May 12, 2023
I listened to this one on audible so that might be a factor. But I found the organization of this book confusing. There was a lot of jumping around in historical periods. I get that the authors wanted to organize by topic, but a linear timeline for the history of data would have made more sense. Maybe if I was physically holding a book and could have flipped back. Overall though, I enjoyed the content, especially the section on ethics.
Profile Image for Dan.
175 reviews, 4 followers
October 21, 2023
I really enjoyed listening to this audiobook; there was so much history that I was not aware of. Though it is framed as a history of "data," it is really a mixture of computer science history and the history of statistics (data science being a mix of those fields, among others, and generally being a term that was invented this century). The authors, both professors, had trouble writing for a general audience rather than a textbook; some chapters were dense with jargon and difficult listening, especially the chapters about alternative approaches to AI, a kind of dark winter in AI stretching from the 60s to the 80s or 90s (not sure). But the information was fascinating. Some random facts that I'm probably remembering with partial accuracy:

- the history of statistics is intertwined with the eugenics movement and based on my recollection of that section of the book, it seems modern statistics wouldn't exist if it wasn't for eugenics and racialized thinking from the early 20th century, very cringeworthy stuff there!
- statistics started in astronomy, growing out of the need to aggregate many observations of celestial bodies, and the inherent errors caused by subpar instruments, to get closer to the "real" data. Later, statistics were developed using similar techniques, to measure abstractions. If you average out dozens of measurements of a star's movement, you are getting closer to something real, some phenomenon that objectively exists. If you average out the heights of all the males in your state, you are measuring something that doesn't really exist, an abstract concept called "average height." So basic that I'm embarrassed to say that I hadn't really thought about that much, probably since I took graduate level statistics.
- Artificial Intelligence, or AI, especially the flavor of it found in data science today, is very old and many of the techniques were developed, or at least early conceptions of them, in the 40s and 50s. In some ways the groundwork for ChatGPT and such was created in the dawn of computing. The thing that changed, in my simplified recollection of the book, is that computing power increased and humans can now process a lot more data, making those techniques more powerful. But it is not like humans have invented new techniques.

The book had very good sections on the ethics of data, how the US punted on opportunities to grapple with questions of who owns data about citizens in the 70s and really never looked back. They make the point that today's world, where huge tech companies use our data to make money via advertisers and deliver us "free" services, was not inevitable but rather a policy choice, or numerous choices and non-choices over decades.

The last point I want to mention, a political and cultural observation, is the ridiculousness of the myth of Silicon Valley - the all powerful tech demi-gods who, in their genius, have created these amazing companies from whole cloth, the great inventors of Facebook, Apple, and Google. What isn't typically included in that myth, is that the US government, and mostly the defense and intelligence industry, poured gargantuan amounts of money, year after year, decade after decade, from WWII up until present day, to develop the computing power, the statistical approaches used today, and many many approaches that didn't succeed (paths that our Silicon Valley friends didn't have to stumble down). How rich, pun intended, it is to hear the likes of Elon Musk, Peter Thiel, and others of their ilk complain about taxes, government regulation, deficits, etc. when all their riches are deeply indebted to government largess? Rich indeed.
111 reviews, 35 followers
October 19, 2023
This book is a somewhat disjointed attempt to give a history of modern data analysis and its role in society, describing a collection of different communities which offered approaches to data analysis and trying to connect them through common themes, particularly the relation of data to state and corporate power and objectives.

It begins with a fairly standard history of early statistics and statisticians that formed the core of a modern intro to statistics class (means, variances, regression, and tests through Quetelet, Galton, Fisher, Gosset, Neyman, Pearson, and Mahalanobis), though with a more pointed discussion of social context than usually offered. Certainly, e.g., Galton and his contemporaries were not just "incidentally" racist but specifically developed statistical methodology primarily for the promotion of eugenics and the goal of "scientifically" demonstrating that some people are superior to others. They trace the rise of mathematical statistics to WWII, with Abraham Wald in the US going on to spread a mathematicized discipline with encouragement from US government funding, but have little to say about the actual content or subsequent theoretical developments, which they mostly dismiss as abstruse and not useful. They cover Bayesian statistics largely through the codebreaking work of Turing and Good at Bletchley Park and subsequent interest from the intelligence community. Here and in later sections in which they emphasize classified work and the military industrial complex, the textual evidence for the external impact becomes, perhaps necessarily, more meager. One could certainly tell stories about the links, like the origins of MCMC in the Manhattan project, but they don't have much to say about academic and applied Bayesian statistics outside of government, let alone its origins. In the later part, they cover a variety of communities with less historical link to statistics as a discipline, including a standard history of Artificial Intelligence starting with McCarthy and the Dartmouth conference, through expert systems, the kernel era, up through contemporary deep learning, told primarily as a story about military and intelligence agency funding priorities.
For the near-contemporary era they emphasize the applied and computational turn in the rise of data science, and the fields of databases and knowledge mining up through the era of the internet and big data; for these topics they divide the discussion of influences between military and corporate, emphasizing private surveillance and advertising. It ends with a discussion of data ethics, both as an area of contestation and practice, and as a reminder of the ways in which social forces shape practice.

Overall, the picture presented in this book is more thematic than narrative, a discussion of the material as opposed to the intellectual origins of data. The book starts with contemporary critiques of data collection and analysis as used by prominent large companies and then seeks to reinforce the critiques by searching back in time for analogous historical (mis)uses. This enhances the point but is paradoxically enervating, as portraying these issues as constant companions of data analysis makes the contemporary problems seem both less severe and less tractable.
Profile Image for D J Rout.
316 reviews, 5 followers
February 19, 2024
This book came recommended by Patrick Wyman, and Dr Wyman is a very good researcher who cites his sources, so I thought this book would be written in the same way. I'm glad to say that it is. About 30% of the book is the list of references from each chapter. I haven't read anything this well researched since Track Changes: A Literary History of Word Processing.

Similarly, the book is written in chronological order, tracing the origins of data collection, and the various uses it has been put to over the years. It then covers various aspects of data collection as it is currently used, delves into the ethics of privacy, the use and abuse of profiling, and government and corporate responses to the collection and use of data. It even has a chapter which attempts to predict the future.

The style is not too academic, and gets a little bit folksy towards the end, but it's always accessible and if you want to look up the more academic works it cites, there's those citations at the end. It even quotes Jane Jacobs' The Nature of Economies, which gives it all the credibility I need. The folksy style isn't without its problems—one phrase that turns up multiple times is 'they turned data into a thing'. What the hell does that mean? I didn't find out in the book.

If you're ever worried about how The Powers That Be are spying on you, or that your life is in the hands of people who know you better than you know yourself, or are just annoyed by having to fill in information forms just to go to the doctors, this book gives you enough history and philosophy to talk to like-minded people intelligently.
Profile Image for Oscar.
66 reviews, 3 followers
June 11, 2023
A non-fiction book looking at the history of data. I had heard about it from a Tides of History episode interviewing the authors. The book loosely covers several historical periods and looks at key figures in the history of data. It covers early pioneers in the statistical movement in the industrial era, the use of statistics in the Enigma machine, and the development of data tools in the postwar period into the information era.

The book does provide some background, although there is a lack of contextualisation and analysis. For example, the first chapter covers Francis Galton and his development of statistics to support his eugenic ideals. Whilst this in itself was interesting as the origin of the bell curve in statistics, there was little done to discuss or critically appraise his use of the bell curve beyond saying that he misused it. Unfortunately, this is a running theme throughout the book, with plenty of narrative, but little analysis and synthesis. The book also moves from fact to fact in a running narrative, rather than giving some time for the reader to process points. Even a bit more time spent examining why techniques were useful and some actual details would have been helpful. For example, in the chapter covering the use of techniques by Guinness to improve beer outputs, it would have been helpful to know how practice changed when the data poor environment was configured and changed.

The later chapters looking at the modern use felt like totally uncovered ground to me, and these were more interesting. However, they were hampered again by the lack of elaboration and explanation. Which, whilst admittedly may not be wholly possible with the lack of available information in recent events, could have helped me to at least understand, for example, why the ethics board at Google failed.

Overall, I felt this book had promise but it needed editing into tighter narratives focused on concepts, rather than the reciting of events.
Profile Image for Greg Talbot.
689 reviews, 20 followers
May 25, 2023
Data is neither good nor bad....nor neutral.

No word offers the power and currency of data. Adored reading this history of data. We come to understand how collection of basic data points for a society (mortality, suicide, harvests) presented a new way to understand the world. The nature of man, the lofty theory and religious ideas of our nature and behavior have been quantified, normalized and analyzed. It's not that data gives the meaning, but has any aspect of our lived experience been untransformed by the collection of its data trails?

This book offers a really wonderful understanding of the early thinkers like Quetelet, Gosset and Pearson. Ideas around "general man", eugenics (a progressive idea at the time), and social good transferred to the academy, the government (wartime efforts of Alan Turing) and corporations. Our digital reality today, of personalized marketing tracking and attention harnessing, has led to the hyper-individualization experienced today.

If you're not into math, I think you would still enjoy the conceptual ideas. Pearson's R or measure of chance. Bayes' subjective prior of hypothesis testing. John McCarthy's principles of general AI...principles that stand as awe inspiring today. Claude Shannon's imagination of realized AI. This book really explores the relationship between the ideas and explorations of data.

For anyone passionate about the purpose, history or science of data, this one is for you.
1,401 reviews
May 17, 2023
Wiggins and Jones are on a difficult road to understand the changes in the world of “data.” For example, “Our goal here is an actionable understanding of history” (xiii). On page 4 we have “…the problem would not come from computer science” (p. 4). And they tell us about the early 1960s in this work (8).

Many of the sentences are difficult to understand: “Fears of the power of algorithms to drive ‘computational politics’” (9). Another statement is “They constitute an intimacy in that they our interpersonal in communication our sources of news and relationships and information, and even algorithmically moderate our relations” (10).

While the book has a good theme, there are many pages that are difficult to understand, with words that we don’t use too often. Often a paragraph takes a full page.

One of the strong themes is built around the way “communication” was used in the early 20th century and now.

Still, it’s good “stuff” to understand communication. But it’s not a book for a class of college sophomores.
Profile Image for CJ.
7 reviews
April 27, 2024
As someone who works in data, I was expecting this to be the brief history I needed to learn more about my field. However, as many of the other reviews have mentioned, I found it repetitive and also overindulgent in heavy vocabulary.

While I still think it is based in research and covers a majority of the topics you would expect, the message was not as coherent as the authors meant it to be, and I therefore found it kind of lacking in enjoyment. I see what the authors were trying to do in terms of topic organization, but wish they spent more time thinking about the day-to-day reader. It felt more like I was in a lecture with a professor who consistently goes off topic only to revisit the point at sporadic moments throughout a monologue (basically how I would write a book).

Anyways, I gave it 3 stars for content, thought about bumping it down to 2, but am really glad to have a “brief history of data” resource.
Profile Image for Bob Schueler.
Author, 3 books, 7 followers
September 15, 2024
The first third of this book is an interesting but very tough read for anyone not skilled in mathematics and abstract reasoning. I never could wrap my mind around the crucial difference between the two (or was it three) schools of thought on the nature and use of statistical analysis, but eventually the history takes over and that helps a lot. It's a very relevant and important analysis of contemporary issues that occupies much of the book, which deals with the underpinnings and contemporary issues around artificial intelligence.
Written by two Columbia U. professors based on their highly popular course, the writing isn't always as clear as it could be, not because the concepts aren't explained, but the sentence structure was awkward enough that I often had to go back after reading a long sentence to find the subject. More editing would have helped make the already challenging material more accessible to the likes of me. Still, it's a fascinating and thought-provoking read.
198 reviews
July 9, 2023
Reads like a text book, or rather more likely a book slapped-together from teachers’ notes after the course was completed. Poorly written, carelessly referenced, unintelligible to outsiders of the data analysis field. Too bad because the history of data - how we got to where we are today with unregulated big data collection, surveillance technology, automated disinformation, deep fakes and fake news, undisclosed algorithms and undecipherable AI - is important. Unfortunately, this book is not well directed to the curious, but uninitiated reader like me. Perhaps I’d be better served by taking Profs Wiggins’ and Jones’ class at Columbia, with interactive discussions and Q&A.

Profile Image for Jessi.
5,543 reviews, 20 followers
May 9, 2023
I liked the title of this book and thought this might be an interesting read; I work with databases every day so my life is data. There is interesting material in the book but it can be dense so sometimes takes a little time to get through. It covers a wide range of topics from Guinness to Bletchley Park to AI.
The authors do note that data can be manipulated and I wonder sometimes what the authors have done with their work but overall an interesting read.

Three and a half stars rounded up for Goodreads
Profile Image for Han Song.
29 reviews, 2 followers
May 23, 2023
I love this book up until chapter 9. Every concept's history and anecdotes were in chronological order, which helped me understand the development of data use.

From chapter 9, there is no main point to each subsequent chapter. All supporting ideas are not cohesive. Authors did not successfully present their opinion on each of those issues. There was a lot of redundancy and circling.

I felt it was a missed opportunity. It would be better if they emphasized machine learning and deep learning application in healthcare, transportation, etc.
Profile Image for Arianny Mercedes.
4 reviews, 3 followers
October 20, 2024
This book is a must-read for anyone looking to deepen their understanding of data and its historical context. It masterfully breaks down complex information, making it accessible while weaving in the history that helps readers grasp its broader impact.

The author does an exceptional job of presenting data in a digestible manner, providing insights that are not only informative but also thought-provoking. Whether you’re a data enthusiast or a history buff, this book offers a rich and engaging exploration of how past trends shape present realities.
Profile Image for James G..
445 reviews, 3 followers
May 20, 2025
This was a terrific and succinct history of how we view our relationship to data. Framing it all within a historical context, and with a certain sense of humor, along with a few pop culture references, the book brought up a lot of really big subjects in a way that was digestible and memorable. I may have to purchase the hardcopy to be able to pull some of the quotes. Extremely useful, especially for some of the things that I have going on at work, including a major project this coming September that utilizes AI as a tool.
Profile Image for Yoo Chung.
1 review
August 23, 2023
In large part, this is a history of how data collection, statistics, and data analysis developed from their humble beginnings in ancient times to today's colossal computing infrastructure. The last part is rather disappointing, however, in that it feels mostly like a laundry list of criticisms about modern data collection and usage, especially with its almost complete focus on large tech companies that are already well known and not much about the much wider universe of data today.
63 reviews, 1 follower
April 4, 2024
Interesting book. Definitely worth reading.

While the writing could have been better and less repetitive, the perspective and overview of the book was excellent. The book does a great job explaining that data is not neutral and that there is a power-dynamic associated with data (as well as the technical, error and bias concerns).

The book will help the reader analyze the choices and trade-offs that we will be making with AI and Data. I learned a lot.

Profile Image for Lana.
226 reviews
May 10, 2024
This was a very informative book and I found a lot of its content very interesting. I will say having listened to the audiobook that there was one chapter very math-heavy which did not translate well with how the narrator was reading the equations; it was very clear that the narrator was not versed in math. Overall this book made me see some things from a different perspective and made me think about a lot of things I wouldn't have otherwise thought of.