
Distrust: Big Data, Data-Torturing, and the Assault on Science

There is no doubt science is currently suffering from a credibility crisis.

This thought-provoking book argues that, ironically, science's credibility is being undermined by tools created by scientists themselves. Scientific disinformation and damaging conspiracy theories are rife on the internet that science created; the scientific demand for empirical evidence and statistical significance leads to data torturing and confirmation bias; and data mining is fuelled by technological advances in Big Data and the development of increasingly powerful computers.

Using a wide range of entertaining examples, this fascinating book examines the impacts of society's growing distrust of science, and ultimately provides constructive suggestions for restoring the credibility of the scientific community.

337 pages, Kindle Edition

Published February 21, 2023

8 people are currently reading
120 people want to read

About the author

Gary Smith

384 books, 45 followers
There is more than one author with this name.


Community Reviews

5 stars: 7 (21%)
4 stars: 13 (40%)
3 stars: 9 (28%)
2 stars: 3 (9%)
1 star: 0 (0%)
Displaying 1 - 6 of 6 reviews
David Wineberg
Author of 2 books, 870 followers
April 11, 2023
“Whenever I hear about provocative research, my default assumption is that it is wrong.”

The media, from journals to social media and everything in between, are filled with lies. Gary Smith, who seems to live to debunk the fraudsters, takes them all on in Distrust. And then he settles in on scientific studies for the abuse they regularly employ to deceive, and achieve fame. Not to put too fine a point on it, but 70% of psychologists themselves don’t trust the psychology studies they read. This is Smith’s chosen world, and he has penetrated it to a remarkable – and most helpful – extent.

The book is great fun. It's lovely to watch Smith demolish the fraud in every medium. Many of the examples will prove familiar to readers, from Russian bots to ChatGPT, passing through crypto along the way. The chapters all end not with a Conclusion, but with a short section titled The Irony. It is usually to the effect that people don't trust this medium, which they enjoy thanks to the tireless work of scientists whom they don't trust. With his career-long history of precision-bombing the fraudsters, the book is also, ironically, trustworthy. Because he suffers along with us.

The structure is simple, setting up the ease of reading the book. Smith breaks the fraud into three distinct flavors: disinformation, tortured data, and datamining. And several chapters follow each one of them straight downhill.

Disinformation is the one everyone is most confronted with. Americans are their own large part of the problem. Nearly 75% believe in the paranormal, despite all the debunking and proof against it. Every year, more Americans believe the 2020 election was actually won by Donald Trump. Meanwhile, 20-30% continue to believe the American moon landing was fake. The National UFO Reporting Center receives an average of five sightings every day – since 1974. With California being the favorite state for aliens. And most often the week of the 4th of July, just so you know. This is disinformation at work. From hackers to bots (which might actually be the majority of social media accounts), the pressure to confuse is huge. And users are nothing if not gullible.

Even scientists citing each other’s work cite the debunked studies more often than the truthful ones, he shows. Everybody loves a good story. It’s the outrageous, bogus ones that go viral. In other words, everybody’s doing it, and Science is no more saintly than hackers and trolls.

Science is getting beat up every which way it turns. For all the inventions, services, and unprecedented living standards science has given mankind, it is regarded with ever more suspicion. Smith says: "To the extent that scientific research is used by governments to justify policies that people dislike, science is viewed as part of the problem." But it's also not nearly that simple.

He has delightful stories from all kinds of sources, but he concentrates the bulk of the book on the professionals: scientists themselves. Their deceit, their trickery, their out-and-out fraud are worse than garden-variety internet lies, if only because science has structure, rules, checks, and respect meant to prevent such things.

In the search for reasons why scientists go rogue, Smith cites Goodhart’s Law that when a measure becomes a target, it ceases to be a good measure. Think of teaching to the test, because nothing matters more than the test score. Both the teaching and the test score are severely devalued by it.

So it is with scientists and the p-value, which measures statistical significance. When it is 5% or less, a result is deemed statistically significant, and hence publishable. Scientists therefore will do seemingly anything to hit that number. They will tweak date ranges and age categories, discount outliers, anything to have their data show the results they want, with a p-value of .05. Not .051, or God forbid, .06 or more, but .05. Smith titles one chapter in the data-torturing section Squeezing Blood From Rocks.
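The mechanics are easy to demonstrate with a toy simulation (mine, not the book's): even when there is no real effect at all, run enough tests and a comfortable fraction will come out "significant" at p ≤ .05 purely by chance. A minimal sketch in Python, using a normal-approximation test on fair coin flips:

```python
import math
import random

random.seed(1)

def z_test_p(heads: int, n: int) -> float:
    """Two-sided p-value for H0 'the coin is fair', via a normal approximation."""
    z = (heads - n / 2) / math.sqrt(n / 4)
    return math.erfc(abs(z) / math.sqrt(2))

N_STUDIES, N_FLIPS = 1000, 100
false_positives = 0
for _ in range(N_STUDIES):
    # Every "study" here tests a fair coin: the null hypothesis is always true.
    heads = sum(random.random() < 0.5 for _ in range(N_FLIPS))
    if z_test_p(heads, N_FLIPS) <= 0.05:
        false_positives += 1  # "significant" purely by chance

print(f"{false_positives} of {N_STUDIES} null studies hit p <= .05")
```

Roughly 5% of these no-effect "studies" clear the bar; a researcher who quietly runs twenty variations and reports the one that works is doing exactly this.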

The cheating goes to astonishing lengths. Half of all drug studies are not replicable, because researchers tortured the data to fit their goal of showing the efficacy of some drug (and then everyone is surprised when the drug doesn't work as advertised, except for the massive, unexpected side effects). The vaunted journals that publish the studies often don't even read them, and neither, it appears, do the journal reviewers, the backstop of the whole validation system.

Half the journals themselves are fraudulent. They will publish anything if the authors send money. There is so much scientific publishing going on that "half of all articles are not read by anyone other than the author, journal editor, and journal reviewers." But it goes on the résumé, and that is really all that matters in a Publish or Perish community. Scientists gaming the system have coalesced into author machines, with hundreds and thousands claiming authorship of a single paper. Smith says the author list can run several dozen pages preceding a six-page paper. There aren't enough words in the paper for each of them to have contributed one.

Contrary to the contracts they sign, many researchers will refuse to give other scientists their data so the study can be repeated and verified. Unfortunately, it can take years for a fraudulent paper to be retracted by the journal, though retraction seems to be happening more and more often. In the meantime, whole careers are made. TED talks are given, books published, speaking tours extended, and thousands influenced by untruths. Then, very often, after the real truth comes out, careers and reputations suffer little or no damage and the fraudulent claims continue to circulate as facts. Is it any wonder that people distrust scientists? Smith himself says "whenever I hear about provocative research, my default assumption is that it is wrong."

Added recently to this mix are apps, mostly thanks to "smart" watches. All kinds of software is being pre-certified by the FDA without randomized controlled trials. It is simply released to a gullible public. No one knows whether any of it has any healthcare value, but it comes with implicit FDA approval. Smith calls it digital snake oil.

He has a delightful section on the British Medical Journal's annual Christmas Issue, which profiles the most ridiculous of the ridiculous studies. And they really are laughable. Sadly, real money was authorized to conduct them, and scientist-authors work to promote them to the scientific community. Readers will be able to see that to some audiences they could prove believable, if they weren't so transparently idiotic. They should make readers think twice before spreading internet facts like hurricanes with female names do more damage, or that the Chinese tend to die around the fourth of the month. (These were real studies.) The Annals of Improbable Research gets in on the act, issuing Ig Nobel Prizes for the worst of the worst. That's what has become of science. As Bob Saget said, "No, seriously, I read it. I wrote it down and I read it. It must be true. I believe everything I read!"

Datamining is the gift of computers. They are able to classify unlimited information, determine patterns between and among classifications, and spit out relationships from the data all day long. It has come to the point where scientists don’t need to have a theory they seek to prove. Just let the computers rip. The data will provide interesting correlations that will lead to a theory. That the theory is totally meaningless is of no concern. It is a publishable theory, with a huge bank of data behind it.

The relationships that datamining produces could never be replicated by human workers. There are so many classifications and variables that only a computer could evaluate and match them. Smith says there are so many that they are worthless: “If the number of true relationships yet to be discovered is limited, while the number of coincidental patterns is growing exponentially with the accumulation of more and more data, then the probability that a randomly discovered pattern is real inevitably approaches zero.” And in a lovely vicious circle, he adds “The fundamental issue is not that Internet data are flawed (which they are), but that data mining is flawed. The real problem with Internet data is that they encourage people to data mine.”

Then there is Artificial Intelligence (AI), the hot fad of the moment. Smith agrees with other authors I have read that computers, and AI, are stupid. He says, and cites others who say the same, that AI simply spouts BS all day long. The degree of intelligence demonstrated by AI is nil.

These systems have no way of evaluating anything. They only identify and match. With enough data behind them, they can plausibly appear intelligent. But the mistakes they make would not be made by a six-year-old human. They routinely misidentify objects because the angle is different, or the lighting is different, or a drawing doesn't have the heft of a photo. Smith's clear drawing of a child's wagon with red and white stripes is identified, with absolute confidence, as a candy cane.

AI will answer the same question three different ways, sometimes incoherently. Because they don’t actually master a language. They classify words. They also randomize plausible answers so what they respond with doesn’t appear to be a canned script. AI is far from being able to take over the other side of a conversation. Smith has spent time working with the state of the art ChatGPT, and finds it untrustworthy, to put it mildly: “Computers are autistic savants and their stupidity makes them dangerous.”

But some of the same stupidity shows up in human observers. Internet surveys show that people believe the most technical jobs will be the first to go to AI. People like surgeons, who must make split-second decisions and act on unexpected conditions, will, they believe, be the first to be replaced. Yet menial jobs like cooking and cleaning, maintenance and service will remain human domains, survey participants say. This is precisely the opposite of what will happen: rote tasks will fall to AI, while highly technical jobs like financial evaluators, genomic advisors, and surgeons will remain human responsibilities. They might employ AI systems to aid them, but AI cannot and will not replace them.

I have read most of the stories Smith tells elsewhere. But his context, framing them in terms of disinformation, data torture and datamining, gives them new perspective. Knowing what to fear and why we hate is valuable.

David Wineberg

(Distrust, Gary Smith, March 2023)

If you liked this review, I invite you to read my book The Straight Dope. It’s an essay collection based on my first thousand reviews and what I learned. Right now it’s FREE for Prime members, otherwise — cheap! Reputed to be fascinating and a superfast read. And you already know it is well-written. https://www.amazon.com/Straight-Dope-...
Ali
425 reviews
July 20, 2024
When it comes to Big Data and AI, I have a healthy skepticism, but Gary Smith's Distrust pushed me to a level of paranoia. Smith groups the data issues into three parts: disinformation, data torturing, and data mining. Living in what is called the post-truth era, we are all familiar with disinformation from (social) media, but the latter two are not apparent to a lay person. What makes it worse is that Smith finds scientists to be the culprits for data torturing and data mining, backed by many case studies and published research. He sees the "publish or perish" push as the root cause of p-hacking and HARKing, where researchers abuse or misuse data. As Goodhart's law states, when a measure (statistical significance) becomes a target, it ceases to be a good measure. The self-correcting nature of science seems to be slow, and recent progress in AI/ML tools may amplify the problem. Quoting another economist, "If you torture data enough, it will confess," Smith shows that big data will tell you whatever you want to hear: with selective sampling one will always find a correlation, and one can always hypothesize after the results are known. The AI examples focus mostly on LLMs and earlier versions of ChatGPT, but the points raised are still valid. Smith ends with quite a few recommendations in all three areas to address the growing distrust. Looks like we all need to get more fluent in statistics and possible fallacies.
Brian Clegg
Author of 161 books, 3,163 followers
June 7, 2023
There is a lot in the news on misinformation and disinformation - Gary Smith explores the way three factors of this kind can tarnish the public's attitude to science. He suggests that there is rising distrust of science and scientists as a result of: disinformation (telling fibs), data torturing (where data is selectively used, for example choosing the time period that most emphasises the desired result) and data mining (where big data is misused by picking up on the inevitable random correlations that occur in large quantities of data without there being a causal reason for the correlation).

Smith makes the important point that in a world where we are presented with interpretations of so much data, a clear understanding of these three factors is essential if we are to make any sense of what we hear and read. While disinformation is often a problem when non-scientists present 'their truth' that is then used to attack science, data torturing and data mining are often undertaken by scientists themselves, reducing public trust in something that is essential for the functioning of modern society.

We then get a shorter section on AI, which makes the important point that most AI is not intelligent, nor is it flexible. Here he also takes on earlier versions of ChatGPT and the like and reasonably assesses their shortcomings, though he doesn't mention some of the areas where the technology is genuinely worrying, such as generating plausible student essays. But the nature of AI's intelligence or lack of it is better covered elsewhere, notably in Smith's excellent title The AI Delusion, and isn't the main thrust of this book.

Finally, Smith pulls it all together, looking at poor reproducibility, where scientists' results, for example, don't actually match the underlying data, highlighting the replication crisis, where attempts to replicate experiments fail, and delivering his solution for 'restoring the lustre of science'. The reproducibility and replication sections are excellent. The solutions are less so - but that's not really a criticism. It is just very difficult to come up with answers to these problems. There's a lot more chance with those involving scientific misdemeanours, such as the suggestion to reduce the importance given to statistical significance (as opposed to value of a result), and more importance given to quality, replicability and reproducibility - but almost inevitably the solutions on the disinformation side are much less likely to have much of an impact.

So far, so good - an important point is being made here, and though I've seen a lot in the media about the disinformation aspect, Smith does a service in making clear how easy it is to distort the interpretation of data using the other two means. Unfortunately, though, there are some aspects of the book that didn't quite work for me. In part it's the way it's written, and in part a worry about the handling of a particular piece of data.

Smith has a light style despite the topic, but I sometimes found it too jaunty for a serious science book. For example, he refers to inhabitants of the UK as 'Brits'. If a UK-based writer referred to Americans as 'Yanks' in a science book I think it would rightly raise a few eyebrows. Sometimes, too, the structure doesn't quite work. For example, Smith gives us chapters on specific areas where the three factors come into play. In one of the disinformation chapters on 'Elite conspiracies', some of the subsections are on conspiracy theories driven by disinformation, but others, such as the Pentagon Papers, use of animal 'spies', and 24/7 surveillance, are about examples where the conspiracies were effectively real - however, there's no distinction made between the sections in the flow of the chapter. This doesn't work well - and this unusually unstructured approach continues through other chapters.

The factual error (entertainingly in a chapter on 'the post-fact world') was that Smith tells us 'The Sun, a UK tabloid newspaper that publishes all sorts of nonsense, has long included the disclaimer "SUN stories seek to entertain and are about the fantastic, the bizarre, and paranormal... The reader should suspend belief for the sake of enjoyment."' Unfortunately this disclaimer is taken not from The Sun, which is one of the largest circulation UK national newspapers, but from Sun, a now-defunct US supermarket tabloid. It might seem heavy-handed to point this out, but Smith claims a tool for spotting disinformation is when it looks doubtful. No one in the UK would fail to spot that this is wrong - the error demonstrates well how that ability to uncover disinformation is highly dependent on context and on your personal experience.

The concerns don't undo the fact that this is an important topic and Smith highlights issues in the areas of data torturing and data mining extremely well that haven't been exposed as much as they should be. It's a useful and timely book, but perhaps could have benefited from a more forceful editor.
Jacob
230 reviews, 16 followers
September 17, 2023
I thought his commentary around p-hacking, data mining, misleading with statistics etc. was interesting. His discussion of AI and social media was less compelling and an area he seems to be a bit less well-versed in. Good book nonetheless.