After working at two corporations where the “merry marketeers” (one of the nicer nicknames for the headstrong, ambitious marketing people who wouldn’t listen to anyone else) twisted statistics into contortions that would have challenged Marvel’s Mr. Fantastic or D.C.’s Plastic Man, I resolved to go back and pay more attention to statistics. Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics accomplishes what I had set out to do far better than I ever could have. Between challenging marketing assumptions and teaching my students (when researching virtual communities) to avoid many of these errors, I wish I’d had this book decades earlier. It would have sharpened both my arguments and my presentations.
One gets a sense of where the volume is going early on, when author Gary Smith quotes Ronald Coase’s cynical comment: “If you torture the data long enough, it will confess.” (p. 19) I’d heard the quotation but never knew the source. Then, after explaining the basis for our genetic predilection for pattern recognition (whether the patterns are significant or not), the author states the problem with over-relying on such an instinct: “Our inherited desire to explain what we see fuels two kinds of cognitive errors. First, we are too easily seduced by patterns and by the theories that explain them. Second, we latch onto data that support our theories and discount contradicting evidence.” (p. 21) Late in the book, Smith reiterates his warning: “Data without theory is alluring, but misleading.” (p. 307)
The beauty of the book is that it skewers poor-to-abominable research practices in multiple fields: medicine, economics, sociology, parapsychology, military strategy, and even web design. His examples are drawn from both historical and modern studies, and his conclusions are clear. One’s mental alarm quickly confirms what a visceral reaction to years of conflicting studies (particularly medical and psychological/sociological ones) has long suggested: watch out. For example, he cites John Ioannidis of the Stanford University School of Medicine, who “looked at 45 of the most widely respected medical studies during the years 1990 through 2003 that claimed to have demonstrated effective treatments for various ailments. In only 34 cases were attempts made to replicate the original test results with larger samples. The initial results were confirmed in 20 of these 34 cases (59 percent). For seven treatments, the benefits were much smaller than initially estimated; for the other seven treatments, there were no benefits at all.” (p. 31)
In demonstrating problems, Smith cites what seem to be the major pitfalls: self-selection bias (p. 40), survivor bias (particularly in backward-looking studies, p. 56), confirmation bias (p. 68), deceptive graphing (“The ups and downs in the data can be exaggerated by putting a small range of numbers on the axis and can be dampened by putting a large range of numbers on the axis,” p. 106), the false-positive problem (p. 136), confounding factors (p. 145), regression to the mean (p. 180), trust in the fallacious law of averages (p. 203), starting with data clusters rather than testing a theory (p. 213), otherwise known as the Texas sharpshooter approach (p. 215), careless discarding of outliers (sometimes, you shouldn’t, p. 234), selective reporting or the “publication effect” (p. 273), use of forward or backward displacement to create a multiplicity of potentialities (p. 274), and extrapolation of past trends into future expectations (p. 288). In addition to these, Smith warns of “the two main pitfalls of quantitative financial analysis: a naive confidence that historical patterns are a reliable guide to the future, and a dependence on theoretical assumptions that are mathematically convenient but dangerously unrealistic.” (p. 331)
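The Texas sharpshooter pitfall, in particular, lends itself to a quick demonstration. The following is a minimal sketch of my own (in Python, not an example from the book): every series in it is pure noise, yet dredging through enough random “predictors” always turns up one that appears to explain the outcome.

    # My own sketch of the "Texas sharpshooter" / data-dredging pitfall: dredge
    # enough random "predictors" and one will always seem to explain a random
    # outcome, even though every series here is pure noise.
    import random

    random.seed(1)

    def pearson(xs, ys):
        # Plain Pearson correlation, computed by hand to keep the sketch stdlib-only.
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    outcome = [random.gauss(0, 1) for _ in range(30)]                       # 30 random "observations"
    predictors = [[random.gauss(0, 1) for _ in range(30)] for _ in range(200)]  # 200 random "theories"

    best = max(range(200), key=lambda i: abs(pearson(predictors[i], outcome)))
    print(f"Best of 200 random predictors: r = {pearson(predictors[best], outcome):+.2f}")
    # Typically prints a correlation around +/-0.5: "significant"-looking, yet
    # meaningless, because the target was drawn around wherever the bullets landed.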
One of Smith’s most interesting examples to me (since I enjoy military history) was the account of a WWII effort to study damaged planes so that there would be a statistical basis for placing additional (albeit heavy) armor on vulnerable areas. Yet the survivor bias in this study was that planes hit in the most vulnerable places, such as the cockpit and fuel lines, usually didn’t make it home. Thanks to an analysis by Abraham Wald, authorities discovered: “Returning planes were more likely to have holes in the wings because these holes did little damage. Wald’s advice was exactly the opposite of the initial conclusion. Instead of reinforcing the locations with the most holes, they should reinforce the locations with no holes. It worked. Far fewer planes were shot down and far more returned safely, ready to continue the war effort. Wald’s clear thinking helped win the war.” (pp. 51-52)
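As a toy illustration of the same survivor bias (my own simulation, with made-up sections and loss probabilities, not figures from the book), uniform damage plus selective survival reproduces exactly the pattern Wald saw:

    # Hypothetical simulation: hits are spread evenly across the airframe, but
    # planes hit in critical spots rarely return, so the holes counted back at
    # base cluster in the least vital areas.
    import random
    from collections import Counter

    random.seed(2)
    sections = ["engine", "cockpit", "fuel line", "fuselage", "wings"]
    loss_chance = {"engine": 0.8, "cockpit": 0.8, "fuel line": 0.8,
                   "fuselage": 0.2, "wings": 0.1}          # invented probabilities

    holes_on_returners = Counter()
    for _ in range(10_000):
        hit = random.choice(sections)                # damage is uniform across sections
        if random.random() > loss_chance[hit]:       # but survival is not
            holes_on_returners[hit] += 1

    for section, holes in holes_on_returners.most_common():
        print(f"{section:10s} {holes}")
    # Returning planes show far more wing and fuselage holes, not because those
    # areas are hit more often, but because hits elsewhere kept planes from coming home.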
As a former resident of California’s Temecula Valley, I had studied some of the early urban planning studies of the area for my first book, The SimCity Planning Commission Handbook. So I was intrigued to read how a mining company intended to ruin the “Rainbow Gap” (the secret to the valley’s cooling breeze, which allowed a respectable wine industry to grow there) and how it abused statistics to claim its plan would have no environmental or economic downsides. For example, the mining company’s consultants argued that prices for homes next to its other mines still went up by the same percentage as those miles away from a quarry. They didn’t compare values; they compared percentages, even though values had dropped by 20 percent once the quarry went in (p. 69).
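A back-of-the-envelope sketch makes the sleight of hand plain. The prices and growth rate below are hypothetical; only the 20 percent drop comes from the episode the book describes:

    # Hypothetical numbers: a home that loses a fifth of its value when the quarry
    # opens never catches back up, even if it then appreciates at the same rate
    # as homes miles away.
    far_home = 400_000                    # assumed price of a home far from the quarry
    near_home = 400_000 * 0.80            # same home next to the quarry after a 20% drop
    for year in range(10):
        far_home *= 1.05                  # both appreciate 5% a year -- the same percentage
        near_home *= 1.05
    print(f"After 10 years: far ${far_home:,.0f} vs. near ${near_home:,.0f}")
    # Both went up "the same percentage," yet the nearby home is still worth 20% less.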
As an inveterate coffee drinker, I was fascinated by the historical reference to Gustav III of Sweden’s experiment regarding coffee and a pair of identical twins (convicted felons) (p. 154). Would you be surprised to learn that part of the problem with the conclusion was the small sample size and the ignoring of confounding factors? Of course, Smith goes on to cite later studies which made similar errors because: “We consistently underestimate the role of chance in our lives, failing to recognize that randomness can generate patterns that appear to be meaningful, but are, in fact, meaningless. We are too easily seduced by explanations for the inexplicable.” (p. 164)
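That warning about randomness is easy to verify for oneself. Here is a small sketch of my own (not an example from the book): flip a fair coin a few hundred times and a long, meaningful-looking streak is nearly guaranteed.

    # My own demonstration that plain randomness manufactures "patterns": in a few
    # hundred fair coin flips, a long run of heads is almost certain, yet means nothing.
    import random

    random.seed(3)
    flips = [random.choice("HT") for _ in range(300)]

    longest, run = 0, 0
    for f in flips:
        run = run + 1 if f == "H" else 0
        longest = max(longest, run)

    print(f"Longest run of heads in 300 fair flips: {longest}")
    # Expect a run of roughly 7 or 8 heads -- a streak that would look like a
    # "hot hand" or a miracle cure if we did not know the coin was fair.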
If the result of ignoring the outliers in NASA’s O-ring tests hadn’t been the deaths of the Challenger crew, it would have been amusing to read how Nobel Prize winner Richard Feynman demonstrated the significance of the cold-weather tests: “During nationally televised hearings, this Nobel Prize–winning theoretical physicist demonstrated that O-rings lost their resiliency at low temperatures simply by dunking one in a glass of ice water.” (pp. 237-238) Also, having dodged a bullet myself and having observed the pain of those around me, I don’t find much amusement in the so-called “dot-bomb” era, but Smith does a solid job of explaining the misplaced logic behind it (pp. 311-314). I thought his summary in this regard was quite clever, too: “In carpentry, they say, ‘Measure twice, cut once.’ With data, ‘Think twice, calculate once.’” (p. 324)
In the final chapter, Smith recapitulates the errors he has delineated throughout the book. He reminds his readers: “Before you double-check someone’s arithmetic, double-check their reasoning.” (p. 360) And, though he is specifically addressing the “Texas sharpshooter” fallacy, there is general wisdom when he writes: “It is easy to find a theory that fits the data if the data are used to invent the theory.” (p. 363) Frankly, if more people read Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics and applied its lessons, there would be far fewer ridiculous bestselling books on getting rich quick and far fewer gullible people falling for every dubious “scientific” study. The book was not only a joy to read; it is one I will return to on numerous occasions.