I'm currently reading this book, but find my head swimming… I keep wondering about all the misinterpreted stats I've used in the past, and realising that I never had any concept of the Power of a test (though I guess I was always aware that more samples tended to give more accurate results). The power of any hypothesis test is the probability that it will yield a statistically significant outcome (defined in this example as p < 0.05). In 100 flips, a fair coin will show between 40 and 60 heads in 95% of trials, so for an unfair coin the power is the probability of a result outside this 40-60 range (there's a quick simulation of this after the list below). The power is affected by three factors:
1. The size of the bias you're looking for. A huge bias is much easier to detect than a tiny one.
2. The sample size. By collecting more data (more coin flips), you can more easily detect small biases.
3. Measurement error. It's easy to count coin flips, but many experiments deal with values that are harder to measure, such as medical studies investigating symptoms of fatigue or depression.
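Just to convince myself, here's a quick simulation of the coin example in Python (my own sketch, not from the book; it assumes 100 flips per trial and uses the 40-60 band quoted above):

```python
import random

def power_estimate(p_heads, n_flips=100, n_trials=10_000):
    """Estimate the test's power: the fraction of trials whose head count
    falls outside the 40-60 band a fair coin stays inside ~95% of the time."""
    significant = 0
    for _ in range(n_trials):
        heads = sum(random.random() < p_heads for _ in range(n_flips))
        if heads < 40 or heads > 60:  # a "statistically significant" result
            significant += 1
    return significant / n_trials

print(power_estimate(0.70))  # big bias: power close to 1, nearly always detected
print(power_estimate(0.55))  # small bias: power around 0.13, usually missed
```

Raising n_flips (and recalculating the fair-coin band to match) shows factor 2 at work: the same 55% coin that usually slips past 100 flips gets caught most of the time with 1,000.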
Though with all this fine tuning, I'm reminded of what Dick Jackson, my Agricultural Botany lecturer, once said to us: "If you need to use statistics to see if something works, then the effect in the field is unlikely to make much difference". Combine this with the "fact" that when experimental findings are applied to general farm practice, they only deliver about 60% of the original outcomes. (Can't remember if it was 60% exactly… but something like that.)
He gives some good tips. For example: Ensure that your statistical analysis really answers your research question. Additional measurements that are highly dependent on previous data do not prove that your results generalize to a wider population; they merely increase your certainty about the specific sample you studied.
One thing that really made an impact on me was the diagram on p40, where he shows an example in which ten drugs out of 100 actually work. In his experiment he gets statistically significant results for 13, but 5 of them are false positives. So the chance of any one of his "working" drugs being truly effective is just 8 in 13, or 62%, and the false discovery rate is 38%.
Each square in the grid represents one drug. In reality, only the 10 drugs in the top row work. Because most trials can't perfectly detect every good medication, he assumes his tests have a statistical power of 0.8, though in practice most studies have much lower power.
So of the 10 good drugs, he'll correctly detect around 8 of them.
Because his p value threshold is 0.05, he has a 5% chance of falsely concluding that an ineffective drug works. Since 90 of his tested drugs are ineffective, this means he'll conclude that about 5 of them have significant effects.
He performs his experiments and concludes there are 13 "working" drugs: 8 good drugs and 5 false positives. The chance of any given "working" drug being truly effective is therefore 8 in 13, just 62%! In statistical terms, his false discovery rate (the fraction of statistically significant results that are really false positives) is 38%. To my chagrin, I probably would have been quite happy to conclude that the 13 I'd discovered were all working drugs!
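The arithmetic is simple enough to check yourself. Here it is as a few lines of Python (my own recreation of the book's numbers, not Reinhart's code):

```python
# Numbers from the p40 example in the book.
n_drugs     = 100
n_effective = 10      # drugs that actually work
power       = 0.8     # chance a real effect is detected
alpha       = 0.05    # p-value threshold

true_positives  = n_effective * power               # 10 * 0.8 = 8 genuine hits
false_positives = (n_drugs - n_effective) * alpha   # 90 * 0.05 = 4.5, "about 5"

significant = true_positives + false_positives      # roughly 13 "working" drugs
print(true_positives / significant)                 # ~0.64 chance a hit is real
print(false_positives / significant)                # ~0.36 false discovery rate
```

(The book rounds the 4.5 expected false positives up to 5, which is where the 8-in-13, 62%, and 38% figures come from.)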
So when someone cites a low p value to say their study is probably right, remember that the true probability of error is almost certainly higher.
Anyway, there's lots of stuff like this, along with a number of interesting observations, such as:
If I test 20 jelly bean flavors that do not cause acne at all and look for a correlation at p < 0.05 significance, I have a 64% chance of getting at least one false positive result. If I test 45 flavors, the chance of at least one false positive is as high as 90%.
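Those figures drop straight out of the multiplication rule for independent tests: the chance of no false positives in m tests is (1 - 0.05)^m, so the chance of at least one is 1 minus that. A two-line check (my own, and it assumes the tests are independent):

```python
alpha = 0.05
for m in (20, 45):
    # P(at least one false positive) = 1 - P(no false positives in m tests)
    print(m, round(1 - (1 - alpha) ** m, 2))   # 20 -> 0.64, 45 -> 0.9
```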
Don't just torture the data until it confesses. Have a specific statistical hypothesis in mind before you begin your analysis.
There are lots of issues with published experimental data. Ideally, the steps in a published analysis would be reproducible: fully automated, with the computer source code available for inspection as a definitive record of the work. Errors would be easy to spot and correct, and any scientist could download the dataset and code and produce exactly the same results. Even better, the code would be combined with a description of its purpose. Statistical software has been advancing to make this possible. But data "decays": one study of 516 articles published between 1991 and 2011 found that the probability of the data being available fell over time, and for papers more than 20 years old, fewer than half of the datasets could still be obtained. The Dryad Digital Repository partners with scientific journals to allow authors to deposit data during the article submission process, and encourages authors to cite data they have relied on. Dryad promises to convert files to new formats as older formats become obsolete, preventing data from fading into obscurity as programs lose the ability to read it.
Another issue is that many scientists omit results. The review board filings listed outcomes that would be measured by each study: side-effect rates, patient-reported symptoms, and so on. Statistically significant changes in these outcomes were usually reported in the published papers, but statistically insignificant results were omitted, as though the researchers had never measured them. A similar review of 12 antidepressants found that, of the studies submitted to the United States Food and Drug Administration during the approval process, the vast majority of negative results were never published or, less frequently, were published in a way that emphasized secondary outcomes.
It is possible to test for publication and outcome reporting bias, and the test has been used to discover worrisome bias in the publication of neurological studies involving animal experimentation. Animal testing is ethically justified on the basis of its benefits to the progress of science and medicine, but evidence of strong outcome reporting bias suggests that many animals have been used in studies that went unpublished, adding nothing to the scientific record.
He has some interesting material on how students learn best. Lectures do not suit how students learn. Students have preconceptions about basic physics from their everyday experience; for example, everyone "knows" that something pushed will eventually come to a stop, because every object in the real world does so. But we teach Newton's first law, in which an object in motion stays in motion unless acted upon by an outside force, and expect students to immediately replace their preconception with the new understanding that objects stop only because of frictional forces. Interviews of physics students have revealed numerous surprising misconceptions developed during introductory courses, many not anticipated by instructors. If lectures do not force students to confront and correct their misconceptions, we will have to use a method that does.

A leading example is peer instruction. Students are assigned readings or videos before class, and class time is spent reviewing the basic concepts and answering conceptual questions. Forced to choose an answer and discuss why they believe it is true before the instructor reveals the correct answer, students immediately see when their misconceptions do not match reality, and instructors spot problems before they grow. Peer instruction has been successfully implemented in many physics courses. Surveys using the Force Concept Inventory found that students typically double or triple their learning gains in a peer instruction course, filling in 50% to 75% of the gaps in their knowledge revealed at the beginning of the semester. And despite the focus on conceptual understanding, students in peer instruction courses perform just as well as, or better than, their lectured peers on quantitative and mathematical questions.
When I bought this book I thought it would be an update of Darrell Huff's book, "How to Lie with Statistics". But as Reinhart says, "How to Lie with Statistics" didn't focus on statistics in the academic sense of the term; it would perhaps have been better titled How to Lie with Charts, Plots, and Misleading Numbers. Even so, the book was widely adopted in college courses. And I must confess that I have forever been grateful to it: I still look for gaps in the scale, log transformations, percentages quoted instead of concrete numbers, and so on.
Anyway, the book is really interesting to me in that it demonstrates that many (maybe most) scientists don't know enough stats to correctly design experiments and use their data. And it has certainly made me realise that I am much weaker in Statistics than I had thought (despite scoring a High Distinction in my final year of Stats and using lots of stats in my work). I have recommended this book to a friend who is a lecturer in Statistics, and he's thinking of using some of it in his lectures. Five stars from me.