The 9 Pitfalls of Data Science

Data science has never had more influence on the world. Large companies are now seeing the benefit of employing data scientists to interpret the vast amounts of data that now exist. However, the field is so new and is evolving so rapidly that the analysis produced can be haphazard at best.

The 9 Pitfalls of Data Science shows us real-world examples of what can go wrong. Written to be an entertaining read, this invaluable guide investigates the all too common mistakes of data scientists - who can be plagued by lazy thinking, whims, hunches, and prejudices - and indicates how they have been at the root of many disasters, including the Great Recession.

Gary Smith and Jay Cordes emphasise how scientific rigor and critical thinking skills are indispensable in this age of Big Data, as machines often find meaningless patterns that can lead to dangerous false conclusions. The 9 Pitfalls of Data Science is loaded with entertaining tales of both successful and misguided approaches to interpreting data, from grand successes to epic failures. These cautionary tales will not only help data scientists be more effective, but also help the public distinguish between good and bad data science.

262 pages, Kindle Edition

First published April 1, 2019

11 people are currently reading
43 people want to read

About the author

Gary Smith

384 books, 45 followers
There is more than one author with this name

Ratings & Reviews


Community Reviews

5 stars: 6 (40%)
4 stars: 4 (26%)
3 stars: 3 (20%)
2 stars: 2 (13%)
1 star: 0 (0%)
Displaying 1 - 4 of 4 reviews
Ilan Siegel
63 reviews
February 17, 2021
Only read this if you work in a data science adjacent field. It is a communications version of technical concepts in data science. A few useful tidbits of information are hammered home via pithy stories; however, the fake names and the simplicity of the concepts conveyed in the stories are slightly boring. It is a super quick read.

Some interesting takeaways for A/B testing and data analytics:
-- Regression toward the mean (things that perform better than the mean or worse than the mean will probably regress closer to that point in the future)
-- A test with more than 6 variants has an 80% chance of producing a statistically significant variant purely by chance
-- Only 8% of tests run by scientists with hypotheses written out before the test show significant results. The odds that a test shows significant results by chance alone are 6%.
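
The multiple-variants point can be checked with a little arithmetic. The sketch below is not from the book; it assumes every variant is tested independently against a true null effect at a 5% significance threshold, and the function name is made up for illustration. Under those assumptions the chance of at least one spurious "winner" among k variants is 1 - (1 - alpha)^k, which comes out nearer 26% than 80% for six variants, so the 80% figure above presumably rests on a different setup or threshold.

```python
def chance_of_false_positive(num_variants: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_variants` independent tests
    looks significant at level `alpha` when no variant has a real effect.

    Hypothetical helper for illustration; independence and alpha = 0.05
    are assumptions, not figures taken from the book.
    """
    if num_variants < 0:
        raise ValueError("num_variants must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # P(no false positives) = (1 - alpha) ** k, so take the complement.
    return 1.0 - (1.0 - alpha) ** num_variants


if __name__ == "__main__":
    for k in (1, 6, 20, 32):
        print(f"{k:>2} variants: {chance_of_false_positive(k):.1%}")
```

The takeaway matches the review's: the more variants a test has, the more likely it is that one of them looks significant by luck alone.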
Stijn
97 reviews
April 3, 2021
Not very deep -- 9 pitfalls summarized as "have the theory before trying to find proof in the data, otherwise you'll find nonsense patterns". Does not address what happens if your theory is nonsense to start with. Rightly criticizes deep learning for being a black box that does not explain why. Does not convince me that "proper" data science does not suffer from the same ailment.

Logic-based reasoning stands lonely and unrealized as the only viable alternative.
James
85 reviews, 3 followers
September 24, 2020
It uses too many random number exercises that confuse the issue.