The 9 Pitfalls of Data Science

Name: The 9 Pitfalls of Data Science
Rating: 3.93 (4 reviews)
ISBN: 9780198844396

Gary Smith, Jay Cordes

Data science has never had more influence on the world. Large companies are now seeing the benefit of employing data scientists to interpret the vast amounts of data that now exist. However, the field is so new and is evolving so rapidly that the analysis produced can be haphazard at best.

The 9 Pitfalls of Data Science shows us real-world examples of what can go wrong. Written to be an entertaining read, this invaluable guide investigates the all too common mistakes of data scientists - who can be plagued by lazy thinking, whims, hunches, and prejudices - and indicates how they have been at the root of many disasters, including the Great Recession.

Gary Smith and Jay Cordes emphasise how scientific rigor and critical thinking skills are indispensable in this age of Big Data, as machines often find meaningless patterns that can lead to dangerous false conclusions. The 9 Pitfalls of Data Science is loaded with entertaining tales of both successful and misguided approaches to interpreting data, both grand successes and epic failures. These cautionary tales will not only help data scientists be more effective, but also help the public distinguish between good and bad data science.

Genres: Nonfiction

272 pages, Hardcover

First published April 1, 2019

11 people are currently reading

43 people want to read

About the author

Gary Smith

384 books, 45 followers

There is more than one author with this name

Community Reviews

5 stars: 6 (40%)
4 stars: 4 (26%)
3 stars: 3 (20%)
2 stars: 2 (13%)
1 star: 0 (0%)

Displaying 1 - 4 of 4 reviews

Ilan Siegel

63 reviews

February 17, 2021

Only read this if you work in a data-science-adjacent field. It is a communications version of technical concepts in data science. A few useful tidbits of information are hammered home via pithy stories; however, the fake names and the simplicity of the concepts conveyed in the stories are slightly boring. It is a super quick read.

Some interesting takeaways for A/B testing and data analytics:
-- Regression toward the mean (things that perform better or worse than the mean will probably move closer to it in the future)
-- A test with more than 6 variants has an 80% chance of producing a statistically significant variant purely at random
-- Only 8% of tests run by scientists with hypotheses written out before the test show significant results. The odds that a test shows significant results by chance alone are 6%.
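
The multiple-variant takeaway can be sanity-checked with a short calculation. This is a minimal sketch, not the book's method. It assumes that each variant is tested independently and that the significance level is 5%; both assumptions are mine, and the function names are hypothetical. Under those assumptions, 6 variants give roughly a 26% chance of at least one spurious "winner", and it takes about 32 variants to reach 80%, so the figures quoted above may rest on different assumptions.

```python
import math


def chance_of_false_positive(num_variants: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_variants` independent tests
    looks significant purely by chance, at significance level `alpha`."""
    if num_variants < 0:
        raise ValueError("num_variants must be non-negative")
    # Each test stays (correctly) non-significant with probability 1 - alpha,
    # so all of them do with probability (1 - alpha) ** num_variants.
    return 1.0 - (1.0 - alpha) ** num_variants


def variants_needed(target: float, alpha: float = 0.05) -> int:
    """Smallest number of independent tests for which the chance of at
    least one false positive reaches `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    # Solve 1 - (1 - alpha) ** n >= target for n, then round up.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - alpha))


if __name__ == "__main__":
    for n in (1, 6, 20, 32):
        print(f"{n:>2} variants: {chance_of_false_positive(n):.1%}")
    print("variants needed for 80%:", variants_needed(0.80))
```

The practical point matches the review: the more variants a test has, the more likely one of them looks significant by luck alone, which is why the threshold is usually tightened when many variants are compared.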

Stijn

97 reviews

April 3, 2021

Not very deep -- the 9 pitfalls can be summarized as "have the theory before trying to find proof in the data, otherwise you'll find nonsense patterns". Does not address what happens if your theory is nonsense to start with. Rightly criticizes deep learning for being a black box that does not explain why. Does not convince me that "proper" data science does not suffer from the same ailment.

Logic-based reasoning stands lonely and unrealized as the only viable alternative.