Data science has never had more influence on the world. Large companies are now seeing the benefit of employing data scientists to interpret the vast amounts of data that now exist. However, the field is so new and is evolving so rapidly that the analysis produced can be haphazard at best.
The 9 Pitfalls of Data Science shows us real-world examples of what can go wrong. Written to be an entertaining read, this invaluable guide investigates the all-too-common mistakes of data scientists, who can be plagued by lazy thinking, whims, hunches, and prejudices, and shows how these mistakes have been at the root of many disasters, including the Great Recession.
Gary Smith and Jay Cordes emphasise how scientific rigor and critical thinking skills are indispensable in this age of Big Data, as machines often find meaningless patterns that can lead to dangerous false conclusions. The 9 Pitfalls of Data Science is loaded with entertaining tales of successful and misguided approaches to interpreting data, from grand successes to epic failures. These cautionary tales will not only help data scientists be more effective, but also help the public distinguish between good and bad data science.
Only read this if you work in a data-science-adjacent field. It is a communications-oriented treatment of technical concepts in data science. A few useful tidbits of information are hammered home through pithy stories; however, the fake names and the simplicity of the concepts conveyed in the stories are slightly boring. It is a super quick read.
Some interesting takeaways for A/B testing and data analytics:
-- Regression toward the mean: things that perform better or worse than the mean will probably move closer to it in the future.
-- A test with more than 6 variants has an 80% chance of producing a variant that is statistically significant purely by chance.
-- Only 8% of tests run by scientists who wrote out their hypotheses before the test show significant results. The random odds of a test showing significant results are 6%.
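The first two takeaways can be made concrete with a short Python sketch. It assumes independent comparisons at a significance level of alpha = 0.05 and a simple "stable skill plus independent noise" model of performance; the function names and parameters are hypothetical, and the exact percentages quoted above depend on the threshold and number of comparisons assumed, so this is an illustration rather than a reproduction of the book's figures.

```python
import random


def family_wise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent comparisons with no real
    effect comes out 'significant' at level alpha: 1 - (1 - alpha)^k."""
    return 1.0 - (1.0 - alpha) ** k


def regression_to_mean_demo(n: int = 10_000, top_frac: float = 0.1, seed: int = 0):
    """Hypothetical model: score = stable skill + independent noise.

    Select the top `top_frac` performers in period 1 and return their mean
    score in period 1 and in period 2. The lucky noise does not repeat, so
    the period-2 mean falls back toward the population mean (0).
    """
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n)]
    period1 = [s + rng.gauss(0, 1) for s in skill]
    period2 = [s + rng.gauss(0, 1) for s in skill]

    # Indices of the best performers in period 1.
    k = int(n * top_frac)
    top = sorted(range(n), key=lambda i: period1[i], reverse=True)[:k]

    mean1 = sum(period1[i] for i in top) / k
    mean2 = sum(period2[i] for i in top) / k
    return mean1, mean2


if __name__ == "__main__":
    for k in (1, 6, 20):
        rate = family_wise_error_rate(k)
        print(f"{k:>2} comparisons: P(at least one false positive) = {rate:.3f}")
    m1, m2 = regression_to_mean_demo()
    print(f"top 10% in period 1: {m1:.2f}; same group in period 2: {m2:.2f}")
```

The same group that looked exceptional in period 1 still scores above average in period 2, but noticeably less so, because only the skill component carries over.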
Not very deep: the 9 pitfalls can be summarized as "have the theory before trying to find proof in the data, otherwise you'll find nonsense patterns." It does not address what happens if your theory is nonsense to start with. It rightly criticizes deep learning for being a black box that does not explain why it reaches its answers, but it does not convince me that "proper" data science is free of the same ailment.
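The "theory first, or you'll find nonsense patterns" summary can be illustrated with a minimal Python sketch of data dredging. The setup is an assumption chosen for illustration: a random target with 20 observations and 1,000 random candidate predictors, none of which is genuinely related to it. The helper names are hypothetical. Searching every candidate and keeping the best match typically turns up a correlation that looks meaningful but is pure noise.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def best_spurious_correlation(n_obs=20, n_candidates=1000, seed=1):
    """Search many purely random 'predictors' for the one that best tracks
    a purely random target. No real relationship exists, yet the winner
    usually looks impressive: a pattern found without a theory."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best_r, best_idx = 0.0, -1
    for idx in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson(candidate, target)
        # Keep whichever candidate has the strongest correlation so far.
        if abs(r) > abs(best_r):
            best_r, best_idx = r, idx
    return best_idx, best_r


if __name__ == "__main__":
    idx, r = best_spurious_correlation()
    print(f"random candidate #{idx} correlates with the random target at r = {r:.2f}")
```

A hypothesis fixed before looking at the data would name one predictor to test in advance, so the search-and-keep-the-winner step that manufactures the apparent pattern never happens.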
Logic-based reasoning stands lonely and unrealized as the only viable alternative.