This book is a wonderful technical text that works well as a supplement to a more formal treatment of the subject. It could easily be paired with a book like “Introduction to Statistical Learning” by Tibshirani et al., and that would be a good combination: that book is written in R while this one focuses on Python, so the student would get exposure to both of the current (circa 2020) main languages of data science and machine learning. Real World Machine Learning is very much a project-centric text: roughly the first half is a grounded, coding-centric introduction to machine learning, and the second half provides nice mini-projects and, importantly, a walk-through of those projects in the currently popular PyData APIs/stacks.
But again, the real jewels of the book are in the second half, which goes through step-by-step analyses of real-world data in a handful of projects. At each step the student sees visualizations, and the author discusses the motivation for the visualizations in the EDA, why certain features are selected, why certain models are chosen, how to execute them, and how to test their use. As someone who learned this material before the explosion of data science material over the past few years, I can genuinely say I wish I had had something like this when I was first learning it, especially as validation that the steps I was taking in my own professional projects made sense and that I wasn't wandering randomly in the woods (something data scientists sometimes do).
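To give a flavor of the kind of step-by-step pass the projects walk through, here is my own minimal sketch (not code from the book; the file name, column names, and model choice are placeholders I made up for illustration):

```python
# Minimal sketch of a typical project pass: quick EDA -> features -> model -> evaluation.
# NOT from the book; "data.csv", "target", and the chosen model are placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

df = pd.read_csv("data.csv")          # load the project data
print(df.describe())                  # quick EDA: summary statistics
df.hist(figsize=(10, 8))              # quick EDA: per-feature distributions

X = df.drop(columns=["target"])       # select features (here: everything but the label)
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))  # check how well it works held-out
```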
Despite this practical view of the material, I really appreciated that the book did not dumb down the apparatus and went into sufficient mathematical detail when appropriate. The book does not only discuss classifiers, logistic regression, and trees; it also gives a high-level mathematical derivation of logistic regression via a discussion of the log-odds ratio. Further, the author goes to fairly decent length on the nature of classification and how one may judge its ‘quality’ via the FPR/FNR, as well as the challenge of classifying non-linearly separable data, with some basic suggestions on how to deal with it via kernel methods.
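For reference, the log-odds view amounts to modeling the logit of the class probability as linear in the features, and the error rates are the standard confusion-matrix quantities (my own summary of the standard definitions, not a quotation from the book):

```latex
% Logistic regression via the log-odds (logit): model the log-odds as linear in x,
% then solve for the class probability p.
\log\frac{p}{1-p} = \beta_0 + \beta^{\top} x
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta^{\top} x)}}

% Standard classification error rates from the confusion matrix:
\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad
\mathrm{FNR} = \frac{FN}{FN + TP}
```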
Going back to the practical view, this book also equips the reader with a very clear and organized template for both the data-quality analysis in pre-processing and the standard supervised machine learning pipeline for prediction use cases. Whether conceptually, in code, or diagrammatically, the author hammers in the process flow and introduces all of the appropriate graphs (and their accompanying code snippets) so the reader can understand, sequentially and logically, what it is they are doing and why they are doing it. Further, these code snippets are often complete functions, with syntactic hygiene and some attention to exception handling, incorporating elements of basic programming that will be key to a data scientist's success.
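As an illustration of what I mean by complete functions with basic exception handling, a pre-processing data-quality check might look like the sketch below (my own example in the same spirit, not a snippet from the book; the thresholds and structure are assumptions):

```python
# Sketch of a self-contained data-quality check in the spirit the book encourages;
# NOT copied from the book. The report columns are my own choices.
import pandas as pd

def data_quality_report(path: str) -> pd.DataFrame:
    """Load a CSV and return per-column dtype, missing-value, and cardinality info."""
    try:
        df = pd.read_csv(path)
    except FileNotFoundError:
        raise ValueError(f"Input file not found: {path}")
    except pd.errors.ParserError as err:
        raise ValueError(f"Could not parse {path}: {err}")

    report = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(2),
        "unique": df.nunique(),
    })
    # Surface the most problematic columns first.
    return report.sort_values("missing_pct", ascending=False)
```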
Overall, I liked this book; it is another good installment from the Manning series. I think the audiobook is a good asset to consume as a refresher for more senior scientists, who may pick up one or two tricks or just solidify their understanding of some elements, and it is definitely great for introductory students in the subject area. It's definitely not a total substitute for a deeper education on the subject, but those are very specific needs that are moving more toward the ML/AI engineering domain nowadays anyway. Recommended.