The most comprehensive book on the engineering aspects of building reliable AI systems. "If you intend to use machine learning to solve business problems at scale, I'm delighted you got your hands on this book." -Cassie Kozyrkov, Chief Decision Scientist at Google "Foundational work about the reality of building machine learning models in production." -Karolis Urbonas, Head of Machine Learning and Science at Amazon
Anyone who has built a few production-level ML applications has realised that it takes more than just fitting models in Jupyter Notebooks. Andriy carefully takes the reader through the process, from deciding whether the problem should be solved with ML, through data collection, feature engineering, model building, and evaluation, to deploying, serving, and maintaining ML models.
No other ML book I've come across talks about the engineering problems of ML to the degree this book does.
In combination with his previous book, The Hundred-Page Machine Learning Book, Andriy has created very thorough literature on the full lifecycle of ML projects, with minimal overlap between the books.
Highly recommended for anyone trying to put data science into production.
I started this book right after the author's 100-page ML book. Though this is from 2020, many concepts are still relevant for building ML pipelines. A good read if you are new to engineering ML projects.
You can use this book as a checklist for a Machine Learning implementation. It goes deep into each step of the well-known lifecycle, and I liked that approach because it gives you useful tricks and a full explanation at each step. It can sometimes feel a little repetitive, but only a minority of the time. Most of the time you will have Aha! moments in each chapter. To be honest, I liked the author's previous book more.
The best machine learning engineering book I have read
If you want a deeper understanding of the emerging fields of machine learning engineering and MLOps, this is the book to start with. The author covers the topics very well, from end to end. Highly recommend!
I needed this book so badly. It explains the whole ML lifecycle pretty well. It assembles the best practices for creating production-ready ML applications. Highly recommended!
I have the impression that this book is a needed product on the market, filling a certain niche in domain knowledge about machine learning. It describes quite clearly how intricate projects in this field are, and each stage is outlined and given some space in the text, but it is still a general, bird's-eye view rather than a set of very meaty, detailed chapters. Fortunately, the book also offers interesting ideas and reflections drawn from the Author's experience. I also have the impression that it is somewhat unbalanced: it leans toward the everyday work of the person doing the modeling and gives short shrift to topics handled by, for example, an MLOps engineer. There is also very little code, unlike mathematical formulas. It is for this lack of balance that I take off one star. In addition, the writing style sometimes bored me a little, and as a result I read some subsections rather cursorily, which is the second minus. Overall the book is worth reading, but expect to have to dig deep into each chapter's topic on your own.
Great overview of how to run real-world machine learning projects. Virtually no time is spent on algorithms, and as the author puts it: “the greatest challenges must be solved before you type from sklearn.linear_model import LogisticRegression” and “rest of the problem is solved after you type model.fit(X,y)”. This book covers what many other ML books don’t. That said, not many of the concepts were new to me, but it was great to see them all collected in one condensed volume. The only thing preventing me from giving the fifth star is the lack of practical examples and advice. I would specifically have liked to see concrete examples of version control strategies and of how to structure code and data.
Although the author offers this as a read-first-pay-if-you-enjoy book, which I really appreciate, the book seemed too high-level to add a lot of value. I think I finished it exceptionally quickly since I could scan and skip 60 percent. The book might be a good introduction for somebody who has zero experience in both MLOps and basic data science, and it gives you a taste of what is around the corner when implementing ML in production, but it rarely goes deep enough to give you the knowledge you need to get going. It might be a good book to get some ideas from before googling specific subjects.
It covers many important concepts relevant to data science, but it does not explore any of them in depth. I picked up this book to begin my training in model deployment, and while it briefly mentions different deployment patterns and best practices, it suffers from the same issue as the rest of the text: it presents these concepts superficially, without providing any concrete examples. The only lines of code included in this section are used to illustrate Big O notation in the context of computational complexity and a simplistic example of "deploying" a model using joblib.
Good overview of ML Engineering from the writer of The Hundred-Page Machine Learning Book. Many of the descriptions lack mathematical sophistication, and at times the book gets too simple (explaining the concept of gradients and partial derivatives, for example!). But overall, it does talk about the practical realities of designing, testing, and implementing machine learning systems in the enterprise environment.
Excellent book - covers a broad range of topics at the level you would expect from Andriy (enough to gain a conceptual understanding, without too much focus on language-specific implementations). It's an easy (and highly recommended) read for intermediate and experienced ML practitioners alike.
This book is like going back several decades trying to reconstruct the idea of how to make a few tools (in our context ML algorithms) deliver value and profit in a world which has not seen process engineering, operations research, and six sigma.
The only unique complexity in a data product is privacy, which surprisingly is almost absent from the book. Given that it is endorsed by people from Google, we should in fact have expected privacy and ethics to be the central theme.
At best this is a massive checklist of known things to consider across a rudimentary lifecycle, or just an amalgamation of disjointed ideas trying to act as a remedy for individual ML project failures.
The predictable science of business success using ML is far easier to comprehend and build. Treating it with isolated, symptomatic lessons from several failures, in checklist style, is a far more complex and confusing approach.