Reliable Machine Learning: Applying SRE Principles to ML in Production

Whether you're part of a small startup or a multinational corporation, this practical book shows data scientists, software and site reliability engineers, product managers, and business owners how to run and establish ML reliably, effectively, and accountably within your organization. You'll gain insight into everything from how to do model monitoring in production to how to run a well-tuned model development team in a product organization.


By applying an SRE mindset to machine learning, authors and engineering professionals Cathy Chen, Kranti Parisa, Niall Richard Murphy, D. Sculley, Todd Underwood, and featured guest authors show you how to run an efficient and reliable ML system. Whether you want to increase revenue, optimize decision making, solve problems, or understand and influence customer behavior, you'll learn how to perform day-to-day ML tasks while keeping the bigger picture in mind.


You'll examine:

What ML is: how it functions and what it relies on
Conceptual frameworks for understanding how ML "loops" work
How effective productionization can make your ML systems easily monitorable, deployable, and operable
Why ML systems make production troubleshooting more difficult, and how to compensate accordingly
How ML, product, and production teams can communicate effectively

678 pages, Kindle Edition

Published October 12, 2021

28 people are currently reading
159 people want to read

About the author

Cathy Chen

1 book

Ratings & Reviews

Community Reviews

5 stars: 5 (15%)
4 stars: 16 (48%)
3 stars: 7 (21%)
2 stars: 4 (12%)
1 star: 1 (3%)
Displaying 1 - 2 of 2 reviews
Thang
101 reviews, 13 followers
November 20, 2022
This book is for organization managers rather than data scientists or ML engineers. It covers the basic understanding of ML development, continuous ML requirements, and incident handling.
Some of the content is repetitive and does not add much new information.
yacoob
248 reviews, 7 followers
September 2, 2025
Excellent overview of challenges specific to systems employing machine learning algorithms. There's a bit of overlap between chapters, as a result of different authors contributing them - that's the only reason I've not given it 5/5.