Site reliability engineering (SRE) is more relevant than ever. Knowing how to keep systems reliable has become a critical skill. With this practical book, newcomers and old hats alike will explore a broad range of conversations happening in SRE. You'll get actionable advice on several topics, including how to adopt SRE, why SLOs matter, when you need to upgrade your incident response, and how monitoring and observability differ. Editors Jaime Woo and Emil Stolarsky, co-founders of Incident Labs, have collected 97 concise and useful tips from across the industry, including trusted best practices and new approaches to knotty problems. You'll grow and refine your SRE skills through sound advice and thought-provoking questions that drive the direction of the field. Some of the 97 things you should
"97 Things Every SRE Should Know: Collective Wisdom from the Experts" is a compilation of 97 short articles for SREs, grouped by category. Because the book is a collection of self-contained pieces rather than a continuous text, I found the non-technical articles more engaging. That non-technical focus also helps the book stay relevant as technology changes.
One chapter on burnout was particularly eye-opening: it helped me understand the issue better and, thanks to a recommended website, even recognize signs of burnout in myself.