In today's "always-on" world, downtime is not an option. Your users expect seamless service, 24/7. Your business depends on it. But how do you guarantee that reliability when complex systems inevitably encounter turbulence? The answer lies in a world-class on-call capability.
"On-Call In Action" is your practical playbook for building just that. This isn't just another theoretical tome; it's a hands-on guide to navigating the high-stakes reality of modern on-call. We'll equip you with the SRE principles, incident management lifecycles, and effective alerting strategies (leveraging the Versus Incident project as our real-world example) that form the backbone of resilient operations.
This book, "On-Call In Action," is your friendly guide to making on-call work better. We'll show you:
Why being on-call is so important.
What to do when a problem (we call it an "incident") happens.
How to set up good alerts so you only get called for big problems. We'll even show you how with a free tool called "Versus Incident."
How to check if your services are running well (using simple goals).
How to learn from mistakes without blaming anyone, so things get better.
How to make good on-call schedules so people don't get too tired.
How to create a supportive team for on-call work.
Stop just reacting to problems and start engineering reliability. Whether you're a tech person who is on-call, a manager, or just curious, this book will give you clear advice and real examples. We want to help you build an on-call system that keeps your services running and your team feeling good.
This book contains 11 chapters:
Chapter 1: Why On-Call Matters & SRE Principles
Chapter 2: Anatomy of an Incident: The Management Lifecycle
Chapter 3: Effective Strategy and Routing Use Versus Incident
Chapter 4: Integrating Monitoring Sources and Escalation: A Case Study
Chapter 5: Measuring SLIs, SLOs, and Error Budgets
Chapter 6: Putting It All Together: Practical Examples of Unified Alerting & Templating
Chapter 7: Learning from Blameless Postmortems
Chapter 8: Sustainable Scheduling and Managing Burnout
Chapter 9: Effective Incident
Chapter 10: The On-Call Tooling and Future Trends
Chapter 11: On-Call in Digital Customer Onboarding in Banking