How Software A Field Guide to Understanding Complex System Disasters

Why Systems Break in Ways Their Creators Never Imagined

Software failures aren't accidents; they're inevitabilities. In a universe governed by probability rather than certainty, even cosmic rays from distant stars can flip bits in computer memory, causing election machines to miscount votes or video game characters to jump impossibly high. But cosmic interference is just the beginning of how our most critical systems fail in spectacular and unpredictable ways.
Through gripping real-world case studies, this field guide reveals the hidden laws governing complex system disasters. Discover how Knight Capital lost $460 million in 45 minutes due to a single misplaced software flag. Learn why the Therac-25 radiation machine killed patients despite passing every safety test. Understand how a 40-kilobyte configuration file crashed 8.5 million computers worldwide, grounding flights and shuttering hospitals across the globe.
What You'll Learn

Based on complexity theorist Richard Cook's groundbreaking principles, you'll learn:

Why testing can never guarantee perfection, and what to do instead
How "reasonable" decisions combine to create unreasonable disasters
Why complex systems always run in degraded mode, and why that's actually normal
How scale transforms rare impossibilities into daily certainties
Why the search for "root causes" consistently leads us astray

From Understanding to Action

But this isn't just about understanding failure; it's about building resilience. Explore practical strategies from organizations that have learned to thrive:

Mars rovers that adapt and learn from component failures, operating years beyond their planned lifetimes
The internet's routing protocols that automatically heal themselves when damaged
Netflix's chaos engineering that deliberately breaks their own servers to build antifragile systems
The ethical frameworks for deciding what level of failure is acceptable when lives are at stake

Who This Book Is For

Whether you're a software engineer debugging production issues, a manager trying to prevent the next catastrophic outage, or simply curious about why technology fails in impossible ways, this book will forever change how you think about the complex systems that run our world.