Most companies work hard to avoid costly failures, but in complex systems a better approach is to embrace and learn from them. Through chaos engineering, you can proactively hunt for evidence of system weaknesses before they trigger a crisis. This practical book shows software developers and system administrators how to plan and run successful chaos engineering experiments. System weaknesses go beyond your infrastructure, platforms, and applications to include policies, practices, playbooks, and people. Author Russ Miles explains why, when, and how to test systems, processes, and team responses using simulated failures on Game Days. You’ll also learn how to work toward continuous chaos through automation with features you can share across your team and organization.
A great book for getting started with Chaos Engineering. Russ makes the subject lively and simple, and introduces it with the help of open source tools. Interestingly, the whole approach is hypothesis-based. While I was reading it, I was in discussions with Amazon about a role on a Customer Reliability team. The subject is growing in significance as reliability becomes a major consideration in software.