Resilience engineering has since 2004 attracted widespread interest from industry as well as academia. Practitioners from various fields, such as aviation and air traffic management, patient safety, off-shore exploration and production, have quickly realised the potential of resilience engineering and have become early adopters. The continued development of resilience engineering has focused on four abilities that are essential for resilience. These are the ability a) to respond to what happens, b) to monitor critical developments, c) to anticipate future threats and opportunities, and d) to learn from past experience - successes as well as failures. Working with the four abilities provides a structured way of analysing problems and issues, as well as of proposing practical solutions (concepts, tools, and methods). This book is divided into four main sections which describe issues relating to each of the four abilities. The chapters in each section emphasise practical ways of engineering resilience and feature case studies and real applications. The text is written to be easily accessible for readers who are more interested in solutions than in research, but will also be of interest to the latter group.
Getting a handle on recent developments in software development, or devops/secdevops, has been a project of mine recently; when it comes to elasticity, infrastructure as code, and micro-services, there is a fair deal I do not know. The project has involved re-reading some older OOAD materials, and this has been an eye-opener: a bit of fresh perspective on some old materials, like the GoF patterns book, which I have been going through gradually, has led to a lot of fresh insight since I last seriously read it during my undergraduate college days.
A collection of contributed essays on Resilience Engineering, discussing the practical application of the discipline in production settings.
The book frames Resilience Engineering as "the ability to respond to events, to monitor ongoing developments, to anticipate future threats and opportunities, and to learn from past failures and successes alike" in order to achieve safety goals, in particular how to manage socio-technical systems to these ends. The work emphasizes that "it is both easier and more effective to increase safety by improving the number of things that go right, than by reducing the number of things that go wrong", and in general that "‘things that go wrong’ [are] the flip side of the ‘things that go right,’ and ... [the] result of the same underlying processes", which need to be studied in full to improve both safety and production goals.
For each of the four main aspects of Resilience Engineering - monitoring, anticipating, responding, and learning - the book includes several essays discussing recent research. This research draws on various fields with a focus on safety and performance, including aviation, medicine, emergency response, and energy, as well as general theoretical work in Resilience Engineering and related disciplines. The collection is well edited and structured, and includes helpful framing within each section and in the book's overall introduction.
In terms of applicability to software development, the book uses a socio-technical systems management lens and is therefore most appropriate for engineering managers, project managers, and those working specifically on resilience and safety, but less directly relevant for full-time software developers.
Contains some good summaries of resilience engineering concepts, but I was hoping it would stir my mental pot a bit more. Once I'd picked up the basic concepts, most of the chapters just felt like rehashes of those concepts.