Don’t fly blind. Observability gives you actionable insights into your cloud native systems—from pinpointing errors, to increasing developer productivity, to tracking compliance.
Observability is the difference between an error message and an error explanation that comes with a recipe for resolving it! You know exactly which service is affected, who’s responsible for its repair, and even how it can be optimized in the future. Cloud Observability in Action teaches you how to set up an observability system that learns from a cloud application’s signals, logging, and monitoring, all using free and open source tools.
In Cloud Observability in Action you will learn how to:
Apply observability in cloud native systems
Understand observability signals, including their costs and benefits
Apply good practices around instrumentation and signal collection
Deliver dashboarding, alerting, and SLOs/SLIs at scale
Choose the correct signal types for given roles or tasks
Pick the right observability tool for any given function
Communicate the benefits of observability to management

A well-designed observability system provides insight into bugs and performance issues in cloud native applications. It helps your development team understand the impact of code changes, measure optimizations, and track user experience. Best of all, observability can even automate your error handling so that machine users apply their own fixes, with no more 3 AM calls for emergency outages.
About the technology
Cloud native systems are made up of hundreds of moving parts. When something goes wrong, it’s not enough to know there is a problem—you need to know where it is, what it is, and how to fix it. This book takes you beyond traditional monitoring, explaining observability systems that turn application telemetry into actionable insights.
About the book
Cloud Observability in Action gives you the background and techniques you need to successfully introduce observability into cloud-based serverless and Kubernetes environments. In it, you’ll learn to use open standards and tools like OpenTelemetry, Prometheus, and Grafana to build your own observability system and end reliance on proprietary software. You’ll discover insights from different telemetry signals, including logs, metrics, traces, and profiles. Plus, the book’s rigorous cost-benefit analysis ensures you’re getting a real return on your observability investment.
What's inside
Observability in and of cloud native systems
Dashboarding, alerting, and SLOs/SLIs at scale
Signal types for any role or task
State-of-the-art open source observability tools
About the reader
For application developers, platform owners, DevOps, and SREs.
About the author
Michael Hausenblas is a Product Owner in the AWS open source observability team.
Cloud Observability In Action has been an easygoing and enjoyable read. Tech books can sometimes get a bit heavy going or dry, but that's not the case here. Firstly, Michael goes back to first principles, drawing the distinction between Observability and monitoring, something that often gets muddied (and I've been guilty of this, as the latter is a subset of the former). Observability doesn't roll off the tongue as smoothly as monitoring (although I rather like the trend of using O11y). This distinction is helpful, particularly if you're still finding your feet in this space. What is more important is stepping back and asking what we should be observing and why we need to observe it. Plus, one of my pet points when presenting on the subject: we all have different observability needs, whether as a developer, an ops person, a security specialist, or an auditor.
Next is Michael's interesting take on how much O11y code is enough. Historically, I've taken the perspective that enough is a factor of code complexity: more complex code warrants more O11y or logging, as this is where bugs are most likely to manifest themselves. Secondly, I've looked at transaction and service boundaries. The problem is that this approach can sometimes generate chatty code. I've certainly had to deal with chatty apps and had to separate the wheat from the chaff. So Michael's approach of cost/benefit, measured using his B2I ratio (how much code is addressing the business problems over how much is instrumentation), was a really fresh perspective, presented in a very practical manner, with warnings about using such a measure too rigidly. It's also a really good perspective if you're working on hyperscaling solutions, where a couple of percentage points of improvement can save tens of thousands of dollars. Pretty good going, and we're only a couple of chapters into the book.
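To make the ratio concrete, here is a minimal sketch that follows the description above (business code over instrumentation code). The function name `b2i_ratio` and the line counts are assumptions made for illustration; the book's exact way of counting may differ.

```python
def b2i_ratio(business_loc: int, instrumentation_loc: int) -> float:
    """Lines of business logic per line of instrumentation (hypothetical helper)."""
    if instrumentation_loc <= 0:
        raise ValueError("no instrumentation code to compare against")
    return business_loc / instrumentation_loc


# Illustrative numbers only, not taken from the book.
ratio = b2i_ratio(business_loc=400, instrumentation_loc=50)
print(f"B2I ratio: {ratio:.1f}")
```

A falling ratio over time suggests instrumentation is growing faster than the business logic it observes, which is a prompt to ask whether the extra signals pay for themselves, not a rule to enforce rigidly.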
The book gets into the underlying ideas and concepts that inform OpenTelemetry, such as traces and spans, metrics, and how these relate to Observability. Some of the classic mistakes are called out, such as dimensioning metrics with high cardinality and why this will present real headaches for you.
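The cardinality problem is easy to see with a little arithmetic. In a label-based metrics system such as Prometheus, each distinct combination of label values is its own time series, so the worst case is the product of the distinct values per label. The label names and counts below are assumptions chosen for illustration.

```python
from math import prod

# Hypothetical labels on one request-count metric, with the number of
# distinct values each label can take. All counts are illustrative.
label_values = {
    "method": 5,
    "status_code": 10,
    "endpoint": 50,
}


def series_count(labels: dict[str, int]) -> int:
    """Worst-case number of time series: product of distinct values per label."""
    return prod(labels.values())


bounded = series_count(label_values)
# Adding an unbounded label such as a user ID multiplies everything.
with_user_id = series_count({**label_values, "user_id": 100_000})

print(bounded)
print(with_user_id)
```

With these numbers a single extra label turns a few thousand series into hundreds of millions, which is why identifiers like user IDs usually belong in logs or traces rather than in metric labels.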
Once the data is understood, particularly the metrics, you can start to think about how to identify what is normal and what is abnormal or an outlier. That then leads to developing Service Level Objectives (SLOs), such as an acceptable level of latency in the solution or how many errors can be tolerated.
The book isn't all theory. The ideas are illustrated with small, instrumented Go applications and the metrics, traces, and logs they generate. Rather than using a technology such as Fluentd or Fluent Bit, Michael starts by keeping things simple and directly connecting the gathering of the metrics into tools such as Prometheus, Zipkin, Jaeger, and so on. In later chapters, the complexity of agents, aggregators, and collectors is addressed. Then come the choices and considerations for different backend solutions, such as cloud vendor-provided services, OpenSearch, Elasticsearch, Splunk, Instana, and so on. Then, the front-end visualization of the data is explored with tools such as Grafana, Kibana, cloud-provided tools, and so on.
As the book progresses, the chapters drill down into more detail, such as the differences and approaches for measuring containerized solutions vs. serverless implementations such as Lambda, and the kinds of measures you may want. The book isn't tied only to technologies typically associated with modern Cloud Native solutions; more traditional things like relational databases are also taken into account.
The closing chapters address questions such as how to approach alerting, incident management, and implementing SLOs, and how these techniques and tools can help inform development processes, not just production.
So I would recommend the book if you're trying to understand Observability (whether or not you're working with a cloud solution). If you're trying to advance from more traditional logging to a fuller capability, then this book is a great guide, showing what, why, and how to evaluate the value of doing so.
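As a rough illustration of "how many errors can be tolerated", an availability SLO translates directly into an error budget. This is a minimal sketch with made-up numbers; the function names and the 99.9% target are assumptions, not values from the book.

```python
def error_budget(slo_target: float, total_requests: int) -> int:
    """Number of requests allowed to fail while still meeting the SLO."""
    return round((1 - slo_target) * total_requests)


def budget_remaining(slo_target: float, total_requests: int, failed: int) -> int:
    """Error budget left after the failures observed so far (negative if blown)."""
    return error_budget(slo_target, total_requests) - failed


# Illustrative: a 99.9% success SLO over one million requests in the window.
budget = error_budget(0.999, 1_000_000)
remaining = budget_remaining(0.999, 1_000_000, failed=250)

print(budget)
print(remaining)
```

The same shape works for a latency SLO if "failed" is read as "slower than the agreed threshold".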
The content is very disappointing to me. The book touches on most, if not all, areas of observability in today's cloud landscape, providing a broad overview of the industry and the different technologies specific to each area. But that's where it ends.
I was expecting some in-depth knowledge from an experienced engineer, so every time the basic introduction ended in each chapter and I thought the substantial content was about to come, it never did. Most of the time, the author just briefly introduced some buzzwords without providing concrete examples, and often referenced other books or blogs with statements like, "if you are interested in details on this topic, I recommend reading ...". As a result, I found that what I've learned from this book is even less than what is available in the "Concept" section of the OpenTelemetry official documentation.
In addition to the shallow knowledge, I found it difficult to follow everything the author wanted to cover. Some important definitions are not thoroughly explained before being used throughout the rest of the book. For example, instrumentation is a critical concept in observability, but the author never explained what it means before using it everywhere. And in Chap 10 the author explained the differences among SLI, SLO, and SLA. But when it comes to SLO, before explaining SLA, the author says "(SLO) in contrast to SLAs...", so I think it would be hard to understand for people without prior knowledge of SLAs. I also suggest adding more architectural diagrams alongside the practical examples to help readers understand how everything connects.
Overall, my impression of this book is that it resembles a poorly written blog without much valuable insight, and I would not recommend it to engineers of any level.
The author strikes a good balance between presenting fundamental concepts in detail and offering practical advice and examples. The book will be of most use for beginners and intermediate readers who want to develop their understanding of observability. More advanced readers will still find it useful, as the author does a great job of synthesizing important concepts.
The main reason I'm giving it 4 stars is that the content feels somewhat disconnected from chapter to chapter. I'm missing an introduction with the big picture of the observability infrastructure that is developed throughout the book. Also, I find it a little disconcerting to come across random screen captures of cloud vendor observability solutions (mostly AWS) that are not really explained in detail.