Dapper, a Large-Scale Distributed Systems Tracing Infrastructure

Modern Internet services are often implemented as complex, large-scale distributed systems. These applications are constructed from collections of software modules that may be developed by different teams, perhaps in different programming languages, and could span many thousands of machines across multiple physical facilities. Tools that aid in understanding system behavior and reasoning about performance issues are invaluable in such an environment.
Here we introduce the design of Dapper, Google’s production distributed systems tracing infrastructure, and describe how our design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large scale system were met. Dapper shares conceptual similarities with other tracing systems, particularly Magpie and X-Trace, but certain design choices were made that have been key to its success in our environment, such as the use of sampling and restricting the instrumentation to a rather small number of common libraries.
The main goal of this paper is to report on our experience building, deploying and using the system for over two years, since Dapper’s foremost measure of success has been its usefulness to developer and operations teams. Dapper began as a self-contained tracing tool but evolved into a monitoring platform which has enabled the creation of many different tools, some of which were not anticipated by its designers. We describe a few of the analysis tools that have been built using Dapper, share statistics about its usage within Google, present some example use cases, and discuss lessons learned so far.
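
The description above credits sampling as one of the design choices that keeps tracing overhead low. As a rough illustration only, the sketch below shows the general idea of recording a small, random fraction of requests. The names `make_sampler` and `should_trace` and the 1% rate are hypothetical choices for this example, not details taken from Dapper itself.

```python
import random


def make_sampler(rate, seed=None):
    """Return a function that decides whether to trace a request.

    Hypothetical helper: `rate` is the fraction of requests to trace
    (0.0 traces nothing, 1.0 traces everything).
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    rng = random.Random(seed)

    def should_trace():
        # random() is uniform on [0.0, 1.0), so this is true for
        # roughly `rate` of all calls.
        return rng.random() < rate

    return should_trace


# Assumed 1% sampling rate, chosen only for illustration.
should_trace = make_sampler(rate=0.01, seed=42)
traced = sum(should_trace() for _ in range(100_000))
print(f"traced {traced} of 100000 requests")
```

Making the decision once per request, rather than per event, means an untraced request pays almost nothing beyond a single random draw.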

Unknown Binding

Ratings & Reviews

Community Reviews

5 stars: 2 (50%)
4 stars: 0 (0%)
3 stars: 2 (50%)
2 stars: 0 (0%)
1 star: 0 (0%)
Displaying 1 - 2 of 2 reviews
Rob Sanek
145 reviews, 29 followers
May 1, 2019
Solid paper. Explains the architecture clearly and goes into some cool use cases. However, many of the usage examples are explained too shallowly, which limits what readers can take away.
15 reviews
March 27, 2025
Good paper. A good way to get beyond the absolute basics of monitoring to a more considerate place.

e.g., How can tracing be implemented without large performance overhead? How can we minimize the amount that application developers need to think about monitoring? In what unexpected ways can great tracing be leveraged?

Even if I don't get the monitoring SWE job I'm interviewing for, this is a worthwhile read that will help me be a better engineer in any distributed systems context.

EDIT: highest-ROI reading of my life