I Heart Logs: Event Data, Stream Processing, and Data Integration

Why a book about logs? That’s because the humble log is an abstraction that lies at the heart of many systems, from NoSQL databases to cryptocurrencies. Even though most engineers don’t think much about them, this short book shows you why logs are worthy of your attention. Based on his popular blog posts, LinkedIn principal engineer Jay Kreps shows you how logs work in distributed systems, and then delivers practical applications of these concepts in a variety of common uses: data integration, enterprise architecture, real-time stream processing, data system design, and abstract computing models. Go ahead and take the plunge with logs; you’re going to love them.

57 pages, Paperback

First published September 23, 2014

44 people are currently reading
572 people want to read

About the author

Jay Kreps

1 book, 10 followers

Ratings & Reviews

Community Reviews

5 stars: 93 (23%)
4 stars: 172 (44%)
3 stars: 102 (26%)
2 stars: 17 (4%)
1 star: 5 (1%)
Displaying 1 - 30 of 45 reviews
Eugene Yokota
14 reviews, 18 followers
March 28, 2015
I bought and read the PDF edition on a colleague's recommendation before realizing there's "The Log: What every software engineer should know about real-time data's unifying abstraction" (the blog article this book is based on?). This is a fifty-nine-page PDF file. You could almost say it's a detailed blog article with a cover.

The title of this book is probably misleading too. The log discussed here is an abstract data structure, similar to transaction logs (or journals). I felt like he's reviving the concept of the Enterprise Service Bus along with the notion of structured logging that's already become popular recently. If someone is expecting to read about best practices for log4j-style text logging, or about their implementation, they'll be confused and/or angry. You have to know what you're getting into before reading this book.

Despite all the above issues, for those who are interested in learning about scalable distributed systems, this book is worthwhile. Having paid the full price, I gave it more focus than a typical blog post, and the words carry more weight knowing the author is responsible for Apache Kafka, which came out of LinkedIn. It's sprinkled with gems of insight and advice backed by the author's experience. For instance, he discusses the notion of organizational scalability in terms of logging. It also acts as a general introduction to log-centric design, and specifically the thinking behind Kafka. Overall, I liked it because I learned new things, but it's not something I'd recommend to everyone.
Kaushal
16 reviews, 2 followers
September 9, 2020
A very difficult read :-0 for 70-odd pages, but a very introspective introduction to logs, a misnomer for most software engineers. A must-read to develop a good grasp of distributed data storage systems. One of those books where, if I read it again, I'll say ‘Oh, that’s what it meant’ 🙂.
Siri P R
3 reviews, 1 follower
October 22, 2023
The title might be misleading to most developers. The book talks about the “log” as an abstract data structure underlying Kafka, database WALs (write-ahead logs), etc.
It’s a quick, short read to understand distributed stream processing systems like Kafka from first principles. The book also has insights into leveraging logs to make sound distributed systems design choices.
Zoltán Tóth
4 reviews, 2 followers
March 2, 2015
Short, concise, practical. A great read if you wonder why streaming is such a big hit and how to think about logging in general. It's pretty intermediate though, so you will need some expertise to understand the message.
Vicki
531 reviews, 242 followers
July 27, 2019
The original blog post is good and makes sense in context (although one could argue that it should really be two separate posts: one on the architecture of logs and a second on LinkedIn's architecture).

This book doesn't add anything to the blog post, and in fact probably only takes away from it. Bad cash grab from O'Reilly, but it's interesting to read now that it's more of a historical document than a description of the current state of things.
Subhadeep Bhattacharyya
6 reviews, 1 follower
January 3, 2020
The book consolidates information on how logs can be used to sync and operate highly distributed systems. It gives an overview of how this principle was utilised at LinkedIn to build Kafka. With a plethora of data systems coming up for different purposes, integrating all of them becomes an increasingly difficult problem. What becomes more important is to decouple the consumer and the producer so that neither needs to know about the other. Kafka utilises this principle of separating concerns by building partitions and then replicas of them.
While this book does not delve into the deeper details of how Kafka works, it does discuss the business issues surrounding current data-based technological challenges and how they can be solved.
It briefly introduces concepts like ETL, OLAP, batch processing vs. stream processing, the Lambda architecture, log compaction, etc.
Overall, a good, comprehensive read if you want to know about the history of Kafka and how logs can be used to solve issues of abundance, scaling and over-complexity in data-focused distributed systems.
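The producer/consumer decoupling described in this review can be sketched as follows. This is a minimal illustration assuming a single in-memory, unpartitioned, unreplicated log; the `Log` and `Consumer` classes and their methods are hypothetical names, not Kafka's API. The key idea is that the producer only appends, while each consumer tracks its own offset, so neither side needs to know about the other.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Log:
    """A minimal in-memory append-only log (hypothetical, not Kafka's API)."""
    entries: list[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Append a record to the end and return its offset (position)."""
        self.entries.append(record)
        return len(self.entries) - 1

    def read(self, offset: int, max_records: int = 10) -> list[Any]:
        """Return up to max_records records starting at the given offset."""
        return self.entries[offset:offset + max_records]


class Consumer:
    """A reader that keeps its own position; the producer never sees it."""

    def __init__(self, log: Log) -> None:
        self.log = log
        self.offset = 0  # offset of the next record to read

    def poll(self) -> list[Any]:
        """Fetch the next batch of records and advance this consumer's offset."""
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch


log = Log()
for event in ["user_signed_up", "profile_updated", "connection_added"]:
    log.append(event)  # the producer only knows about the log itself

# Two hypothetical downstream systems reading the same log independently.
search_indexer = Consumer(log)
analytics = Consumer(log)

print(search_indexer.poll())  # ['user_signed_up', 'profile_updated', 'connection_added']
print(analytics.poll())       # the same three events, read independently
log.append("job_posted")
print(search_indexer.poll())  # ['job_posted']; analytics has not read it yet
```

Because each consumer owns its offset, a new downstream system can be added, or a slow one can fall behind, without any change to the producer. Partitioning and replication, which the review mentions, are deliberately left out of this sketch.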
40 reviews, 2 followers
October 18, 2021
A great intermediate/advanced read on log-based strategies for solving data integration problems when dealing with large data volumes and complex systems within an organization. Highly recommended for intermediate/advanced readers who are experienced in data engineering / data storage or aspiring to be.

This is one of the rare technical books that I've read cover-to-cover! It did help that this was pretty short, but that was achieved without sacrificing effectiveness in explaining the ideas or discussing various scenarios where these ideas can be applied.

It also helps that the author has himself created several successful, open-source, distributed data storage systems. This experience is reflected in the book: the theory is balanced with pragmatism, and conceptual purity with practical value to an organization.
Ray
267 reviews
February 12, 2020
As a young software engineer I find this book to be excellent.

It's general enough to be understandable yet detailed enough to be very interesting.

I finally understand what all the fuss about pub/sub is.


Notes:
* Physical and logical logging are very different. Physical logging is basically logging outputs. Logical logging is basically logging changes (how you changed things).
* "event data records things that happen rather than things that are."
* Centralized logging (aka pub/sub) is a powerful way to avoid many transformations
* You usually want the data publisher to do data cleanup when using centralized logging
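The first note, on physical vs. logical logging, can be illustrated with a minimal sketch. It assumes a toy key-value store with a single `balance` key; the function names and log formats are hypothetical. A physical log stores the resulting value of each change (the "output"), while a logical log stores the operation that produced it, so replaying it means re-executing the operation. Both replicas end up in the same state.

```python
# Hypothetical example: the same two updates recorded two different ways.


def apply_physical(state: dict[str, int], entry: tuple[str, int]) -> None:
    """Physical entry: (key, resulting value). Just overwrite with the result."""
    key, value = entry
    state[key] = value


def apply_logical(state: dict[str, int], entry: tuple[str, str, int]) -> None:
    """Logical entry: (operation, key, argument). Re-execute the operation."""
    op, key, amount = entry
    if op == "set":
        state[key] = amount
    elif op == "add":
        state[key] = state.get(key, 0) + amount  # recompute the change
    else:
        raise ValueError(f"unknown operation: {op}")


physical_log = [("balance", 100), ("balance", 130)]
logical_log = [("set", "balance", 100), ("add", "balance", 30)]

replica_a: dict[str, int] = {}
for entry in physical_log:
    apply_physical(replica_a, entry)

replica_b: dict[str, int] = {}
for entry in logical_log:
    apply_logical(replica_b, entry)

print(replica_a, replica_b)  # {'balance': 130} {'balance': 130}
```

One trade-off this hints at: replaying a logical log only reproduces the same state if every operation is deterministic, whereas a physical log can simply be copied value by value.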
Olha
96 reviews, 10 followers
December 10, 2021
This book is a gentle introduction to event streaming architectures.
It contains a lot of buzzwords, e.g. Kafka, which I was hearing about for only the second time.

While reading this book, a very cool thing happened to me: I understood the purpose of reactive programming. Rx is used everywhere, especially in mobile; since most apps don't scale, I had thought Rx was just a shiny new framework. The author tells the story of LinkedIn's system design, and at one moment I understood what scalability is and how reactive principles are used to ensure it.

I read this book at learning.oreilly.com
Peter Rybarczyk
95 reviews, 10 followers
May 21, 2020
I'd really like to give it more than 3 stars, but I recently read "Kafka: The Definitive Guide" and "Making Sense of Stream Processing", both of which were much more useful.

What I really liked is how the author was able to squeeze knowledge about log-centric distributed systems into such a short book. I was able to read it all in one afternoon and get the entire idea behind it.

What I didn't like is the lack of 'real-life' examples and the fact that it is almost the same as his slightly older blog post.
16 reviews, 2 followers
August 20, 2019
An explanation of how to use logs and data flow to build distributed systems. The examples in the book are about LinkedIn's systems and are based on Kafka. For me it was very helpful, but it requires a lot of prior preparation on distributed algorithms like Paxos, Raft, Zab, etc., or infrastructure components like Kafka and ZooKeeper.
Vishwaraj
23 reviews
November 25, 2019
The author gives a nice compilation of batch processing, real-time data stream analysis and log analysis, unified as one basic concept: a virtual entity called 'The Log'. This book inspired me to write; it's crisp, to the point, and not bogged down in implementation details. I really liked this book.
Tiago
89 reviews, 10 followers
October 13, 2020
I read this book looking for details on general logging best practices to help with product development. The book's focus is different, albeit still interesting: it discusses an architecture for storing and analyzing logs based on the author's experience working at LinkedIn. It does a good job showing what can be done, and the rationale behind the architecture choices, but it does not show how it's done.
29 reviews, 1 follower
December 25, 2024
A better title might be 'The power of logs as a data structure' or 'The birth of Kafka'. Very interesting read and I would recommend it. I already knew it wasn't about logs in the usual sense of detecting errors, etc., but I can imagine most readers won't see it that way, and from the reviews I can see most had different expectations.
5 reviews, 1 follower
January 30, 2020
Stimulates your curiosity about building efficient distributed systems and database internals in general.

I recommend reading this book to anyone planning to start using Kafka. You get an introduction to the thought process and challenges that led to the creation of Kafka.
Michael
90 reviews, 3 followers
October 11, 2020
What is a log? How do you streamline your data (log) collection across systems in your organisation? This book is a short introduction (under 60 pages) to log collection and an overview of the key concepts of a data-centric IT solution.
Rafael Matsumoto
25 reviews
June 5, 2021
A really concise book with really good explanations of why logs (or a data structure that keeps appending events) are handy when it comes to developing distributed systems, databases and even organizations.
Javier
21 reviews, 12 followers
July 26, 2023
A very short yet very insightful read: without diving into details, it manages to give a brilliant bird's-eye view of what logs are and why they're important for modern distributed software architectures.

I think it's a fantastic introductory read to streaming architectures.
Matija
93 reviews, 27 followers
June 7, 2018
A short and informative read on the advantages of a distributed log as the centerpiece of the organizational ecosystem.
Vanessa
57 reviews, 11 followers
January 14, 2019
Any engineer who works on distributed systems should understand what's in this book. Based on a blog post by the author. A bit old (2014), but still useful.
11 reviews
January 31, 2019
After reading a bunch of Jay Kreps's articles, I had the feeling that I’d already read this book. It’s a good starting point for the topic of data systems integration and stream processing.
28 reviews, 6 followers
May 20, 2019
A very insightful coverage of how logs play a central role in distributed systems! A really good read.