I was really excited for this book. I work at a company where my team deals entirely with streaming systems, and it's been quite a mind shift for me as I'm not used to the streaming mentality. I was hoping this book would help me understand a lot of these concepts, so when I saw the announcement I purchased it right away. I even made reading it and reporting what I learned to my team a quarterly personal goal at work.
I have to say, however, I wound up extremely disappointed. The book deals with a lot of interesting concepts and tries its best to cover them, but one of its critical flaws is the way information is presented, particularly the graphics. A good portion of the book is based on two blog posts by the authors, which contained animated graphics. These graphics were translated into static images for the book, and some meaning was lost in translation. I found much of the book somewhat difficult to follow, and wound up confused fairly often.
The other major issue with the book is that it says right on page 26 that this is not a book on Apache Beam but, spoiler alert, yes it is. The entirety of Part I is called 'The Beam Model', the authors have worked on Beam in some capacity, and 100% of the code examples use Beam. The authors claim they use Beam not because it's a Beam book, but because Beam most closely illustrates the concepts in the book. However, the matching is so close to 1:1 between what is being taught and what Beam does that the code examples show almost nothing. Virtually every code snippet just calls a series of factory methods whose names match the terms in the book, and almost everything is a one-liner because Beam makes it so easy. This is all well and good if you want to use Beam, but if you want to actually understand what the code is doing in a way that helps underline the concepts of the book, you're out of luck. The Beam code is basically pure magic: you declare the configuration from the book and you're done. This doesn't help solidify the material at all. Here's an example:
    PCollection<KV<Team, Integer>> totals = input
      .apply(Window.into(FixedWindows.of(TWO_MINUTES))
                   .triggering(AfterWatermark()
                                 .withEarlyFirings(AlignedDelay(ONE_MINUTE))
                                 .withLateFirings(AfterCount(1))))
      .apply(Sum.integersPerKey());
That's the entire code snippet on page 45 and it perfectly sets up the streaming system as described on the 5 full pages that precede it. As a reader, you're left with the sense that you don't know how you'd implement anything in the book _without_ Beam. Thus, it's a Beam book.
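For readers who want the terms in that snippet made concrete, here is a minimal plain-Python sketch of only its simplest piece: assigning timestamped events to fixed two-minute windows and summing per key. It is an illustration under stated assumptions, not code from the book and not how Beam implements windowing. Watermarks, early and late firings, and late data are left out entirely, and the names `fixed_window_sums` and `WINDOW_SECONDS` are hypothetical.

```python
from collections import defaultdict

# Two minutes, mirroring TWO_MINUTES in the quoted snippet (assumed to be in seconds here).
WINDOW_SECONDS = 120


def fixed_window_sums(events, window_seconds=WINDOW_SECONDS):
    """Sum integer values per (window_start, key) over fixed, non-overlapping windows.

    `events` is an iterable of (event_time_seconds, key, value) tuples.
    Events may arrive in any order; each is placed by its own event time.
    """
    totals = defaultdict(int)
    for event_time, key, value in events:
        # Round the event time down to the start of the window that contains it.
        window_start = (event_time // window_seconds) * window_seconds
        totals[(window_start, key)] += value
    return dict(totals)


# Hypothetical input, deliberately out of order by event time.
events = [(5, "a", 3), (130, "a", 1), (70, "a", 4), (10, "b", 2), (125, "b", 8)]
sums = fixed_window_sums(events)
```

This batch-style loop only answers "which window does each value belong to"; deciding when a window's result may be emitted while data is still arriving is the job of the triggers and watermark in the quoted snippet, and is not modeled here.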
After The Beam Model, you've got a section called "Streams and Tables" which actually provides really good insight into the concepts of data at rest (tables) and data in motion (streams). I really liked this way of thinking about data, and chapters 6 and 7 made me think the book may have turned a corner... but then chapters 8 and 9 (comprising 82 of the book's 318 pages) are about what the authors wish you could do with SQL, but can't. No, really, the opening to Chapter 8 states outright "I want to point out up front that most of what we’ll discuss in this chapter is still purely hypothetical as of the time of writing." It's basically the authors' wish list. Chapter 10 is a history lesson on the various libraries that are out there to help build streaming systems, most importantly Beam.
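The tables-versus-streams idea mentioned above can be shown in a few lines of plain Python. This is a rough sketch of one common reading of that idea (a table is what you get by folding a stream of updates, and a table can be replayed as a stream of updates), not code from the book. The function names and the convention that a value of `None` means "delete" are assumptions made for the example.

```python
def apply_changelog(changes):
    """Fold a stream of (key, value) updates into a table (a dict).

    Assumed convention for this sketch: a value of None deletes the key.
    """
    table = {}
    for key, value in changes:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


def table_to_changelog(table):
    """Replay a table as a stream of (key, value) updates, one per row."""
    return list(table.items())


# Data in motion: a hypothetical sequence of updates.
changes = [("alice", 1), ("bob", 5), ("alice", 3), ("bob", None)]

# Data at rest: the state left behind after applying every update.
table = apply_changelog(changes)
```

Replaying the resulting table through `apply_changelog(table_to_changelog(table))` rebuilds the same table, which is the round trip between the two views.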
I wound up having to cancel my quarterly goal of reading and presenting this book because frankly I got virtually nothing out of it. Considering my excitement, this was a disappointing outcome to say the least. I'm not sure I'd recommend it to anyone unless they were basically looking for a guide to Beam.