Streaming Data: Understanding the real-time pipeline

Streaming Data is an idea-rich tutorial that teaches you to think about efficiently interacting with fast-flowing data. Through relevant examples and illustrated use cases, you'll explore designs for applications that read, analyze, share, and store streaming data. Along the way, you'll discover the roles of key technologies like Spark, Storm, Kafka, Flink, RabbitMQ, and more. This book offers the perfect balance between big-picture thinking and implementation details.

216 pages, Paperback

First published June 22, 2017

9 people are currently reading
143 people want to read

Ratings & Reviews

Community Reviews

5 stars: 8 (11%)
4 stars: 28 (39%)
3 stars: 27 (38%)
2 stars: 7 (9%)
1 star: 1 (1%)
Displaying 1 - 14 of 14 reviews
Alexander
7 reviews
July 9, 2017
A nice overview of various challenges to be solved when implementing a streaming data analytics system. Being new to this topic, I found a lot of useful high-level terms explained, as well as some interesting data processing algorithms mentioned (I finally understood the idea of a Bloom Filter).

With that said, I didn't particularly like the writing style itself: the same thoughts were repeated over and over again, the diagrams were quite simplistic, and the sheer number of typos throughout the book was barely acceptable for a final release from such a great publisher as Manning (normally their books are of superb quality).

I can recommend this to people interested in an introduction to the topic of streaming data systems. It can be very helpful (as it was for me) when you don't yet have any hands-on experience with Spark, Kafka, etc.
Philip
16 reviews, 2 followers
November 13, 2018
It covers the basics of streaming data. Don't expect much in-depth discussion of tools; instead, this book covers the techniques around streaming data.
Zbyszek Sokolowski
299 reviews, 16 followers
October 29, 2017
I found the subtitle, "Understanding the real-time pipeline", very accurate. This is a short book with an overview of how to deal with streaming data.
It is a very good starter book on the topic. It is only 216 pages. It shows different perspectives and what we encounter in real life when designing an application that digests and analyzes data streams. Using an example, it presents the architecture of a streaming pipeline. It explains what can be encountered throughout the process, both when things go well and when we are in trouble because one of our components went down, and how to prepare our architecture for any kind of failure. The author briefly compares different solutions, e.g. Spark, Storm, Kafka, and Flink, showing their pros and cons and what is missing in order to use a certain tool. The same goes for different databases and in-memory caches. It helps to distinguish between technologies by showing their pros and cons. It also explains algorithms that can be used when data needs to be analysed: Bloom filter, HyperLogLog, and Count-Min Sketch.

All in all, the book should be valuable for people who are interested in architecture, or who want to improve their understanding or perhaps their existing approach.

A big plus is the many references to external sources, either books or articles, with links. I found the book to be helpful.

The disadvantage is that I would have liked much more about reactive systems, and in places the content could have been written more clearly.

At the end there is a concise example using Kafka, Storm, and Netty.
Mikhail Sergeev
3 reviews
May 30, 2018
An interesting overview book.
At times it seemed that some chapters stated obvious things and felt boring. But apparently this depends heavily on the reader's background (I am no expert in stream processing either, but I have had to deal with problems along the lines of "how not to lose messages").

I really liked the chapter on algorithms. The book has several examples of probabilistic algorithms such as HyperLogLog, Count-Min Sketch, and others. It was very interesting to learn that such approaches exist.

I bought the book hoping it would help me better understand how Kafka and RabbitMQ work. But all the overviews of specific technologies are very short, a couple of paragraphs each. In return, each overview includes recommendations of useful books on the subject.
5 reviews, 1 follower
April 29, 2018
A very useful starter for a complete beginner. The author makes a great effort to explain complex things in a simple way. However, the material sometimes gets a bit convoluted and opaque, as in 5-3, "Summarization techniques", particularly the parts on reservoir sampling and bit-pattern-based algorithms. On the other hand, my unwillingness to invest more time in learning these topics elsewhere is the likely reason. Evidently, this is an overview book, and it is to be expected that it serves as a shortcut to more advanced materials on the topics. I didn't touch the code, just read the book. Now it's my valued reference point.
Tim Verstraete
314 reviews, 3 followers
October 22, 2017
I really liked this book. At first I didn't understand the goal, but then it became clearer: a decent syntax to use when designing streaming data applications, a default template architecture so you never miss anything, plus the very good example in the final chapter. Nice!
Alexey
15 reviews
June 26, 2019
A very, very, very superficial overview of building data processing pipelines. There are examples in Java. It was interesting for broadening my horizons. I learned how and where Kafka, Storm, and other Apache Foundation products are used.
Zohreh Jafari
69 reviews, 5 followers
October 20, 2021
Who might like this book:
- Someone who is already a (data-driven) developer and curious about 'Streaming Data'
- Someone familiar with IT infrastructure, or who already has a job as a member of an infrastructure team
- A data scientist
- A Spark developer who is not yet hands-on with Spark Streaming
Josh
8 reviews, 1 follower
November 8, 2021
Graphics are really redundant and the book is simply not well written :(
Christophe Addinquy
390 reviews, 19 followers
September 15, 2018
It's a really good surprise. This is not a "hard code" book, but instead a text focused on understanding what RT Big Data means. The text is well written and illustrated, and also (not so common here) provides references to research papers. The author regularly repeats himself, but not to the point of becoming annoying. For my part, I consider it an equivalent of Martin Fowler's "distilled" titles, which is quite something.
My reading notes, in French, here
61 reviews, 2 followers
August 31, 2018

Summary

Streaming Data introduces the concepts and requirements of streaming and real-time data systems. The book is an idea-rich tutorial that teaches you to think about how to efficiently interact with fast-flowing data.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Technology

As humans, we're constantly filtering and deciphering the information streaming toward us. In the same way, streaming data applications can accomplish amazing tasks like reading live location data to recommend nearby services, tracking faults with machinery in real time, and sending digital receipts before your customers leave the shop. Recent advances in streaming data technology and techniques make it possible for any developer to build these applications if they have the right mindset. This book will let you join them.

About the Book

Streaming Data is an idea-rich tutorial that teaches you to think about efficiently interacting with fast-flowing data. Through relevant examples and illustrated use cases, you'll explore designs for applications that read, analyze, share, and store streaming data. Along the way, you'll discover the roles of key technologies like Spark, Storm, Kafka, Flink, RabbitMQ, and more. This book offers the perfect balance between big-picture thinking and implementation details.

What's Inside


The right way to collect real-time data
Architecting a streaming pipeline
Analyzing the data
Which technologies to use and when

About the Reader

Written for developers familiar with relational database concepts. No experience with streaming or real-time applications required.

About the Author

Andrew Psaltis is a software engineer focused on massively scalable real-time analytics.

Table of Contents

PART 1 - A NEW HOLISTIC APPROACH
Introducing streaming data
Getting data from clients: data ingestion
Transporting the data from collection tier: decoupling the data pipeline
Analyzing streaming data
Algorithms for data analysis
Storing the analyzed or collected data
Making the data available
Consumer device capabilities and limitations accessing the data
PART 2 - TAKING IT REAL WORLD
Analyzing Meetup RSVPs in real time