Construct a robust end-to-end solution for analyzing and visualizing streaming data
Real-time analytics is the hottest topic in data analytics today. In Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data, expert Byron Ellis teaches data analysts technologies to build an effective real-time analytics platform. This platform can then be used to make sense of the constantly changing data that is beginning to outpace traditional batch-based analysis platforms.
The author is among a very few leading experts in the field. He has a prestigious background in research, development, analytics, real-time visualization, and Big Data streaming and is uniquely qualified to help you explore this revolutionary field. Moving from a description of the overall analytic architecture of real-time analytics to using specific tools to obtain targeted results, Real-Time Analytics leverages open source and modern commercial tools to construct robust, efficient systems that can provide real-time analysis in a cost-effective manner.
The book includes:
* A deep discussion of streaming data systems and architectures
* Instructions for analyzing, storing, and delivering streaming data
* Tips on aggregating data and working with sets
* Information on data warehousing options and techniques
Real-Time Analytics includes in-depth case studies for website analytics, Big Data, visualizing streaming and mobile data, and mining and visualizing operational data flows. The book's recipe layout lets readers quickly learn and implement different techniques. All of the code examples presented in the book, along with their related data sets, are available on the companion website.
On one hand, it may be the only book that truly touches the heart of the problems in RTA. What is more, it decomposes them into "moving parts" to illustrate what the issues are really about. But on the other hand, the execution is not perfect, and sometimes it's even irritating:
* the author picks the particular technologies well, but he pretty much skips the overview of each of them - two or three paragraphs of intro and he jumps straight into code -> what's even more odd, this code usually makes sense, but it focuses on particular elements of the tech -> which makes it hard to crunch for people who haven't experienced this tech at all. I have (the majority of them), but it was still very irritating for me. To be honest, I believe that this book should be more about the concepts and modeling the solutions (while analyzing the challenges), not about particular tech.
* the other big drawback affects the last few chapters -> the mathematical model behind HLL, Bloom filters, regression methods, etc. - the idea to put this knowledge in the book is absolutely awesome, the choice of topics is great as well, and there are some nice examples, but the actual execution fails -> there are pretty much no graphs, images, or illustrations for this part (at least in the Kindle version). FAIL. It sometimes makes it really hard to grasp some ideas quickly - it's very irritating.
Other than that, it's a very solid book dedicated to a very interesting topic. What is more, there aren't many alternatives available -> when I started battling this topic about 1.5-2 yrs ago, I had to gather knowledge from multiple resources, which was very time-consuming.
It's a mix & mash of technologies, algorithms and an overview of the field.
- begins with a description of stream processing as a new approach, providing methodology and an architectural backbone
- then lays down the exact technologies to be used
- at the end, digs into algorithmic challenges and approaches, and even provides insights into the math behind them.
To some degree, this book gives an awesome and very broad grasp of stream processing. Most books are either technology oriented, too general, or narrowed to algorithms and data structures. This book tries to describe the approach at different scales. Which is good. But the execution is not impressive. And you have to google a lot to understand what's going on. Code samples are very cryptic and inconsistent. Math formulas deserve better formatting.
I can't say who should read this book. Maybe those who are not new to the field but are still at the beginning of understanding it.
A timely book on the different techniques that can be applied while collecting, processing, storing and modeling real-time data from sources like operational monitoring and social networks. It touches upon all aspects of real-time data analytics, from data collection to creating effective dashboards to real-time A/B testing using predictions based on linear/non-linear models like neural networks.
The author doesn't assume readers have prior knowledge of cutting-edge real-time data management technologies, and that makes the book really good for beginners. A moderately experienced data engineer might find a few things basic, but again, those things could be really useful for beginners.
A good introduction to the analysis of streaming data, from acquiring it to performing analysis and storing the data. Besides an overview of concrete technologies (Kafka, Flume, Spark, Samza, databases, D3.js, etc.), it also includes a big part about algorithms that can be applied to streaming data, like count-min, Bloom filters, etc., including implementation examples and how to apply them to your tasks.