Making Big Data Real-World Use Cases and Examples, Practical Code, Detailed Solutions
Large-scale data analysis is now vitally important to virtually every business. Mobile and social technologies are generating massive datasets; distributed cloud computing offers the resources to store and analyze them; and professionals have radically new technologies at their command, including NoSQL databases. Until now, however, most books on “Big Data” have been little more than business polemics or product catalogs. Data Just Right is a completely practical and indispensable guide for every Big Data decision-maker, implementer, and strategist.
Michael Manoochehri, a former Google engineer and data hacker, writes for professionals who need practical solutions that can be implemented with limited resources and time. Drawing on his extensive experience, he helps you focus on building applications, rather than infrastructure, because that’s where you can derive the most value.
Manoochehri shows how to address each of today’s key Big Data use cases in a cost-effective way by combining technologies in hybrid solutions. You’ll find expert approaches to managing massive datasets, visualizing data, building data pipelines and dashboards, choosing tools for statistical analysis, and more. Throughout, the author demonstrates techniques using many of today’s leading data analysis tools, including Hadoop, Hive, Shark, R, Apache Pig, Mahout, and Google BigQuery.
Coverage includes
Michael Manoochehri is an entrepreneur, writer, and optimist. With many years of experience working with enterprise, research, and non-profit organizations, his goal is to help make scalable data analytics more affordable and accessible. Michael has been a member of Google's Cloud Platform developer relations team, focusing on cloud computing and data developer products such as Google BigQuery. In addition, Michael has written for the tech blog ProgrammableWeb.com, has spent time in rural Uganda researching mobile phone use, and holds a master's degree in information management and systems from UC Berkeley's School of Information.
On one hand, it handles the technologies it covers well; on the other, it misses data technologies I care about, such as those for handling highly, and evolvably, connected graph-like data.
On one hand, it gives strong advice; on the other, it repeats it over and over (yes, I got it the tenth time: Hadoop is not the right tool for every job, and one should choose cloud-based computing resources by default). In the always-wise words of Peter Griffin, "it insists upon itself".
On one hand, I liked the code snippets, as they're illustrative of differences; on the other, they're neither detailed enough for an engineer nor accessible enough for the average manager.
On the whole I'd recommend this book... with disclaimers.
I didn't finish reading the book. I didn't like it after the first few chapters. It mainly provides an overview of databases for storing big data and the tools available for processing it. The comparisons are mainly qualitative. Some chapter titles are irrelevant to their content, as in chapter 3. There is no mention of what crowd-sourced data is or what should be collected.
Good book if you're looking for a general overview of trends in modern data science. You'll get a nice sneak peek into everything notable, as the book is really up-to-date, but unfortunately this overview is very general. Some chapters really need some more love: the overall concepts and usage of Mahout or R are nicely written, but instead of quite abstract code samples, the author should include some real-life scenarios and processing examples.
I liked this book a lot. This should be required reading for anyone trying to understand the whole Big Data paradigm technically, in terms of challenges, problems, and solutions. What I loved about this book is that it starts by describing concepts and then shows examples and code to make the ideas easier to understand. It talks not only about Hadoop and Hive but also about different technologies like Amazon DynamoDB, Elastic MapReduce, and Google BigQuery. Strongly recommended.
A wide range of topics is covered, but this book barely skims the surface. It might be a good overview, but it doesn't have enough depth to really get you started in any particular technology.