To help you navigate the large number of new data tools available, this guide describes 60 of the most recent innovations, from NoSQL databases and MapReduce approaches to machine learning and visualization tools. Descriptions are based on first-hand experience with these tools in a production environment.
This handy glossary also includes a chapter of key terms that help define many of these tool categories:
NoSQL Databases--Document-oriented databases using a key/value interface rather than SQL
MapReduce--Tools that support distributed computing on large datasets
Storage--Technologies for storing data in a distributed way
Servers--Ways to rent computing power on remote machines
Processing--Tools for extracting valuable information from large datasets
Natural Language Processing--Methods for extracting information from human-created text
Machine Learning--Tools that automatically perform data analyses, based on results of a one-off analysis
Visualization--Applications that present meaningful data graphically
Acquisition--Techniques for cleaning up messy public data sources
Serialization--Methods to convert data structure or object state into a storable format
A nice, high-level introduction to Big Data-related tools and platforms for developers or academic researchers in data science. It will be a good starting point for getting up to speed with the available options.
I was kinda skeptical when I first got my hands on these 60 pages and, after getting through them (which did not take long), I’m still kinda puzzled. I was not confused by the content, no. I guess if you decide to read this book, you must know what to expect from it, or else you will end up pretty much disgusted with both the money and the time wasted.
Let’s make it clear: this book doesn’t teach you anything. After thirty minutes or so, when you reach the back cover, you will not have learned anything. But chances are you will open Iceweasel, or whatever your favorite browser is, and go search for more information about some of the tools the author described.
And that’s the one and only aim of the author: to make you wanna know more about some specific application that you were not aware of. I did indeed search for something, so in that sense, mission accomplished.
Now, would I suggest reading the book? Yea, why not. You can always find out someone is developing something that could be useful to you.
Would I suggest buying the book? No. For a couple of reasons: 1. It’s not worth the money. This should not be a book, but rather some kind of weekly newsletter. Or better, O’Reilly should make sure that Amazon and any other bookstore give you a copy of this title whenever you purchase an IT book, to show you the latest technologies and tell you, "Hey, we’ve got a book covering that subject!" 2. The book is outdated already. It’s from 2011. Technology advances so fast that what was hot and cool four years ago has probably been replaced by something else by now.
As usual, you can find more reviews on my personal blog: http://books.lostinmalloc.com. Feel free to pass by and share your thoughts!
This book is a summary of the most popular modern tools for dealing with big datasets. There are no detailed descriptions of the presented tools, but it is a good intro book.
Very brief introduction; following up on the links might take as long as reading the book. Good introduction, although it will become outdated very fast (Needlebase no longer exists, for example).