This eBook introduces some of the key open source tools for acquiring and manipulating data in Big Data projects. It focuses on the Hadoop environment, offering a hands-on guide to using Pig, Hive, Flume and Sqoop. Tutorials provide worked examples, and end-of-chapter exercises let readers test their newly acquired skills.
Whilst aimed at newcomers to Hadoop, the tutorials assume some existing understanding of data in general and relational databases in particular. For those completely new to databases, there are pointers to online help.