Learn how to use the Apache Hadoop projects, including MapReduce, HDFS, Apache Hive, Apache HBase, Apache Kafka, Apache Mahout, and Apache Solr. From setting up the environment to running sample applications, each chapter in this book is a practical tutorial on using an Apache Hadoop ecosystem project. While several books on Apache Hadoop are available, most are based on the main projects, MapReduce and HDFS, and none discusses the other Apache Hadoop ecosystem projects and how they all work together as a cohesive big data development platform.
Who This Book Is For: Apache Hadoop developers. Prerequisite knowledge of Linux and some knowledge of Hadoop is required.
Nowadays, it is becoming increasingly difficult to predict how data will grow. Technology, meanwhile, moves in recurring cycles of uncertainty: we always appear to be on the cusp of the next "great" thing, which often proves to be a passing fashion soon supplanted by the next shiny trinket. Many technical experts, however, perceptively place Hadoop among the technologies that will endure. Deepak Vohra's Practical Hadoop Ecosystem answers many questions about what practitioners can actually do with Hadoop.
This book is a useful guide to the Apache Hadoop ecosystem, including MapReduce, HDFS, Apache Hive, Apache HBase, Apache Kafka, Apache Mahout, and Apache Solr. Each chapter moves from setting up the environment to running example applications, in a pragmatic, tutorial style with hands-on exercises on the ecosystem projects. While a few books on Apache Hadoop are available, most focus on the core projects, MapReduce and HDFS, and none discusses the rest of the Apache Hadoop ecosystem and how the projects integrate into a robust big data development platform.
The author presents multiple learning paths for readers, although I doubt the content would attract non-technical readers. Among the topics covered are: how to set up a Linux environment for Hadoop projects using Cloudera's Hadoop distribution, CDH 5; how to run a MapReduce job; how to store data with Apache Hive and Apache HBase; how to index data in HDFS with Apache Solr; how to develop a Kafka messaging system; how to build a Mahout user recommender system; how to stream logs to HDFS with Apache Flume; how to transfer data from a MySQL database to Hive, HDFS, and HBase with Sqoop; and how to create a Hive table over Apache Solr. The reasons why Hadoop is essential for web-scale data processing and storage are well presented.
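To give a flavor of the MapReduce programming model that the book teaches, the following sketch simulates a word count, the canonical first MapReduce job. The book's examples target the Hadoop Java API; this Python version is only an illustration of the map, shuffle, and reduce phases, with the names `mapper`, `reducer`, and `word_count` chosen here for clarity rather than taken from the book.

```python
# A minimal sketch of the MapReduce word-count pattern.
# Hadoop runs the map and reduce phases on a cluster; here the
# shuffle/sort step that Hadoop performs between them is simulated
# in-process with sorted().

from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reducer(pairs):
    """Reduce phase: sum the counts for each word.

    Assumes pairs arrive sorted by key, as Hadoop's shuffle guarantees.
    """
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

def word_count(lines):
    # Simulate the framework's shuffle/sort between map and reduce.
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))

if __name__ == "__main__":
    counts = word_count(["hadoop stores data", "hadoop processes data"])
    print(counts)  # {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

The same split of logic into a stateless map function and a per-key reduce function is what lets Hadoop scale the job across many machines, which is the central idea the book's MapReduce chapter develops.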
Overall, this book offers extensive implementations showing why Hadoop could be a big thing, a tool of exemplary impact in distributed computing systems. Examples run throughout the chapters, and challenging concepts are explained clearly. It is aimed at coders, software developers, programmers, and technical reviewers - a book for experts and mentors alike. Recommended as a useful guide for Hadoop learners and experts.