This book is written in a concise and easy-to-understand manner, and acts as a comprehensive guide on data analytics and integration with Talend big data processing jobs. If you are a chief information officer, enterprise architect, data architect, data scientist, software developer, software engineer, or a data analyst who is familiar with data processing projects and who wants to use Talend to get your first big data job executed in a reliable, quick, and graphical way, then Talend for Big Data is perfect for you.
Talend for Big Data means exactly that! One of the shortest technical books I have read, but surely to the point.
This book does not waste your time. If you suddenly find yourself on a project involving Hadoop (or its ecosystem components) and you know at least some Talend (if not, I recommend a supplementary book that I also reviewed, Talend Open Studio Cookbook by Packt), then this is your book. Print it (if you got an eBook) and place a copy by your desk.
The book nicely covers what I feared would be the complexities of dealing with Hadoop components such as Hive and Pig (a MapReduce generator, not an animal). Those fears turned out to be unfounded, thanks to Talend and its 500+ components: 90% of what you need out of Big Data is already there for you to use. To my disbelief, Talend actually is a very mature and (in its paid variant) fully enterprise-ready ETL solution.
The book has 7 chapters, each dedicated to a specific goal and built around an exercise with a particular piece of technology. My favorite is Chapter 7: Big Data Architecture and Integration Patterns. It is the last one, but this is the chapter where you get rewarded and start benefiting from the material you have absorbed. Chapter 6: Aggregate Data with Pig is a lot of fun and showed me a new way of interacting with Pig. It also turned out to be a much easier way. As a side note, I am in love with ETL in general. I think it has the highest ROI of all the enterprise tools, yet it is very much fun to work with and, best of all, visually self-documenting! Chapter 2: Building your First Big Data Job is like your first swim in deep waters: intimidating but rewarding, full of uncertainty but also excitement, and unforgettable. All the less relevant topics, such as setting up your training system, are moved to the appendixes, but I recommend actually starting there if you are new to Cloudera's Hadoop (CDH) VM distribution and/or VMPlayer (serving in the role of your virtual machine). It seemed to me that a reader does not need ANY prior knowledge of either Talend or Hadoop to accomplish the tasks in the book.
One suggestion I have for the author: instead of basing the examples on MySQL, which seems to be out of favor with the user community, use MariaDB, the equivalent substitute, which is going to capture a lot of attention with the release of version 10. Another point is the Hadoop distribution preference: it seems that Hortonworks offers more bells and whistles, but it is a catch-up game anyway.
It is a 5 out of 5 stars book, thank you Bahaaldine and Packt!
This handbook has kept up with the increasing focus on Big Data technology and its integration with typical open source components. There are many strengths associated with this text, but in my experience the really good topics, Chapter 3: Formatting data (sentiment analysis) and Chapter 4: Processing tweets with Apache Hive (extracting hashtags and emoticons), are the greatest wonders of this book. Another strength of this book is the list of resources with images and the Appendix section at the end of the book. Once students discover this book's usefulness, they consult it for every big data related task, saving me time and encouraging them to participate more fully in their own learning of big data. The author, Bahaaldine Azarmi, gives easy-to-understand explanations of what is possible with big data and related technologies such as databases, application servers, and web servers. I recommend this book to anyone who wants to learn more about big data and its terminology. The book is also useful because its examples are drawn from a variety of real-time levels. But be warned, you'll be left with more questions! You'll be ready to start your own search for other big data explanations and integrations related to what you see happening all around you. This book does have a few drawbacks. First, for those not familiar with technology, or for those as yet unfamiliar with some of the main challenges raised by big data learners, some of the writing in this book may seem foreign at first. The mix of terms from both the technology and big data fields can make the reading difficult for some. Although it is apparent that the author wanted the book to be an accessible source for non-experts, it sometimes falls short of this goal because of its extended use of technical terms and wordy sentences.
Overall, this book provides many insights into the use of technology to help big data learners get a grip on the subject to their full capacity. And, for those unfamiliar with technology, this book presents basic information about the tools and technology, making their understanding not only technology-specific but grounded in real-time scenarios as well.
Prior hands-on experience with the Talend DI and ESB components and a fair working knowledge of the Hadoop family of components, such as Hive, HDFS, Sqoop, and Pig, would make it easier to get the real essence behind the motive of the book.
· The book is comprehensive in explaining the steps required to get started with Talend and Hadoop components from ground zero, i.e. procurement, setup, initial configuration, etc.
· The style of picking up a single topic, e.g. Hive, and devising a fully functional working example is crisp and clear even if you are a complete beginner
· The author has also spent effort explaining the important Talend terminology, such as Context, Schema, an overview of components, and Talend Modules, for an easy take-off
· The book starts with a brief explanation of Hive, Sqoop, Pig, HDFS, etc. and sets the ground before taking a deep dive into the real implementation
· The thing I like about the book is that the illustration of the implementation revolves around a single scenario: tweet feed analysis and hashtags. This removes the need to go over the underlying use case again and again and lets you get on quickly to Talend and Hadoop
· The idea of highlighting the essential regex support that is missing in Hive, and how to compensate for it with a workaround, is a helpful tip
Talend for Big Data explains to readers how to work with Talend's Big Data solutions. In seven easy-to-read chapters, you will be ready to take on the technology.
Hadoop and big data require some coding skills, but Talend Open Studio for Big Data, as well as all the Enterprise and Platform versions, will ease your access to the technology, as users are able to develop their jobs graphically. Moreover, Talend provides a powerful and versatile open source big data product that makes working with big data technologies easy and helps drive and improve business performance, without the need for special knowledge or resources.
What is also interesting in Talend’s technology is that the big data product combines big data components for MapReduce 2.0 (YARN), Hadoop, HBase, Hive, HCatalog, Oozie, Sqoop and Pig into a unified open source environment so you can quickly load, extract, transform and process large and diverse data sets from disparate systems.
Overall, what is enjoyable throughout the book are the screenshots added by the author and the examples, which clearly illustrate concepts that can be complex to fully understand.
As a newbie to the Hadoop/Big Data world with sufficient experience in the data processing/DW&BI/analytics world, I found that this book delivers what it promised to deliver. It is a beautiful, simple, step-by-step narration that is easy for technical readers to follow.
A few points I learnt from this book:
- Difference between the Talend Unified Platform and Open Studio/Community Edition
- Talend Big Data components for HDFS, Hive, Pig, etc.
- A sample Big Data architecture and how it can augment the traditional DW environment
As a newbie to this topic, I found the book easy to read and follow. The Kindle edition of the book has a linked index that is a great resource for getting to the heart of a topic quickly. I think it is perfect for entry-level students of this topic.