Processing large data sets with traditional DBMSs can be difficult. Apache Hadoop is a framework for developing applications designed to run on a distributed cluster, without using SQL. Such applications scale very well and can process enormous data sets. If you need to analyze data, Hadoop is exactly what you need. By reading this book, you will get acquainted with the subject and learn to write programs in the MapReduce style. After a few simple examples, the author quickly moves on to using Hadoop for more complex data analysis problems. The book describes recommended practices and design patterns that are useful when programming for MapReduce. Reading it requires knowledge of the basics of Java. Some familiarity with mathematical statistics will help with the more complex examples.
Good overview of the Hadoop core. I particularly liked the coverage of Hadoop Streaming, which is handy if (like me) you are approaching Hadoop without a lot of Java experience.
I prefer the O'Reilly book on Hadoop by Tom White to this book because the O'Reilly book goes into more depth and has better coverage of the related projects. However, Hadoop In Action is excellent as an extended tutorial on Hadoop with clear examples.
I also liked that you can register the book and get it in multiple electronic formats.
It covers an older version, and it can be read to get an idea quickly. That was my goal too. In places I could not try out the things in the book, because I did not manage to set up the system the way Apache intended. Now I will try reading a book on installation as well. There really are interesting techniques for this system that I can use.
Read this book to find out what Hadoop is, since we are going to start using it at work. I found the introductory chapters pretty useful in getting an idea of what Hadoop can do, and the chapter on how to manage a Hadoop cluster was helpful too.
Much of the book has to do with using MapReduce programming to work with the data that Hadoop will store. This was not my main interest and I wouldn't be using it anyway, so I only skimmed some of that. The book was very well written though.
When actually following the processes he describes to set up a Hadoop cluster, though, nothing seems to work the way it is described in the book. Setting this up has been a frustrating experience, and the book hasn't made the actual process any easier.
Lam explained the theory behind Hadoop very well. The environment settings and running examples are scattered across Chapter 1 and Chapter 2, which is not the best approach for beginners. Once you get past Chapter 5, you will definitely be quite handy with Hadoop. That is one good point about this book. The US Patent example made me a bit uncomfortable.
The best book to get up and running with Hadoop. As a beginner I found Hadoop: The Definitive Guide very intimidating, so this book gave me a head start. Reading the Definitive Guide was easier after reading Hadoop in Action and digging into some MapReduce code on CDH4.
Quite interesting to read even if you do not intend to use Hadoop on a daily basis. It describes the "map-reduce" way of thinking and some real-world usage examples. Hadoop extensions described (Pig Latin and Hive) are also particularly interesting.