About the Book: Programming Hive: Data Warehouse and Query Language for Hadoop

Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop's data warehouse infrastructure. You'll quickly learn how to use Hive's SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop's distributed filesystem.

This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You'll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.

Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
Customize data formats and storage options, from files to external databases
Load and extract data from tables, and use queries, grouping, filtering, joining, and other conventional query methods
Gain best practices for creating user-defined functions (UDFs)
Learn Hive patterns you should use and anti-patterns you should avoid
Integrate Hive with other data processing programs
Use storage handlers for NoSQL databases and other datastores
Learn the pros and cons of running Hive on Amazon's Elastic MapReduce

Contents
Chapter 1 Introduction
Chapter 2 Getting Started
Chapter 3 Data Types and File Formats
Chapter 4 HiveQL: Data Definition
Chapter 5 HiveQL: Data Manipulation
Chapter 6 HiveQL: Queries
Chapter 7 HiveQL: Views
Chapter 8 HiveQL: Indexes
Chapter 9 Schema Design
Chapter 10 Tuning
Chapter 11 Other File Formats and Compression
Chapter 12 Developing
Chapter 13 Functions
Chapter 14 Streaming
Chapter 15 Customizing Hive File and Record Formats
Chapter 16 Hive Thrift Service
Chapter 17 Storage Handlers and NoSQL
Chapter 18 Security
Chapter 19 Locking
Chapter 20 Hive Integration with Oozie
Chapter 21 Hive and Amazon Web Services (AWS)
Chapter 22 HCatalog
Chapter 23 Case Studies
Glossary
Appendix
References
Colop
This could have been a much better book had it not been for the apparent haste with which O'Reilly rushed it out the door before (really) doing a final edit. The book is riddled with typographical errors, my favorite being the "dangling" second paragraph of Chapter 17, "Storage Handlers and NoSQL", which ends with: "For example, a Hive query could be run that selects a data table that is backed by sequence files, however it could output" (no kidding).
The overall content is worthwhile, but be forewarned: it's not as well edited as other books from O'Reilly. Three stars, solely for content.
Really good book to get into Hive and dive deeper. The installation section is somewhat outdated, but mind you, this book is a few years old. And I'm on a Mac, which I think is still not officially supported. Trying to build something with Hive is filled with uncertainty, as I am never 100% sure whether it fails because I'm not on Linux or because my queries are wrong. But still, great book to get into Hive. Can't wait for the second edition coming out in early 2017.
Maybe 2.5 stars? Not as clear as other O'Reilly texts, and with a ton of mistakes, both in text and code snippets. Clearly a rush job. Still, it'll get you going in terms of being a *user* of Hive. If you want to be an administrator, I'd look to other sources - and make sure you have a solid Java background.