
Mahout in Action


When computers harness prior experience to improve future performance, a type of artificial intelligence called machine learning has been applied. The Apache Mahout project is focused on three types of machine learning that are of particular interest to modern web developers: recommendation systems, classification, and clustering.



Through real-world examples, Mahout in Action introduces the sorts of problems that these techniques are appropriate for, and then illustrates how Mahout can be applied to solve them. It places particular focus on issues of scalability, and how to apply these techniques at very large scale with the Apache Hadoop framework.

415 pages, Paperback

First published October 5, 2011


About the author

Sean Owen

51 books · 2 followers

Ratings & Reviews


Community Reviews

5 stars: 17 (18%)
4 stars: 39 (43%)
3 stars: 23 (25%)
2 stars: 7 (7%)
1 star: 4 (4%)
Displaying 1 - 5 of 5 reviews
Alex Ott
Author · 3 books · 208 followers
June 26, 2011
This book doesn't provide deep coverage of the theoretical foundations of machine learning (if you want more background, I would recommend other books, such as "Introduction to Machine Learning (Adaptive Computation and Machine Learning series)", "Machine Learning in Action", or "Programming Collective Intelligence: Building Smart Web 2.0 Applications"). Instead, it concentrates on explaining how to use Apache Mahout to solve several machine learning problems: making recommendations, clustering data, and classification.

For each class of problem, the description starts with the basics and continues with more complex examples, including complete solutions that can easily be adapted to your own machine learning problems. All examples that come with the book were checked against the current release of Apache Mahout (version 0.5).

The book is written in succinct but understandable language and provides many code snippets that make the topics much easier to understand. An interesting feature of the e-book version of Mahout in Action is the inclusion of audio and video snippets that explain and/or show the "hard places". There is also an interesting description of one of Mahout's real-world deployments, where it is used in e-commerce.

So I recommend this book if you're interested in solving machine learning problems that involve very large data sets.
4 reviews · 7 followers
June 23, 2011
Apache Mahout is an open source, scalable machine learning library written in Java. It is designed to handle large data sets. More than a dozen machine learning and data mining algorithms are available in Mahout. All those algorithms are implemented on top of Apache Hadoop. The framework is distributed under the commercially friendly Apache License. It helps researchers and companies build scalable, practical products based on machine learning and data mining principles. A wide range of big companies as well as startups use Apache Mahout in their products.

The Apache Mahout project is focused on three interesting machine learning problems: 1) recommendation systems, 2) clustering, and 3) classification. The project addresses real-world practical problems, and the tool makes the life of machine learning developers much more enjoyable. The book "Mahout in Action" by Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman introduces the wonderful world of creating scalable, real-world machine learning projects with Apache Mahout. It is written in lucid language, so that a beginner in machine learning can understand the concepts and kick-start work on classification, clustering, or recommendation projects. Even though the detailed algorithmic background of the underlying algorithms in Mahout is not described, the logic (common sense) behind the system is explained very well with the help of code examples and practical projects. I give a chapter-wise overview of the book "Mahout in Action" below. A sample chapter is available for download at http://www.manning.com/free/green_owe...


Chapter 1 introduces you to Mahout. Through this chapter you get to know the history of the Mahout project, its algorithms, its capabilities, and its configuration.

Chapter 2 introduces recommendation systems to the reader. The chapter teaches how to build a basic recommender system with Apache Mahout. The examples used to explain the technique are very clear and understandable to all.

Chapter 3 discusses data representation for building a recommender engine. The discussion extends to some naive data structures in Mahout. There is also some discussion of using MySQL to store data for building recommender engines.

Chapter 4 gives more insight into building scalable recommender systems. It introduces user-based as well as item-based recommendation engines. The examples are very clear and help practitioners build better prototypes much faster. The chapter is written so lucidly that anybody can understand the common sense behind recommender engines.

Chapter 5 deals with producing a full-fledged recommender system with Apache Mahout. The discussion and examples extend to deploying a web-based recommender engine. Once you have covered this chapter, you should be able to build a good production-quality recommender engine for your client.

Chapter 6 discusses how to build a scalable, distributed recommendation system with Mahout and the Hadoop framework. The chapter gives an illustrative example of the task using a Wikipedia data set. The authors spend some pages explaining the MapReduce concept in a very lucid way. There is a discussion of running the recommender on a cloud platform too. This chapter is definitely a helpful starting point for professionals who want to kick-start their recommender projects with less pain.

Chapters 7 to 12 discuss clustering techniques using Apache Mahout. Chapter 7 gives a brief introduction to clustering with practical examples. The chapter contains discussions of the different clustering algorithms available in Mahout.

Chapter 8 deals with preparing and representing data for clustering tasks. Tips and tricks for converting raw data to vectors for clustering are discussed very lucidly in this chapter.

Chapter 9 discusses the clustering algorithms in Mahout in detail. The major algorithms covered are K-Means clustering, centroid generation using Canopy clustering, Fuzzy K-Means clustering, Dirichlet clustering, and topic modeling using LDA as a variant of clustering. There is a small case study on clustering news items using Apache Mahout. One of my project students undertook such a project for his MSc in CS.

Chapter 10 focuses on evaluating a clustering system. The chapter discusses inspecting clustering output, evaluating clustering quality, and improving the quality of clusters.

Chapter 11 deals with producing a scalable clustering system with Mahout. It gives good insight into the art of content clustering with two case studies. Chapter 12 discusses some use cases of clustering with code examples, including clustering Twitter users and working with Last.fm data.

Chapters 13 to 16 discuss the technique of classification. Chapter 13 gives an introduction to classification and explains it step by step with examples. The illustrations in the chapter make the content more enjoyable and understandable for the reader. Chapter 14 deals with training a classifier system. It explains the training task using a publicly available data set called the 20 newsgroups data set. There is a discussion of selecting an algorithm for the classification task too. Ever since I came to know Mahout, I have used its classification techniques and algorithms. Chapter 16 has a wonderful discussion of deploying a classification system. The section gives practical insight into the pros and cons of developing and deploying a scalable classification system that can be benchmarked against the best-performing existing systems.

Chapter 17 needs special mention. The chapter is a case study named "Case study: Shop It To Me". The discussion in this chapter shows the real power of Apache Mahout with the help of a practical project.

The book has two appendices. Appendix A covers JVM tuning tips and tricks for deploying Hadoop/Mahout-based projects; it is useful for core Java programmers too. Appendix B gives insight into "Mahout Math" and some of the deeper math-related material in Mahout.

The book is available from the Manning MEAP site. Three excerpts are available on the website along with sample code. This is a must-read for all machine learning and NLP developers and researchers. This is an excellent book, and I am very happy to have read, practiced with, and understood Apache Mahout in such detail. Kudos to Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman.

For code samples and sample chapters visit http://www.manning.com/free/green_owe...
Todd N.
360 reviews · 260 followers
November 24, 2012
Read this very quickly in preparation for a meeting. Because the Apache Mahout documentation is bad, even by open source standards, this is the best source for learning Mahout. It contains a very good overview of item-based and user-based recommendation engines if you have ever wondered how these things work. Classification and clustering are also covered nicely.

Obviously the software has continued to be developed since this book was published, so it is dated. There is a section explaining how to run it on a Hadoop cluster, but you are better off finding a tutorial on a blog somewhere for a better walkthrough.
26 reviews · 11 followers
February 20, 2012
Reading this book helps you get up to speed with the Apache Mahout project, especially since at the moment the online documentation for Apache Mahout is quite poor and incomplete.
Andrey Tatarinov
5 reviews
January 6, 2013
A very long and not very informative Mahout quickstart. It could just as easily be a wiki article with Javadoc references.