CDH3u2: Apache Mahout Integration

Cloudera believes that the flexibility and power of Apache Mahout, used in conjunction with Hadoop, is invaluable. We have therefore packaged the most recent stable release of Mahout (0.5) into CDH3u2. We are very excited to work with the Mahout community and to become much more involved with the project as both Mahout and Hadoop continue to grow. You can test the Mahout integration by downloading our most recent CDH release.

Why are we packaging Mahout with Hadoop?

Machine learning is an entire field that draws on information retrieval, statistics, linear algebra, analysis of algorithms, and many other subjects. It powers recommendation engines that suggest new friends, love interests, and new products. It also enables advanced analysis of genetic sequences, distributed search and frequent pattern matching, as well as mathematical analysis with vectors, matrices, and singular value decomposition (SVD).

Apache Mahout is an open source project of the Apache Software Foundation devoted to machine learning. Mahout can run on top of Hadoop, which lets users apply its machine learning algorithms to large data sets through Hadoop's distributed computing. Mahout packages popular machine learning algorithms such as:

  • Recommendation mining takes users' behavior and finds items that a given user might like.
  • Clustering takes items such as text documents and groups them by related topic.
  • Classification learns from existing categorized documents what documents in each category look like, and can then assign unlabeled documents to the appropriate category.
  • Frequent item set mining takes a set of item groups (e.g. terms in a query session, shopping cart contents) and identifies which individual items typically appear together.
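To make the last two ideas concrete, here is a minimal in-memory sketch in Python of frequent item set mining, restricted to pairs, with a simple co-occurrence recommendation built on top. It is not Mahout's API and not the algorithm Mahout uses; the function names `frequent_pairs` and `recommend`, the `min_support` threshold, and the sample shopping carts are all assumptions made for illustration. A brute-force count like this only works for small data, which is exactly why a distributed implementation on Hadoop is attractive.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that share a basket at least `min_support` times."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def recommend(item, pairs):
    """Rank the items that most often co-occur with `item` (hypothetical helper)."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common()]


# Made-up shopping carts, used only to exercise the functions above.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
]

pairs = frequent_pairs(baskets, min_support=3)
print(pairs)                     # {('beer', 'diapers'): 3}
print(recommend("beer", pairs))  # ['diapers']
```

Lowering `min_support` to 2 would keep every pair in this sample, since each pair appears in at least two carts; choosing the threshold is a trade-off between noise and coverage.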

We are very excited to be working with the Apache Mahout community and strongly encourage everyone currently using CDH to give Mahout a try! As always, we welcome guest posts from anyone who would like to blog about their experience using Mahout with CDH.