CDH3 update 2 is released

Categories: General

Continuing the practice we established with Cloudera’s Distribution Including Apache Hadoop v2 (CDH2), we aim to provide regular (quarterly), predictable updates to the generally available release of our open source distribution.  The second such update for CDH3 is available today, approximately three months after update 1.

For those of you who are recent Cloudera users, here is a refresher on our update policy:

  • Updates will include only patches that do not break compatibility.
  • Updates will include only patches that are non-disruptive.
  • You can skip updates without penalty – i.e., if you don’t find the contents of an update compelling, you can skip it and wait for a future update without having to do a delta upgrade.
  • When it’s possible to pull features from our CDH4 roadmap into CDH3 updates in a non-disruptive way, we’ll take advantage of that opportunity.

There are a number of improvements coming to CDH3 with update 2.  Among them are:

1. New features – Support for Apache Mahout (0.5).   Apache Mahout is a popular machine learning library that makes it easier for users to perform analyses like collaborative filtering and k-means clustering on Hadoop.   Also added in update 2 is expanded support for Apache Avro’s data file format.  Users can:

  • load data into Avro data files in Hadoop via Sqoop or Flume
  • run MapReduce, Pig or Hive workloads on Avro data files
  • view the contents of Avro files from the Hue web client

This lets users work with all the major features of the Hadoop stack without having to switch file formats or fall back to text.  Avro data files also offer benefits over text: they are faster to process and more compact.
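One reason a binary format can be more compact than text is the way it stores numbers. The Avro specification encodes int and long values with variable-length zig-zag coding, so small values take a single byte. The Python sketch below is a standalone illustration of that encoding, not code from the Avro library; the function names `zigzag`, `encode_long`, and `decode_long` are hypothetical and chosen only for this example.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one so that values
    of small magnitude (positive or negative) become small numbers."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def encode_long(n: int) -> bytes:
    """Encode a signed 64-bit integer as zig-zag, variable-length bytes."""
    z = zigzag(n)
    out = bytearray()
    # Emit 7 bits per byte, low-order group first; the high bit of each
    # byte signals that more bytes follow.
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(data: bytes) -> int:
    """Decode one zig-zag, variable-length integer from the start of data."""
    z = 0
    shift = 0
    for b in data:
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    # Undo the zig-zag mapping.
    return (z >> 1) ^ -(z & 1)


if __name__ == "__main__":
    for value in (0, -1, 64, 1234):
        binary_size = len(encode_long(value))
        text_size = len(str(value)) + 1  # decimal digits plus a newline
        print(value, "binary bytes:", binary_size, "text bytes:", text_size)
```

Run as a script, this prints the encoded size of a few sample values next to the size of the same values written as one decimal number per line. Actual savings in a real Avro data file depend on the schema, the data, and any compression codec in use.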

2. Improvements (stability and performance) – HBase in particular has received a number of changes that improve stability and recoverability.  All HBase users are encouraged to move to update 2.

3. Bug fixes – More than 50 bug fixes.  The individual fixes and their corresponding Apache project JIRAs are listed in the release notes.

Update 2 is available in all the usual formats (RHEL, SLES, Ubuntu, Debian packages, tarballs, and SCM Express).  Check out the installation docs for instructions. If you’re running components from the Cloudera Management Suite, they will not be affected by moving to update 2. The next update (update 3) for CDH3 is planned for January 2012.

Thank you for supporting Apache Hadoop and thank you for supporting Cloudera.

