What’s New in CDH3b2: Core Hadoop


In this post I’ll cover some of the larger or more significant changes that have gone into core Hadoop in CDH3 beta 2.

Hadoop in CDH3 is based on the latest Apache Hadoop core release, version 0.20.2, which was released on February 26th, 2010. The release notes and change log describe what changed in that Apache dot release. On top of the base Apache release we’ve included hundreds of additional bug fixes, improvements and features, which are listed in the CDH3 beta 2 release notes and change log. The version string (hadoop-0.20.2+320) gives the base Apache Hadoop release version followed by the number of additional patches applied to it, here 320. The changes in CDH3 beta 2 focus primarily on improving Hadoop’s internals rather than on adding user-facing APIs.
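If you need to read the base release and patch count out of a version string programmatically, something like the following would do. It is only a sketch: the CdhVersion class is a hypothetical helper, not part of Hadoop or CDH, and the pattern assumes the "hadoop-<base version>+<patch count>" layout of the example string above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Hadoop or CDH.
final class CdhVersion {
    // Assumed layout: "hadoop-" + dotted base release + "+" + patch count.
    private static final Pattern LAYOUT =
        Pattern.compile("hadoop-(\\d+(?:\\.\\d+)*)\\+(\\d+)");

    final String baseVersion; // base Apache Hadoop release, e.g. "0.20.2"
    final int patchCount;     // patches applied on top of that release

    private CdhVersion(String baseVersion, int patchCount) {
        this.baseVersion = baseVersion;
        this.patchCount = patchCount;
    }

    // Splits a version string into its two parts, rejecting anything
    // that does not match the assumed layout.
    static CdhVersion parse(String version) {
        Matcher m = LAYOUT.matcher(version);
        if (!m.matches()) {
            throw new IllegalArgumentException(
                "Unrecognized version string: " + version);
        }
        return new CdhVersion(m.group(1), Integer.parseInt(m.group(2)));
    }

    public static void main(String[] args) {
        CdhVersion v = parse("hadoop-0.20.2+320");
        System.out.println(v.baseVersion + " plus " + v.patchCount + " patches");
    }
}
```

Run against the version string quoted above, this prints "0.20.2 plus 320 patches". Strings in any other shape are rejected rather than guessed at.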

The biggest addition to CDH3 beta 2 is the incorporation of the 0.20 append branch to enable HBase. The 0.20 append branch is a version of Hadoop 0.20 that supports the sync method, which provides durability for the HBase edits log. For more detail, see this page on the HBase wiki and look out for an upcoming post by Todd Lipcon.

Other key changes to HDFS include better handling of Data Node volume failure, better handling of Name Node replica failure, and stable performance in the face of heavy block deletion. We’ve also added a FUSE package for HDFS. The most notable addition to MapReduce is FIFO pool support in the fair share scheduler. Like CDH2 update 1, CDH3 now supports Ubuntu’s Lucid release.

We’re busy incorporating Hadoop security from the Yahoo! security branch for the next beta – CDH3 beta 3 – which we expect to release in the fall. CDH3 beta 3 will be the final beta before CDH3 is declared stable. Get in touch if there are other changes you’d like to see.