In keeping with our release policy for Cloudera’s Distribution Including Apache Hadoop (CDH), I’m pleased to announce the availability of update 3 for CDH3. As a reminder, we ship updates for our most recent GA distribution every 3 months. Updates primarily include bug fixes, but when possible we also include features from our mid-term roadmap. We include new features only when they do not introduce instability or break compatibility. As always, users have the option to skip updates without incurring any future upgrade cost.
Update 3 contains a number of improvements, several of which improve performance. Enhancements to HDFS and HBase yield performance improvements of 15-150% over CDH3 update 2, depending on the workload. Users should see gains across a wide range of workloads, from MapReduce over HDFS, to HBase scans, to HBase random reads and writes. Todd Lipcon’s talk at Hadoop World on performance outlines a number of the improvements that have made it into update 3. Some of these performance improvements require users to select specific configuration settings, so please consult the documentation.
A number of other improvements help system stability and recoverability. Enhancements to MapReduce let it better work around disk failures without impacting task locality. We also backported the Apache HBase distributed log splitting feature, which makes recovery from region server failure much faster than before.
More information about how to download or upgrade to update 3 is available here. Additional details are available in the release notes. As always, the exact changes (JIRAs, patches) for update 3 are described in the changes files that can be found here.