Introducing CDH4 Beta 2

I’m pleased to inform our users and customers that today we released the second and final beta of Cloudera’s Distribution Including Apache Hadoop version 4 (CDH4). We received great feedback from the community on the first beta, and this release incorporates that feedback as well as a number of new enhancements.

CDH4 has a great many enhancements compared to CDH3.

  • Availability – a high availability namenode, better job isolation, improved hard disk failure handling, and multi-version support
  • Utilization – multiple namespaces and a slot-less resource management model
  • Performance – improvements in HBase, HDFS, MapReduce, Flume and compression
  • Usability – broader BI support, expanded API options, and a more responsive Hue with broader browser support
  • Extensibility – HBase co-processors enable developers to create new kinds of real-time big data applications, and the new MapReduce resource management model enables developers to run new data processing paradigms on the same cluster resources and storage
  • Security – HBase table and column level security and ZooKeeper authentication support

Some items of note about this beta:

This is the second (and final) beta for CDH4, and this version has all of the major component changes that we’ve planned to incorporate before the platform goes GA.  The second beta:

  • Incorporates the Apache Flume, Hue, Apache Oozie and Apache Whirr components that did not make the first beta
  • Broadens the platform support back out to our normal release matrix of Red Hat, CentOS, SUSE, Ubuntu and Debian
  • Standardizes our release matrix of supported databases to include MySQL, PostgreSQL and Oracle
  • Includes a number of improvements to existing components, such as adding auto-failover support to HDFS’s high availability feature and adding multi-homing support to HDFS and MapReduce
  • Incorporates a number of fixes that were identified during the first beta period, such as removing an HBase performance regression

To recap, some CDH components have undergone substantial revamps, and we have transition plans for these. MapReduce has been significantly redesigned (aka MR2); it has a similar API to the old MapReduce but new daemons, a new user interface and more. MR2 is part of CDH4, but we also decided it makes sense to ship the MapReduce from CDH3 (aka MR1), which is widely used, thoroughly debugged and stable. We will support both generations of MapReduce for the life of CDH4, which will allow customers and users to take advantage of all of the new CDH4 features while making the transition to the new MapReduce in a timeframe that makes sense for them. Similarly, Apache Flume in CDH4 is substantially revamped (aka Flume NG). The new design is simpler, more scalable, more manageable and more reliable.

Because of the popularity of the high availability features, we’ve created a high availability guide.  All of the other documentation artifacts have been updated. As always, we maintain complete transparency as to the Apache project releases and patches that make up CDH4. You can find the documentation for the Apache contents of CDH4 here.

We value your feedback! Please help make this beta a success by trying out CDH4 Beta 2 and letting us know what you think. If you are a customer, please give us your feedback via Zendesk. If you are a user but not a customer, please give us your feedback on CDH Users.

