Introducing CDH4

I’m pleased to inform our users and customers that Cloudera has released the fourth version of Cloudera’s Distribution Including Apache Hadoop (CDH) into beta today. This release combines input from our enterprise customers, partners and users with the hard work of Cloudera engineering and the larger Apache open source community to create what we believe is a compelling advance for this widely adopted platform.

There are a great many improvements and new capabilities in CDH4 compared to CDH3. Here is a high level list of what’s available for you to test in this first beta release:

  • Availability – a high availability namenode, better job isolation, hard drive failure handling, and multi-version support
  • Utilization – multiple namespaces, co-processors and a slot-less resource management model
  • Performance – improvements in HBase, HDFS, MapReduce and compression performance
  • Usability – broader BI support, expanded API access, unified file formats & compression codecs
  • Security – scheduler ACLs

Some items of note about this beta:

This is the first beta for CDH4. We plan to release a second beta some weeks after this one. The second beta will roll in updates to Apache Flume, Apache Sqoop, Hue, Apache Oozie and Apache Whirr that did not make the first beta. It will also broaden platform support back out to our normal release matrix of Red Hat, CentOS, SUSE, Ubuntu and Debian. Our plan is for this second beta to contain the last significant component changes before CDH4 goes GA.

Some CDH components are getting substantial revamps, and we have transition plans for these. There is a significantly redesigned MapReduce (aka MR2) with an API similar to the old MapReduce but with new daemons, a new user interface and more. MR2 is part of CDH4, but we also decided it makes sense to ship the MapReduce from CDH3, which is widely used, thoroughly debugged and stable. We will support both generations of MapReduce for the life of CDH4, which will allow customers and users to take advantage of all of the new CDH4 features while making the transition to the new MapReduce in a timeframe that makes sense for them.

Because of the anticipated popularity of the high availability features, we’ve created a high availability guide. All of the other documentation artifacts have been updated. As always, we maintain complete transparency as to the Apache project releases and patches that make up CDH4. You can find the documentation for the Apache contents of CDH4 here.

We value your feedback! Please help make this beta a success by trying out CDH4 b1 and letting us know what you think.  If you are a customer, you should give us your feedback via Zendesk. If you are a user but not a customer, please give us your feedback on CDH Users.

