Announcing Cloudera’s Distribution for Apache Hadoop


One of the recurring themes we have heard while working with our customers and the community is that configuring and deploying Apache Hadoop is a pain. Oftentimes, Hadoop is the first truly distributed system that administrators encounter, and the problem is made worse by the lack of standardized packages and deployment tools. And some releases are buggy. And upgrades are hard. And the list goes on.

In order for Hadoop to truly disrupt the enterprise, it needs to be just as easy to configure, deploy and manage as any other piece of software.

We’d like to take a step in that direction and share our distribution with the community. We developed our distribution to improve reliability and operations for our support customers, and while they will always be the first to receive updates and hot fixes, the community will never be far behind.

You can expect additional features over time, but I’m happy to say that the first release includes:

  • RPM Deployment – Never again wonder which files go in which directories or whether your component versions are compatible. RPM was designed for this. In addition to Hadoop, we have RPMs for compatible versions of Hive and Pig in this release.
  • Standard Linux Service Management – Your IT staff knows how to work with RPMs and init level services. Now they know how to work with Hadoop.
  • Public YUM Repository – We’ll make sure it’s easy to stay up to date with the latest stable version of Hadoop.
  • Simple Web-Based Configuration Assistance – Do you know what the optimal setting for mapred.child.ulimit is? another.arcane.parameter? Well, we have some ideas, and are always learning more. To share what we have learned, we’ve created a configurator that asks a few important questions about your hardware and computes sensible values for all of your configuration parameters.
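
To give a feel for what a configurator of this kind does, here is a minimal Python sketch that turns two hardware answers (cores and RAM per worker node) into a handful of Hadoop properties. This is not the actual configurator: the function names `suggest_settings` and `to_site_xml` are hypothetical, and the sizing rules (one map slot per core, a quarter of RAM held back, a ulimit of twice the heap) are illustrative assumptions rather than recommended values. The property names are standard Hadoop 0.18 settings, and mapred.child.ulimit is expressed in kilobytes.

```python
from xml.sax.saxutils import escape


def suggest_settings(cores: int, ram_mb: int) -> dict:
    """Return illustrative Hadoop settings for one worker node.

    All heuristics below are assumptions made for this sketch.
    """
    # Assumption: one map slot per core, half as many reduce slots.
    map_slots = max(1, cores)
    reduce_slots = max(1, cores // 2)

    # Assumption: hold back a quarter of RAM for the OS and Hadoop daemons,
    # then split the rest evenly across all task slots.
    usable_mb = ram_mb * 3 // 4
    heap_mb = max(128, usable_mb // (map_slots + reduce_slots))

    # mapred.child.ulimit is a virtual-memory cap in KB and must exceed
    # the child JVM heap. Assumption: allow twice the heap.
    ulimit_kb = heap_mb * 2 * 1024

    return {
        "mapred.tasktracker.map.tasks.maximum": str(map_slots),
        "mapred.tasktracker.reduce.tasks.maximum": str(reduce_slots),
        "mapred.child.java.opts": f"-Xmx{heap_mb}m",
        "mapred.child.ulimit": str(ulimit_kb),
    }


def to_site_xml(settings: dict) -> str:
    """Render the settings in the Hadoop <configuration> XML layout."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in settings.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: a worker node with 8 cores and 16 GB of RAM.
    print(to_site_xml(suggest_settings(cores=8, ram_mb=16384)))
```

Keeping the sizing rules in one function and the XML rendering in another means the heuristics can be revised as more is learned without touching the output format.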

Our distribution is currently based on Hadoop 0.18.3, and all of our changes are Apache 2.0 licensed. We’ve patched in some stable features from later versions and included some patches we are still in the process of committing to Apache.

Check it out:


9 responses on “Announcing Cloudera’s Distribution for Apache Hadoop”

  1. seymourz

    Thanks. I would also like to know whether, and how soon, you plan to include HBase or Hypertable in your distribution.

  2. schubertzhang

    Yeah, the web-based configuration and deployment tool is very useful. Now we need not develop one ourselves, or at least we have a good reference for our own development. Thanks.

  3. Pingback: Marc’s Voice » Blog Archive » More strategic thinking

  4. Bill Au

    I am looking to use the Cloudera distribution of Hadoop since it contains patches that I need. I am using the source rpm:
    since I need to apply a few additional patches as well.

    I discovered that one of the Cloudera patches installs the file webapps/static/style.css. It references a few images that are neither part of the standard Hadoop release nor included in the Cloudera distribution. These missing images all belong in static/images, a directory that is also absent from the standard Hadoop release:

    Do I need to worry about these missing images?

  5. Matthew Sacks

    I used Cloudera Hadoop for writing a simple “getting started” tutorial. Using the CDH distribution definitely made things a lot easier. Nice work on the public yum repo; I wish more OSS companies provided one.