One of the recurring themes we have heard while working with our customers and the community is that Apache Hadoop configuration and deployment are a pain. Often, Hadoop is the first truly distributed system that administrators encounter, and the problem is made worse by the lack of standardized packages and deployment tools. And some releases are buggy. And upgrades are hard. And the list goes on.
In order for Hadoop to truly disrupt the enterprise, it needs to be just as easy to configure, deploy and manage as any other piece of software.
We’d like to take a step in that direction and share our distribution with the community. We developed our distribution to improve reliability and operations for our support customers, and while they will always be the first to receive updates and hot fixes, the community will never be far behind.
You can expect additional features over time, but I’m happy to say that the first release includes:
- RPM Deployment – Never again wonder which files go in which directories or whether your component versions are compatible. RPM was designed for this. In addition to Hadoop, we have RPMs for compatible versions of Hive and Pig in this release.
- Standard Linux Service Management – Your IT staff knows how to work with RPMs and init-level services. Now they know how to work with Hadoop.
- Public YUM Repository – We’ll make sure it’s easy to stay up to date with the latest stable version of Hadoop.
- Simple Web-Based Configuration Assistance – Do you know what the optimal setting for mapred.child.ulimit is? Or for another.arcane.parameter? Well, we have some ideas, and are always learning more. To share that, we’ve created a configurator that asks a few important questions about your hardware and computes sensible values for all of your configuration parameters.
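To give a feel for what "asks a few questions and computes values" could look like, here is a minimal Python sketch. It is not the configurator's actual logic: the function name `suggest_settings`, the slot counts, the 1 GB reserve, and the 2x ulimit factor are all illustrative assumptions. The property names are real Hadoop settings, and the one firm constraint is that mapred.child.ulimit (virtual memory, in KB) has to be at least as large as the heap given to child JVMs.

```python
def suggest_settings(cores: int, ram_mb: int) -> dict:
    """Derive a few Hadoop properties from basic hardware facts.

    Hypothetical heuristics for illustration only; these are NOT the
    rules used by the real configurator.
    """
    if cores < 1 or ram_mb < 1024:
        raise ValueError("need at least 1 core and 1024 MB of RAM")

    # Assumption: leave one core for the daemons, and run roughly
    # half as many reduce slots as there are cores.
    map_slots = max(1, cores - 1)
    reduce_slots = max(1, cores // 2)

    # Assumption: reserve 1 GB for the OS, DataNode and TaskTracker,
    # then split the remainder evenly across all task slots.
    reserved_mb = 1024
    heap_mb = max(128, (ram_mb - reserved_mb) // (map_slots + reduce_slots))

    # mapred.child.ulimit is expressed in KB and must be >= the child
    # JVM heap. The 2x headroom factor is an assumption.
    ulimit_kb = heap_mb * 2 * 1024

    return {
        "mapred.tasktracker.map.tasks.maximum": str(map_slots),
        "mapred.tasktracker.reduce.tasks.maximum": str(reduce_slots),
        "mapred.child.java.opts": f"-Xmx{heap_mb}m",
        "mapred.child.ulimit": str(ulimit_kb),
    }


if __name__ == "__main__":
    # Example: a node with 8 cores and 16 GB of RAM.
    for name, value in suggest_settings(cores=8, ram_mb=16384).items():
        print(f"{name} = {value}")
```

The point of the sketch is the shape of the problem: a handful of hardware facts go in, and a consistent set of interdependent parameters comes out, so nobody has to remember that one value is in KB while another is a JVM flag.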
Our distribution is currently based on Hadoop 0.18.3, and all of our changes are Apache 2.0 licensed. We’ve patched in some stable features from later versions and included some patches we are still in the process of committing to Apache.
Check it out: http://blog.cloudera.com//hadoop