Debian packages for Apache Hadoop


When we announced Cloudera’s Distribution for Apache Hadoop last month, we asked the community to tell us which features they liked best and which new development mattered most to them. Almost immediately, Debian and Ubuntu packages for Hadoop emerged as the most popular request. Many customers prefer Debian and its derivatives over Red Hat-based systems, and installing RPMs on top of Debian, while possible with tools like alien, is a pain to say the least.

After some weeks of development and testing, we are happy to announce the Cloudera APT Repository. APT is the standard package distribution mechanism for Ubuntu and Debian, and by simply pointing your machines at our repository, you can have Hadoop installed within minutes.
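As a rough illustration of what pointing a machine at an APT repository involves: APT reads one-line entries of the form `deb URI suite component` from /etc/apt/sources.list or from files under /etc/apt/sources.list.d/. The Python sketch below assembles such a line. The repository URL, the component name, and the set of published suites are hypothetical placeholders, not the real values (the tutorial has those). Following the comments on this post, the sketch falls back to the Intrepid suite for releases such as Jaunty that do not yet have their own.

```python
# Minimal sketch: build an APT sources.list entry for a vendor repository.
# ASSUMPTIONS: REPO_URL, COMPONENT and PUBLISHED_SUITES are hypothetical
# placeholders, not Cloudera's actual repository layout.

REPO_URL = "http://apt.example.com/cloudera"  # hypothetical URL
COMPONENT = "contrib"                         # hypothetical component name
PUBLISHED_SUITES = {"intrepid"}               # hypothetical: suites the repo serves
FALLBACK_SUITE = "intrepid"                   # reuse Intrepid packages on newer releases


def choose_suite(codename: str) -> str:
    """Return the suite to request, falling back if the release has none yet."""
    return codename if codename in PUBLISHED_SUITES else FALLBACK_SUITE


def apt_source_line(codename: str) -> str:
    """Format a one-line sources.list entry: 'deb URI suite component'."""
    return f"deb {REPO_URL} {choose_suite(codename)} {COMPONENT}"


if __name__ == "__main__":
    # On Ubuntu 9.04 (Jaunty) this prints the Intrepid entry.
    print(apt_source_line("jaunty"))
```

After saving that line to a file such as /etc/apt/sources.list.d/cloudera.list (the file name is an arbitrary choice), running `apt-get update` makes the repository's packages visible to APT.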

Our Debian packages contain the same components as our RPM-based distribution, including:

  • Standard Linux service management – we package scripts in /etc/init.d for all of the major components of the Hadoop system.
  • Native libraries on supported platforms – there are separate architecture-dependent packages for Hadoop Pipes, libhdfs, and native-code compression acceleration.
  • Extra Hadoop-based tools – along with core Hadoop, we have packages available for Pig and Hive.

To get you started with Cloudera’s Distribution for Hadoop on Debian and Ubuntu, we’ve written up a short tutorial. Check it out, and remember to let us know what you think!


4 responses on “Debian packages for Apache Hadoop”

  1. Todd Lipcon

    Hi Sumek,

    Yes, we will release Jaunty-specific versions in our APT repository in our next revision. For now, the Intrepid packages should work fine; just add the APT repository as if you were running Intrepid.

    In fact, the packages really shouldn’t differ at all between Debian releases. We build and certify separately for each release only to be absolutely sure that we won’t run into unforeseen issues on any system.


  2. Scott McCrory

    I had a chance to play with it for a couple of hours last night on Ubuntu 9.04 (Jaunty), and it’s very nice. As Todd said, you just need to call it “intrepid” and everything else goes smoothly. Great job with the packages, guys, especially the service management and configuration support (standalone/pseudo/clustered). Having packages for Pig and Hive is a great addition as well. Please keep it up, this is moving Hadoop forward!