CDH2: Cloudera’s Distribution for Apache Hadoop 2

Categories: Community Hadoop Hive Pig

In March of this year, we released our distribution for Apache Hadoop. Our initial focus was on stability and making Hadoop easy to install. This original distribution, now named CDH1, was based on the most stable version of Apache Hadoop at the time: 0.18.3. We packaged Apache Hadoop, Pig, and Hive into RPMs and Debian packages to make managing Hadoop installations easier. For the first time, Hadoop cluster managers were able to bring up a deployment by running one of the following commands, depending on their Linux distribution:

As proof of this,

Read more

Hadoop World: NYC 2009: Speakers Announced

Categories: Community Hadoop

It’s been a crazy few weeks here at Cloudera, and while there is no sign of things letting up before Hadoop World: NYC 2009 on October 2nd, we wanted to take a minute to share the latest details about the speakers, and to say thanks to our sponsors who have recently come on board.

We’re absolutely thrilled to have such a wide variety of organizations sharing their experiences with Apache Hadoop.

Read more

Hadoop World: NYC 2009

Categories: Community General Hadoop Training

To say we were surprised by the quality and quantity of submissions we received for Hadoop World: NYC 2009 would be an understatement. We were amazed at how many “normal” companies have come to use Hadoop for everything ranging from business intelligence to protein alignment. It’s truly exciting to see how a system originally designed to process and index the web has evolved to support the data-driven workloads of so many industries.

Read more

Hadoop Default Ports Quick Reference

Categories: General Hadoop

Editor’s note (Oct. 3, 2013): The information below is now deprecated. We recommend that you consult this documentation for ports info instead.

Is it 50030 or 50300 for that JobTracker UI? I can never remember!

Hadoop’s daemons expose a handful of ports over TCP. Some of these ports are used by the daemons to communicate among themselves (to schedule jobs, replicate blocks, etc.). Other ports listen directly to users,

Read more
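
For readers who, like the author, can never remember which port is which, here is a minimal Python sketch that probes the web UI ports of a Hadoop node. The port numbers are the defaults we recall from Hadoop releases of that era (0.18 through 0.20) and should be treated as assumptions: per the editor's note above they are deprecated, and any cluster may override them in its configuration. The names `DEFAULT_WEB_UI_PORTS`, `is_port_open`, and `check_daemons` are illustrative helpers, not part of Hadoop.

```python
import socket

# Assumed default web UI ports for Hadoop 0.18-0.20 era daemons.
# These are illustrative; check your own cluster's configuration files.
DEFAULT_WEB_UI_PORTS = {
    "NameNode": 50070,
    "DataNode": 50075,
    "Secondary NameNode": 50090,
    "JobTracker": 50030,  # 50030, not 50300
    "TaskTracker": 50060,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def check_daemons(host):
    """Map each daemon name to whether its assumed web UI port accepts connections."""
    return {
        name: is_port_open(host, port)
        for name, port in DEFAULT_WEB_UI_PORTS.items()
    }


if __name__ == "__main__":
    for name, reachable in check_daemons("localhost").items():
        status = "open" if reachable else "closed"
        print(f"{name:<20} {DEFAULT_WEB_UI_PORTS[name]:>6}  {status}")
```

A successful connection only shows that something is listening on the port; it does not confirm that the listener is the Hadoop daemon named in the table.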