Apache HBase Available in CDH2

Categories: Community General Hadoop HBase

One of the more common requests we receive from the community is to package Apache HBase with Cloudera’s Distribution for Apache Hadoop. Lately, I’ve been doing a lot of work on making Cloudera’s packages easy to use, and recently, the HBase team has pitched in to help us deliver compatible HBase packages. We’re pretty excited about this, and we’re looking forward to your feedback. A big thanks to Andrew Purtell, a Senior Architect at TrendMicro and HBase Contributor,

Read more

Grouping Related Trends with Hadoop and Hive

Categories: Community General Hadoop Hive

(guest blog post by Pete Skomoroch)

In a previous post, I outlined how to build a basic trend tracking site called trendingtopics.org with Cloudera’s Distribution for Hadoop and Hive.  TrendingTopics uses Hadoop to identify the top articles trending on Wikipedia and displays related news stories and charts.  The data powering the site was pulled from an Amazon EBS Wikipedia Public Dataset containing 8 months of hourly pageview logfiles. 

Read more

Apache Hadoop Log Files: Where to find them in CDH, and what info they contain

Categories: Hadoop

Apache Hadoop’s jobtracker, namenode, secondary namenode, datanode, and tasktracker all generate logs. That includes logs from each of the daemons under normal operation, as well as configuration logs, statistics, standard error, standard out, and internal diagnostic information. Many  users aren’t entirely sure what the differences are among these logs, how to analyze them, or even how to handle simple administrative tasks like log rotation.  This blog post describes each category of log, and then details where they can be found for each Hadoop component.

Read more

CDH2: Cloudera’s Distribution for Apache Hadoop 2

Categories: Community Hadoop Hive Pig

In March of this year, we released our distribution for Apache Hadoop.  Our initial focus was on stability and making Hadoop easy to install. This original distribution, now named CDH1, was based on the most stable version of Apache Hadoop at the time:0.18.3. We packaged up Apache Hadoop, Pig and Hive into RPMs and Debian packages to make managing Hadoop installations easier.  For the first time ever, Hadoop cluster managers were able to bring up a deployment by running one of the following commands depending on your Linux distribution:

As proof of this,

Read more

Hadoop World: NYC 2009: Speakers Announced

Categories: Community Hadoop

It’s been a crazy few weeks here at Cloudera, and while there is no sign of things letting up before Hadoop World: NYC 2009 on October 2nd, we wanted to take a minute to share the latest details about the speakers, and to say thanks to our sponsors who have recently come on board.

We’re absolutely thrilled to have such a wide variety of organizations sharing their experiences with Apache Hadoop.

Read more