CDH2: Testing Release now with Pig, Hive, and HBase

Categories: General Hadoop HBase HDFS Hive MapReduce Pig Testing

At the beginning of September, we announced the first release of CDH2, our current testing repository. Packages in our testing repository are recommended for people who want more features and are willing to upgrade as bugs are worked out. Our testing packages pass unit and functional tests but will not have the same “soak time” as our stable packages. A testing release represents a work in progress that will eventually be promoted to stable.

Apache HBase Available in CDH2

Categories: Community General Hadoop HBase

One of the more common requests we receive from the community is to package Apache HBase with Cloudera’s Distribution for Apache Hadoop. Lately, I’ve been doing a lot of work on making Cloudera’s packages easy to use, and recently the HBase team has pitched in to help us deliver compatible HBase packages. We’re pretty excited about this, and we’re looking forward to your feedback. A big thanks to Andrew Purtell, a Senior Architect at TrendMicro and HBase contributor,

Grouping Related Trends with Hadoop and Hive

Categories: Community General Hadoop Hive

(guest blog post by Pete Skomoroch)

In a previous post, I outlined how to build a basic trend tracking site called TrendingTopics with Cloudera’s Distribution for Hadoop and Hive. TrendingTopics uses Hadoop to identify the top articles trending on Wikipedia and displays related news stories and charts. The data powering the site was pulled from an Amazon EBS Wikipedia Public Dataset containing 8 months of hourly pageview logfiles.

Apache Hadoop Log Files: Where to find them in CDH, and what info they contain

Categories: Hadoop

Apache Hadoop’s jobtracker, namenode, secondary namenode, datanode, and tasktracker all generate logs. These include logs from each daemon under normal operation, as well as configuration logs, statistics, standard error, standard out, and internal diagnostic information. Many users aren’t entirely sure what the differences are among these logs, how to analyze them, or even how to handle simple administrative tasks like log rotation. This blog post describes each category of log, and then details where they can be found for each Hadoop component.

CDH2: Cloudera’s Distribution for Apache Hadoop 2

Categories: Community Hadoop Hive Pig

In March of this year, we released our distribution for Apache Hadoop. Our initial focus was on stability and making Hadoop easy to install. This original distribution, now named CDH1, was based on the most stable version of Apache Hadoop at the time: 0.18.3. We packaged Apache Hadoop, Pig, and Hive into RPMs and Debian packages to make managing Hadoop installations easier. For the first time, Hadoop cluster managers were able to bring up a deployment by running one of the following commands, depending on their Linux distribution:

As proof of this,