Tag Archives: CDH

More on Cloudera’s Distribution including Apache Hadoop 3

Categories: General

A week ago we announced two significant product updates: a substantial functional update to Cloudera’s Distribution including Apache Hadoop (CDH), doubling the number of components, and the launch of Cloudera Enterprise. I wanted to delve a bit deeper into the first announcement, Cloudera’s Distribution including Apache Hadoop version 3 (CDH3). This post kicks off a series that will go into progressively more detail about different aspects of CDH3.

Cloudera has been in the Hadoop business for nearly two years now, Read more

Apache Hadoop Log Files: Where to find them in CDH, and what info they contain

Categories: Hadoop

Apache Hadoop’s jobtracker, namenode, secondary namenode, datanode, and tasktracker all generate logs. These include logs from each daemon under normal operation, as well as configuration logs, statistics, standard error, standard out, and internal diagnostic information. Many users aren’t entirely sure how these logs differ, how to analyze them, or even how to handle simple administrative tasks like log rotation. This blog post describes each category of log, then details where the logs can be found for each Hadoop component.
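The daemon logs follow the standard log4j layout (date, time, severity, logging class, message), so they can be picked apart with ordinary shell tools. A minimal sketch, using an invented sample line for illustration:

```shell
# Hypothetical sample line in Hadoop's default log4j layout:
# <date> <time> <LEVEL> <logging class>: <message>
line='2009-11-01 12:00:00,123 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Starting NameNode'

# The severity is the third whitespace-delimited field
level=$(echo "$line" | awk '{print $3}')

# The logging class is the fourth field, minus the trailing colon
class=$(echo "$line" | awk '{print $4}' | tr -d ':')

echo "$level $class"
```

The same one-liners work in a pipeline over an entire log file, e.g. to count lines per severity with `awk '{print $3}' hadoop-*.log | sort | uniq -c`.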

Read more

CDH2: Cloudera’s Distribution for Apache Hadoop 2

Categories: Community, Hadoop, Hive, Pig

In March of this year, we released our distribution for Apache Hadoop. Our initial focus was on stability and on making Hadoop easy to install. That original distribution, now named CDH1, was based on the most stable version of Apache Hadoop at the time: 0.18.3. We packaged Apache Hadoop, Pig, and Hive into RPMs and Debian packages to make managing Hadoop installations easier. For the first time, Hadoop cluster managers could bring up a deployment by running one of the following commands, depending on their Linux distribution:
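The commands themselves are not reproduced in this excerpt; as a rough sketch of what a one-line, package-manager-based install looks like (the package name here is an assumption, not taken from the post), they would be along these lines:

```shell
# On an RPM-based distribution (e.g. Red Hat/CentOS) -- hypothetical package name:
sudo yum install hadoop

# On a Debian-based distribution -- hypothetical package name:
sudo apt-get install hadoop
```

The point of the RPM/Debian packaging was exactly this: one command per distribution instead of a manual download, unpack, and configure cycle.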

As proof of this,

Read more

Database Access with Apache Hadoop

Categories: General, Hadoop, MapReduce

Editor’s note (added Nov. 9, 2013): Valuable data in an organization is often stored in relational database systems. To access that data, you could use external APIs as detailed in the blog post below, or you could use Apache Sqoop, an open source tool (packaged inside CDH) that allows users to import data from a relational database into Apache Hadoop for further processing. Sqoop can also export those results back to the database for consumption by other clients.
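Sqoop's command-line interface makes that import/export round trip concrete. A minimal sketch, where the JDBC URL, database, table names, and HDFS paths are placeholders rather than anything from the post:

```shell
# Import a table from a relational database into HDFS
# (db.example.com, corp, EMPLOYEES are hypothetical)
sqoop import \
  --connect jdbc:mysql://db.example.com/corp \
  --table EMPLOYEES \
  --target-dir /user/analyst/employees

# Export processed results from HDFS back to a database table
sqoop export \
  --connect jdbc:mysql://db.example.com/corp \
  --table EMPLOYEE_SUMMARY \
  --export-dir /user/analyst/employee_summary
```

Both commands require a reachable database and a running Hadoop cluster, so they are shown here only to illustrate the shape of the workflow.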

Read more