Category Archives: How-to

How-to: Run a MapReduce Job in CDH4

Categories: CDH How-to MapReduce Use Case

This is the first post in a series that walks you through writing, compiling, and running a simple MapReduce job on Apache Hadoop. The full code, along with tests, is available at http://github.com/cloudera/mapreduce-tutorial. The program will run on either MR1 or MR2.

We’ll assume that you have a running Hadoop installation, either locally or on a cluster, and that your environment is set up correctly, so that typing “hadoop” at your command line prints some notes on usage. Detailed instructions for installing CDH,

Read more

How-to: Manage Permissions in Hue

Categories: How-to Hue MapReduce Oozie

Hue is a web interface for Apache Hadoop that makes common Hadoop tasks, such as running MapReduce jobs, browsing HDFS, and creating Apache Oozie workflows, easier. (To learn more about the integration of Oozie and Hue, see this blog post.) In this post, we’re going to focus on how one of the fundamental components in Hue, Useradmin, has matured.

New User and Permission Features

User and permission management in Hue has changed drastically over the past year.

Read more

Analyzing Twitter Data with Apache Hadoop, Part 3: Querying Semi-structured Data with Apache Hive

Categories: CDH Hadoop Hive How-to Use Case

This is the third article in a series about analyzing Twitter data using some of the components of the Apache Hadoop ecosystem that are available in CDH (Cloudera’s open-source distribution of Apache Hadoop and related projects). If you’re looking for an introduction to the application and a high-level view, check out the first article in the series.

In the previous article in this series, we saw how Flume can be used to ingest data into Hadoop.

Read more

Analyzing Twitter Data with Apache Hadoop, Part 2: Gathering Data with Flume

Categories: CDH Flume Hadoop How-to Oozie Use Case

This is the second article in a series about analyzing Twitter data using some of the components of the Hadoop ecosystem available in CDH, Cloudera’s open-source distribution of Apache Hadoop and related projects. In the first article, you learned how to pull CDH components together into a single cohesive application, but to really appreciate the flexibility of each of these components, we need to dive deeper.

Every story has a beginning,

Read more

How-to: Set Up an Apache Hadoop/Apache HBase Cluster on EC2 in (About) an Hour

Categories: CDH Cloud Cloudera Manager How-to

Note (added July 8, 2013): The information below is deprecated; we suggest that you refer to this post for current instructions.

Today we bring you one user’s experience using Apache Whirr to spin up a CDH cluster in the cloud. This post was originally published here by George London (@rogueleaderr) based on his personal experiences; he has graciously allowed us to bring it to you here as well in a condensed form.

Read more