How-to: Manage Heterogeneous Hardware for Apache Hadoop using Cloudera Manager

In a prior blog post, Omar explained two important concepts introduced in Cloudera Manager 4.5: Role Groups and Host Templates. In this post, I’ll demonstrate how to use role groups and host templates to easily expand an existing CDH cluster onto heterogeneous hardware. If you haven’t already looked at Omar’s post, I’d recommend doing so before reading this one, as I’ll assume you are familiar with role groups and host templates.

Although these instructions and screenshots are based on Cloudera Manager 4.5, they are valid for subsequent releases as well.

Initial State and Goal

This post describes enlarging a CDH4 cluster running HDFS and MapReduce from five nodes to 10. Initially, our cluster contains the hosts mikem-old-[1-5].ent.cloudera.com. Each host has a single physical drive storing HDFS data mounted at /data/1/. You can see the count of roles and services in the screenshot below:

The goal is to add five new hosts to the above cluster: mikem-new-[1-5].ent.cloudera.com. Moreover, each of these hosts should run a DataNode as well as a TaskTracker. The new hosts are identical to the old hosts except that they have two physical drives storing HDFS data mounted at /data/1/ and /data/2/. As such, the dfs.datanode.data.dir and mapred.local.dir parameters need to be set differently for DataNodes and TaskTrackers running on the new hosts. This difference will motivate the use of multiple Role Groups to enlarge the cluster.
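The bracket shorthand used for hostnames in this post can be spelled out with a few lines of Python. This is only an illustrative sketch: `expand_host_pattern` is a hypothetical helper written for this example, not part of Cloudera Manager, and it assumes a pattern with at most one numeric range.

```python
import re

def expand_host_pattern(pattern):
    """Expand one numeric range such as 'host-[1-5].example.com' into hostnames."""
    match = re.search(r"\[(\d+)-(\d+)\]", pattern)
    if match is None:
        return [pattern]  # no range present: the pattern is already a single hostname
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]

old_hosts = expand_host_pattern("mikem-old-[1-5].ent.cloudera.com")
new_hosts = expand_host_pattern("mikem-new-[1-5].ent.cloudera.com")
print(len(old_hosts) + len(new_hosts))  # 10
print(new_hosts[0])                     # mikem-new-1.ent.cloudera.com
```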

Step 1: Create an Empty Role Group for New DataNodes

The DataNodes running on the old hosts have a single data directory at /data/1/dfs/dn. The DataNodes on the new hosts should have two data directories at /data/1/dfs/dn and /data/2/dfs/dn. To capture this difference, you can create a new DataNode role group for the DataNodes on the new hosts.

From the HDFS service page, navigate to the Role Groups membership page for HDFS by selecting Role Groups from the Configuration drop-down menu.

On the Role Groups membership page, click the Create new group… button to open up the Role Groups creation dialog. Call the new role group DataNode (2 Drives). Also, be sure to copy existing configuration values from the DataNode (Base) group. Before clicking Create, the dialog box will look something like this:

After creating the group, select View and Edit from the Configuration drop-down menu, and select DataNode (2 Drives) from the left navigation bar.

Set the DataNode Data Directory (dfs.datanode.data.dir) value to include both /data/1/dfs/dn and /data/2/dfs/dn.
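In the underlying Hadoop configuration, dfs.datanode.data.dir is a comma-separated list of directories. The sketch below builds that list for the one-drive and two-drive layouts described here. The helper name `data_dir_list` is made up for this example, and it assumes drives are mounted at /data/1/, /data/2/, and so on; how Cloudera Manager presents the list in its own forms may differ.

```python
def data_dir_list(drive_count, subdir):
    """Return a comma-separated directory list, one entry per mounted drive."""
    return ",".join(f"/data/{n}/{subdir}" for n in range(1, drive_count + 1))

# Value for DataNodes on the old, single-drive hosts.
print(data_dir_list(1, "dfs/dn"))  # /data/1/dfs/dn
# Value for the DataNode (2 Drives) role group.
print(data_dir_list(2, "dfs/dn"))  # /data/1/dfs/dn,/data/2/dfs/dn
```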

Step 2: Set Up an Empty Role Group for New TaskTrackers

The TaskTrackers on the old hosts have a single MapReduce local directory at /data/1/mapred/local. The TaskTrackers on the new hosts should have two MapReduce local directories at /data/1/mapred/local and /data/2/mapred/local.

As was done for HDFS above, navigate to the MapReduce service page, select the Role Groups option from the Configuration drop-down menu, and create a new TaskTracker role group called TaskTracker (2 Drives). Make sure to copy the configuration values from TaskTracker (Base).

Next, select the View and Edit option from the Configuration drop-down menu, then select TaskTracker (2 Drives) from the left navigation bar. Change the TaskTracker Local Data Directory List (mapred.local.dir) value to include both /data/1/mapred/local and /data/2/mapred/local.
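The "copy from Base, then change one value" pattern used in Steps 1 and 2 can be pictured with a small toy model. Everything here is an assumption made for illustration: the `RoleGroup` class and its `copy_as` method are hypothetical and do not reflect Cloudera Manager's internals, and the map-slot setting and its value are placeholders standing in for whatever else the Base group happens to contain.

```python
from dataclasses import dataclass, field

@dataclass
class RoleGroup:
    name: str
    role_type: str
    config: dict = field(default_factory=dict)

    def copy_as(self, new_name, overrides):
        """Create a new group that starts from this group's values, then applies overrides."""
        new_config = dict(self.config)  # copy, so the source group is left untouched
        new_config.update(overrides)
        return RoleGroup(new_name, self.role_type, new_config)

base = RoleGroup(
    "TaskTracker (Base)",
    "TASKTRACKER",
    {
        "mapred.local.dir": "/data/1/mapred/local",
        "mapred.tasktracker.map.tasks.maximum": "4",  # placeholder value for illustration
    },
)

two_drives = base.copy_as(
    "TaskTracker (2 Drives)",
    {"mapred.local.dir": "/data/1/mapred/local,/data/2/mapred/local"},
)

print(two_drives.config["mapred.local.dir"])
print(base.config["mapred.local.dir"])
```

Copying first and overriding second means the new group keeps every other setting from the Base group, so the two groups differ only in the directory list.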

Step 3: Create a Host Template Containing the New Role Groups

There are now two new role groups that reflect the configuration changes needed to support DataNode and TaskTracker roles on our new hosts. The next step is to use these role groups to create a host template to apply to the new hosts, mikem-new-[1-5].ent.cloudera.com.

Navigate to the Host Template Management page by clicking the top level Hosts tab, then the lower level Templates tab. Open the dialog for creating a new Host Template.

For this example, call the new host template Slave (2 Drives). Select the new role groups, DataNode (2 Drives) and TaskTracker (2 Drives). After clicking the Create button, the Host Template screen should show the new host template.

Step 4: Run Add Hosts Wizard and Apply the Host Template

Now we are ready to actually add the new hosts to the cluster. From the Host Templates page, click the Status tab, and then the Add Hosts To Cluster button. Proceed through the Add Hosts Wizard as normal, adding hosts mikem-new-[1-5].ent.cloudera.com. After deploying packages or Parcels to the new hosts, we end up at a page that looks like this:

This page allows you to apply a previously existing host template to the new hosts (as well as create one if necessary). Choose the Slave (2 Drives) template that we just created and click Continue.

And that’s that! When the Add Hosts Wizard completes, the expansion is done. Applying a host template automatically adds the necessary roles to the new hosts, then places those roles into the newly created role groups so that they are configured correctly. A look at the All Services page now shows that our cluster contains 10 DataNodes and 10 TaskTrackers.
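To make the effect of applying a host template concrete, here is a toy model of the cluster as a mapping from host to roles, where each role is assigned to a role group. It is a sketch under stated assumptions, not how Cloudera Manager implements templates: `apply_host_template` is a hypothetical function, and only the DataNode and TaskTracker roles are modeled (master roles are left out for brevity).

```python
from collections import Counter

# A host template here is just: role type -> role group the role should belong to.
BASE_GROUPS = {"DATANODE": "DataNode (Base)", "TASKTRACKER": "TaskTracker (Base)"}
SLAVE_2_DRIVES = {"DATANODE": "DataNode (2 Drives)", "TASKTRACKER": "TaskTracker (2 Drives)"}

def apply_host_template(host_roles, template):
    """Add the template's roles to a host, or move roles it already has to the template's groups."""
    updated = dict(host_roles)  # roles not named in the template are kept as they are
    updated.update(template)
    return updated

# The five original hosts, each with a DataNode and a TaskTracker in the Base groups.
cluster = {f"mikem-old-{n}.ent.cloudera.com": dict(BASE_GROUPS) for n in range(1, 6)}

# The five new hosts start with no roles and receive them from the template.
for n in range(1, 6):
    cluster[f"mikem-new-{n}.ent.cloudera.com"] = apply_host_template({}, SLAVE_2_DRIVES)

role_counts = Counter(role for roles in cluster.values() for role in roles)
print(role_counts["DATANODE"], role_counts["TASKTRACKER"])  # 10 10
```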

Tips

  • If you forget to create a host template before running the Add Hosts Wizard, don’t worry! As long as you have created the requisite role groups, the wizard provides an option to create a new host template.
  • If you want to apply a host template to an existing host, you can do so by navigating to the Hosts page, selecting the hosts you want to apply the template to, and executing “Apply Host Template” from the “Actions For Selected” menu. Applying a host template to an existing host never deletes existing roles. Instead, it adds new roles and moves existing roles to new groups if necessary.

Mike Mellenthin is a Software Engineer on the Enterprise team.

> Have questions? Post them to the Community Forum for Cloudera Manager.
