How-to: Use Cloudera Enterprise with StackIQ
StackIQ takes a “software defined infrastructure” approach to provision and manage cluster infrastructure that sits below Big Data platforms such as Apache Hadoop. In the guest post below, StackIQ co-founder and VP Engineering Greg Bruno explains how to install Cloudera Enterprise on top of StackIQ’s management system so they can work together.
The hardware used for this deployment is a small cluster: one node (i.e. one server) for the StackIQ Cluster Manager and four nodes as backend/data nodes. Each node has two disks, and all nodes are connected via 1Gb Ethernet on a private network. The Cluster Manager node is also connected to a public network using its second NIC. (StackIQ Cluster Manager is used in similar deployments ranging from two nodes to 4,000+ nodes.)
Step 1. Install StackIQ Cluster Manager
The StackIQ Cluster Manager node is installed from bare metal (i.e. there is no prerequisite software and no operating system previously installed) by burning the StackIQ Cluster Core Roll ISO to DVD and booting from it. (You can download the StackIQ Cluster Core Roll from the “Rolls” section after registering.) The Core Roll leads the user through a few simple forms (e.g., what is the IP address of the Cluster Manager, what is the gateway, DNS server) and then asks for a base OS DVD (for example, Red Hat Enterprise Linux 6.4; other Red Hat-like distributions such as CentOS are supported as well). The installer copies all the bits from both DVDs and automatically creates a new Red Hat distribution by blending the packages from both DVDs.
The remainder of the Cluster Manager installation requires no further manual action; the entire step takes 30 to 40 minutes.
Step 2. Install the CDH Bridge Roll
StackIQ has developed software that “bridges” its core infrastructure management solution to Cloudera Enterprise — which comprises CDH and Cloudera Manager — that we’ve named the CDH Bridge Roll. One feature of our management solution is that it records several parameters about each backend node (number of CPUs, networking configuration, disk partitions, etc.) in a local database. After the Cluster Manager is installed and booted, it is time to download and install the CDH Bridge Roll:
- Log into the frontend as "root" and download the cdh-bridge ISO from StackIQ Downloads:
# wget http://stackiq-release.s3.amazonaws.com/stack2/cdh-bridge-4-1.x86_64.disk1.iso
- Then execute the following commands:
# rocks add roll <path_to_iso>
# rocks enable roll cdh-bridge
# rocks create distro
# rocks run roll cdh-bridge | sh
The cluster is now configured to install Cloudera packages on all nodes.
Step 3. Install Cloudera Manager and Cloudera CDH4 Roll
# rocks add roll cloudera-cdh4/cloudera-cdh4-6.5-0.x86_64.disk1.iso
# rocks add roll cloudera-manager/cloudera-manager-6.5-0.x86_64.disk1.iso
# rocks enable roll cloudera-cdh4
# rocks enable roll cloudera-manager
# rocks create distro
# rocks run roll cloudera-cdh4 | sh
# rocks run roll cloudera-manager | sh
Step 4. Install the Backend Nodes
Before we install the backend nodes (also known as compute nodes), we want to ensure that all disks in the backend nodes are optimally configured for HDFS. During an installation of a data node, our software interacts with the disk controller to optimally configure it based on the node’s intended role. For data nodes, the disk controller will be configured in “JBOD mode” with each disk configured as a RAID 0, a single partition will be placed on each data disk and a single file system will be created on that partition. For example, if a data node has one boot disk and 4 data disks, after the node installs and boots, you’ll see the following 4 file systems on the data disks: /hadoop01, /hadoop02, /hadoop03 and /hadoop04.
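The disk-to-file-system mapping described above can be sketched in a few lines. This is a minimal illustration of the naming convention only (the actual partitioning is done by StackIQ during installation); the disk count is the four-data-disk example from this section:

```shell
# For a node with 4 data disks, the convention in this example yields
# one file system per data disk, named /hadoop01 through /hadoop04.
data_disks=4
for i in $(seq 1 "$data_disks"); do
  printf "/hadoop%02d\n" "$i"
done
```

Running this prints the four expected mount points, which is a handy checklist when verifying a freshly installed data node with `df`.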
For more information on this feature, see our blog post Why Automation is the Secret Ingredient for Big Data Clusters.
Now we don’t want to reconfigure the controller and reformat disks on every installation, so we need to instruct the StackIQ Cluster Manager to perform this task the next time the backend nodes install. We do this by setting an attribute (“nukedisks”) with the rocks command line:
# rocks set appliance attr compute nukedisks true
# rocks set appliance attr cdh-manager nukedisks true
Now we are ready to install the backend nodes. First, put the StackIQ Cluster Manager into "discovery" mode using the CLI or GUI, then PXE boot all the backend nodes. Boot the first node as a cdh-manager appliance; it will run the Cloudera Manager web admin console used to configure, monitor, and manage CDH.
We will install all the other nodes in the cluster as compute nodes. StackIQ Cluster Manager discovers and installs each backend node in parallel (10 to 20 minutes) – no manual steps are required. (For more information on installing and using the StackIQ Cluster Manager (a.k.a., Rocks+), please visit StackIQ Support or watch the demo video.)
After all the nodes in the cluster are up and running you will be ready to install Cloudera Manager. In this example, the StackIQ Cluster Manager node was named “frontend” and the compute nodes were assigned default names of compute-0-0, compute-0-1, compute-0-2 (3 nodes in Rack 0), and compute-1-0 (1 node in Rack 1).
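The default names encode each node's position as compute-&lt;rack&gt;-&lt;rank&gt;. As a small sketch (using a name from this example, in portable shell), the rack and rank can be split back out of a node name:

```shell
# Default backend names follow compute-<rack>-<rank>;
# split a name from this example back into its parts.
name="compute-1-0"
rack=$(echo "$name" | cut -d- -f2)
rank=$(echo "$name" | cut -d- -f3)
echo "rack=$rack rank=$rank"   # prints: rack=1 rank=0
```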
Step 5. Install Cloudera Manager
To install Cloudera Manager on the frontend, as root, execute:
# /opt/rocks/sbin/cloudera-manager-installer.bin --skip_repo_package=1
This will install Cloudera Manager with packages from our local yum repository as opposed to fetching packages over the Internet.
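Before logging in (next step), it can help to confirm the admin console is actually listening on its default port, 7180. A minimal, hedged check — it assumes you run it on the frontend itself and only reports reachability:

```shell
# Check whether the Cloudera Manager admin console (default port 7180)
# is accepting connections yet. Prints a status line and always exits 0.
host="localhost"   # assumption: run on the frontend node itself
if curl -s -o /dev/null --max-time 5 "http://${host}:7180"; then
  echo "Cloudera Manager is up on port 7180"
else
  echo "Cloudera Manager not reachable yet; give the installer a minute"
fi
```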
Step 6. Select What to Install
Log in to http://&lt;FQDN&gt;:7180 (where &lt;FQDN&gt; is the fully qualified domain name of your StackIQ Cluster Manager) with username ‘admin’ and password ‘admin’. Cloudera Manager will take you through a full CDH install. When you are done, you will have a working Hadoop instance on your cluster capable of running Hadoop-based analysis. You can see the full step-by-step install guide here.
Step 7. Run a Hadoop Sample Program
It is never enough to set up a cluster and the applications users need and then simply let them at it; that path generally leads to nasty surprises. A validation check is required to make sure everything is working as expected.
Do this to confirm the cluster is functional:
- Log into the frontend as "root" via SSH or PuTTY.
- On the command line, run the following MapReduce program as the “hdfs” user, which runs a simulation to estimate the value of pi based on sampling:
# sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 10 10000
If the job runs to completion, the final line of its output reports the estimated value of Pi.
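A quick way to pull the estimate out of the job's final output line — the sample line below is illustrative (its format is taken from the hadoop-examples pi driver, which prints "Estimated value of Pi is &lt;n&gt;"), not output captured from this cluster:

```shell
# Illustrative final line from the pi job; grab the last field,
# which holds the numeric estimate.
line="Estimated value of Pi is 3.14158440000000000000"
pi=$(echo "$line" | awk '{print $NF}')   # last field = the estimate
echo "$pi"
```

In practice you would pipe the real job output through `tail -1` and the same `awk` expression.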
A complete version of the steps involved to deploy real machines with StackIQ and Cloudera (with pictures!) is available from the StackIQ blog. We’re certain you’ll find this one of the quickest ways to deploy a cluster capable of running Cloudera’s platform. Give it a shot and send us your questions!