How-to: Use Cloudera Standard with StackIQ
StackIQ takes a “software defined infrastructure” approach to provision and manage cluster infrastructure that sits below Big Data platforms such as Apache Hadoop. In the guest post below, StackIQ co-founder and VP Engineering Greg Bruno explains how to install Cloudera Standard on top of StackIQ’s management system so they can work together.
The hardware used for this deployment is a small cluster: one node (i.e. one server) for the StackIQ Cluster Manager and four nodes as backend/data nodes. Each node has two disks, and all nodes are connected via 1Gb Ethernet on a private network. The Cluster Manager node is also connected to a public network using its second NIC. (StackIQ Cluster Manager is used in similar deployments ranging from two nodes to 4,000+ nodes.)
Step 1. Install StackIQ Cluster Manager
The StackIQ Cluster Manager node is installed from bare metal (i.e. there is no prerequisite software and no operating system previously installed) by burning the StackIQ Cluster Core Roll ISO to DVD and booting from it. (You can download the StackIQ Cluster Core Roll from the “Rolls” section after registering.) The Core Roll leads the user through a few simple forms (e.g., the IP address of the Cluster Manager, the gateway, the DNS server) and then asks for a base OS DVD (for example, Red Hat Enterprise Linux 6.4; other Red Hat-like distributions such as CentOS are supported as well). The installer copies all the bits from both DVDs and automatically creates a new Red Hat distribution by blending the packages from both DVDs.
The remainder of the Cluster Manager installation requires no further manual action, and this entire step takes between 30 and 40 minutes.
Step 2. Install the CDH Bridge Roll
StackIQ has developed software that “bridges” its core infrastructure management solution to Cloudera Standard — which comprises CDH (Cloudera’s distribution of Hadoop and related projects) and Cloudera Manager — that we’ve named the CDH Bridge Roll. One feature of our management solution is that it records several parameters about each backend node (number of CPUs, networking configuration, disk partitions, etc.) in a local database. The CDH Bridge Roll extracts the relevant parameters from the StackIQ infrastructure database and passes them to Cloudera Manager via the Cloudera Manager API.
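To illustrate what such a bridge does, here is a minimal sketch (not the actual CDH Bridge Roll code) that translates StackIQ-style host records into the JSON body Cloudera Manager’s REST API expects when registering hosts; the field names follow Cloudera Manager’s ApiHost schema, while the node records and IP addresses are illustrative assumptions:

```python
import json

def hosts_payload(nodes):
    """Build an ApiHostList JSON body for Cloudera Manager's
    POST /api/vN/hosts endpoint from host records.
    The record layout here is illustrative, not StackIQ's actual schema."""
    return {
        "items": [
            {
                "hostId": n["name"],
                "hostname": n["fqdn"],
                "ipAddress": n["ip"],
                "rackId": "/rack%d" % n["rack"],
            }
            for n in nodes
        ]
    }

# Hypothetical records for two of the backend nodes in this deployment.
nodes = [
    {"name": "compute-0-0", "fqdn": "compute-0-0.local", "ip": "10.1.255.254", "rack": 0},
    {"name": "compute-1-0", "fqdn": "compute-1-0.local", "ip": "10.1.255.250", "rack": 1},
]
print(json.dumps(hosts_payload(nodes), indent=2))
```

Because rack placement is already recorded in the StackIQ database, passing `rackId` along means Cloudera Manager gets a rack-aware topology without manual data entry.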
After the Cluster Manager is installed and booted, it’s time to download and install the CDH Bridge Roll:
- On the frontend, download the cdh-bridge ISO from StackIQ Downloads.
- Then execute the following commands:
# rocks add roll
# rocks enable roll cdh-bridge
# rocks create distro
# rocks run roll cdh-bridge | sh
The frontend is now configured to interface with Cloudera Manager.
Step 3. Install the Backend Nodes
Before installing the backend nodes (also known as “compute nodes”), you need to ensure that all disks in the backend nodes are optimally configured for HDFS. During an installation of a data node, the software interacts with the disk controller to optimally configure it based on the node’s intended role. For data nodes, the disk controller is configured in “JBOD mode” with each disk set up as a single-drive RAID 0; a single partition is placed on each data disk, and a single filesystem is created on that partition. For example, if a data node has one boot disk and four data disks, after the node installs and boots you’ll see the following four filesystems on the data disks: /hadoop01, /hadoop02, /hadoop03 and /hadoop04. (For more information on this feature, see our blog post “Why Automation is the Secret Ingredient for Big Data”.)
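The one-partition, one-filesystem-per-disk layout described above can be sketched as a simple mapping (illustrative only — the device names are an assumption, and this is not StackIQ’s actual provisioning code):

```python
def hdfs_mount_points(data_disks):
    """Map each data disk to its numbered /hadoopNN filesystem, mirroring
    the one-partition-per-disk, one-filesystem-per-partition layout."""
    return {disk: "/hadoop%02d" % i for i, disk in enumerate(data_disks, start=1)}

# A node with four data disks (device names sdb..sde are hypothetical):
print(hdfs_mount_points(["sdb", "sdc", "sdd", "sde"]))
# {'sdb': '/hadoop01', 'sdc': '/hadoop02', 'sdd': '/hadoop03', 'sde': '/hadoop04'}
```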
You don’t want to reconfigure the controller and reformat disks on every installation, so you need to instruct the StackIQ Cluster Manager to perform this task the next time the backend nodes install. You do this by setting an attribute (“nukedisks”):
# rocks set appliance attr compute nukedisks true
Now you are ready to install the backend nodes. First, put the StackIQ Cluster Manager into “discovery” mode using the CLI or GUI, then PXE boot all the backend nodes. The Cluster Manager discovers and installs each backend node in parallel (10 to 20 minutes) — no manual steps are required. For more information on installing and using the StackIQ Cluster Manager (a.k.a. Rocks+), please visit StackIQ Support or our YouTube channel. (The StackIQ Cluster Manager demo video is a good place to start.)
After all the nodes in the cluster are up and running, you will be ready to install Cloudera Manager. In this example, the StackIQ Cluster Manager node was named “frontend” and the compute nodes were assigned default names of compute-0-0, compute-0-1, compute-0-2 (3 nodes in Rack 0), and compute-1-0 (1 node in Rack 1).
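The default names encode the node’s appliance type, rack, and rank within the rack, which can be split apart mechanically — a small sketch of that convention:

```python
def parse_node_name(name):
    """Split a StackIQ default node name like 'compute-0-2' into
    (appliance, rack, rank). Assumes the appliance name itself
    contains no trailing numeric fields."""
    appliance, rack, rank = name.rsplit("-", 2)
    return appliance, int(rack), int(rank)

print(parse_node_name("compute-0-2"))  # ('compute', 0, 2)
print(parse_node_name("compute-1-0"))  # ('compute', 1, 0)
```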
Step 4. Install Cloudera Manager
To install Cloudera Manager on the frontend, run the Cloudera Manager installer as root.
Step 5. Create Local Yum repository for Cloudera Standard
The next two steps will take a bit of time. Start the first and grab a cup of coffee.
Create a CDH4 repository
# cd /var/www/html
# reposync -r cloudera-cdh4
# cd cloudera-cdh4
# createrepo .
Now create the Cloudera Manager repo, drink the coffee, and check email.
Create a Cloudera Manager 4 repository
# cd /var/www/html
# reposync -r cloudera-manager
# cd cloudera-manager
# createrepo .
Step 6. Select What to Install
Log into the frontend at http://&lt;frontend&gt;:7180 (where ‘&lt;frontend&gt;’ is the FQDN of your StackIQ Cluster Manager) with username ‘admin’ and password ‘admin’. Cloudera Manager will take you through a full CDH install. When you are done, you will have a working Hadoop instance on your cluster capable of running Hadoop-based analysis. You can see the full step-by-step install guide here.
Step 7. Run a Hadoop Sample Program
It is never enough to set up a cluster and the applications users need and then simply let them have at it; that generally leads to nasty surprises all around. A validation check is required to make sure everything is working as expected.
Do this to confirm the cluster is functional:
- Log into the frontend as root via SSH (e.g., with PuTTY).
- On the command line, run the following MapReduce program as the “hdfs” user, which runs a simulation to estimate the value of pi based on sampling:
# sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 10 10000
The job should run to completion and print an estimated value of Pi close to 3.14159.
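The MapReduce example estimates π by random sampling: it scatters points in the unit square and counts the fraction that land inside the inscribed quarter circle, which approaches π/4. The same idea in plain Python (a standalone sketch, not the Hadoop job itself):

```python
import random

def estimate_pi(samples, seed=0):
    """Monte Carlo estimate of pi: the fraction of uniform random points
    in the unit square that fall inside the quarter circle is ~pi/4."""
    rng = random.Random(seed)
    inside = sum(
        1
        for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / samples

print(estimate_pi(100_000))  # ~3.14 (varies with seed and sample count)
```

The Hadoop job parallelizes this across mappers (here, 10 maps of 10,000 samples each), but the arithmetic is the same.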
A complete version of the steps involved in deploying real machines with StackIQ and Cloudera (with pictures!) is available from the StackIQ blog. We’re certain you’ll find this one of the quickest ways to deploy a cluster capable of running Cloudera’s platform. Give it a shot and send us your questions!