You may have seen the recent announcement from Skytap about the availability of pre-configured CDH4 templates in the Skytap Cloud public template library. So for anyone who wants to try out a Cloudera Hadoop cluster—from small to large—it can now be easily accomplished in Skytap Cloud. The how-to below from Skytap’s Matt Sousely explains how.
The goal of this how-to is to spin up a 10-node Cloudera Hadoop cluster in Skytap Cloud. To begin, let’s talk about the two new Cloudera Hadoop cluster templates. The first is Cloudera CDH4 Hadoop cluster, a 2-node Hadoop cluster template that includes 2 nodes and a management node/server. The second is the Cloudera CDH4 Hadoop host template. This second template is not intended to run by itself in a configuration. Rather, it contains a host VM that is ready to become another Hadoop node in a configuration based on the Cloudera CDH4 Hadoop cluster template.
To start, let’s spin up a Cloudera Hadoop cluster.
- Log in to Skytap Cloud
- Choose the Templates tab
- In the search box, type hadoop
- Select Cloudera CDH4 Hadoop cluster
- Click New Configuration
- Click Run
Once all the VMs start (in about 90 seconds), you’ll have a working Cloudera Hadoop cluster with all the normal services (hdfs, hbase, hue, mapreduce, oozie, and zookeeper). This cluster is a 2-node cluster (host 1 and host 2) with a management server (manager). While a 2-node cluster is enough to get going with Cloudera Hadoop, it’s possible in Skytap Cloud to ratchet this up to a cluster of any size. For this blog post, we’ll expand this into a 10-node cluster.
The Cloudera Manager is hosted on the manager VM on port 7180. However, none of the VMs in this configuration have a web browser, so we need a way to interact with Cloudera Manager. This can be accomplished in a few different ways: 1) Use a Skytap published service, 2) Use ICNR (inter-configuration network routing) with a configuration that has a graphical web browser, 3) Use a public IP, or 4) Use Skytap VPN to connect your local network to this configuration. For production use, VPN is probably your best bet, but for this blog post I’m going to use a published service. To add the published service, do the following:
- Click Settings.
- Click VM Settings.
- Select manager in the Select a VM menu.
- Under Network Adapters choose Add Published Service.
- In the dropdown, select By Port:
- Enter 7180 in the text box.
- Click Add Published Service.
- Expand the Show Published Services link and note the url and port number. Example – services.cloud.skytap.com:25693
Now you can put that URL into your local web browser and get the Cloudera Manager (Free Edition) login page. Log in with the username ‘admin’ and the password listed for the ‘admin’ account on the credentials tab of the manager VM settings.
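If the login page does not load, it can help to confirm that the published service endpoint accepts connections at all. The following is a minimal sketch, not part of Skytap's or Cloudera's tooling: the helper names are hypothetical, and it assumes the endpoint is in the `host:port` form shown in the example above and that Cloudera Manager is serving plain HTTP on port 7180.

```python
import socket


def split_endpoint(endpoint):
    """Split a 'host:port' string into a (host, port) tuple."""
    host, _, port = endpoint.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {endpoint!r}")
    return host, int(port)


def manager_url(endpoint):
    """Build the browser URL for a published service endpoint.

    Assumption: the service behind the endpoint speaks plain HTTP.
    """
    host, port = split_endpoint(endpoint)
    return f"http://{host}:{port}/"


def is_reachable(endpoint, timeout=5.0):
    """Return True if a TCP connection to the endpoint succeeds."""
    host, port = split_endpoint(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Calling `is_reachable("services.cloud.skytap.com:25693")` with your own endpoint only tells you whether the port is open; it does not verify that Cloudera Manager itself has finished starting.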
Now that everything is running, the Cloudera Manager is accessible, and I’m logged in, it’s time to expand our cluster from 2 nodes to 10 nodes. To do that:
- Click Back to configuration to get back to the 2-node configuration.
- Click Add VMs.
- In the search box, type hadoop.
- Select Cloudera CDH4 Hadoop host.
- Click Add.
- Redo steps 2-5 another 7 times (to take our host count up to 10).
- Notice that although the titles for all of these new nodes are shown as ‘host-n’ their network names have been automatically incremented.
- Optionally, to make the configuration easier to view, I can rename each new VM from ‘host-n’ to match its network name (host-3, host-4, and so on).
After about 90 seconds, everything will start up and we’ll have all the hosts we need for our 10-node cluster. It’s now time to go back to Cloudera Manager to finish setting up the nodes.
- Go back into Cloudera Manager. (Note: You may need to log in again.)
- Click Hosts at the top of the web page.
- Click Add Hosts.
- Click Continue.
- In the search form, type host-[3-10].hadoop.local
- Cloudera Manager will look up the matching names in DNS and ping each one to find all the new host nodes.
- Wait for all of the nodes to finish installing. (Note: It could take 10-15 minutes for everything to install.)
- If for any reason the web page times out, or something just doesn’t seem right, you can redo steps 2-9 to validate that all the software was installed properly.
- All hosts should resolve as green. (Note: It is OK if you have one yellow relating to mismatched versions.)
- If not, run steps 2-11 again.
- Cloudera Manager should then forward you to the hosts page, where all your hosts (1 through 10 and manager) should show up in good health.
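To see which names a search pattern such as host-[3-10].hadoop.local stands for, and to check that they resolve before running the wizard, something like the sketch below could be run on a VM inside the configuration. It is an illustration only, not Cloudera Manager's own pattern parser: the function names are hypothetical, and it handles just a single numeric range in square brackets.

```python
import re
import socket

# Matches one numeric range such as "[3-10]".
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_host_pattern(pattern):
    """Expand a single [start-end] range into a list of hostnames."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError(f"empty range in {pattern!r}")
    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


def unresolved_hosts(hostnames, resolve=socket.gethostbyname):
    """Return the hostnames that the resolver cannot look up."""
    missing = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:
            # socket.gaierror (DNS failure) is a subclass of OSError.
            missing.append(name)
    return missing
```

For example, `expand_host_pattern("host-[3-10].hadoop.local")` yields the eight names from host-3.hadoop.local through host-10.hadoop.local, and passing that list to `unresolved_hosts` returns any that DNS does not yet know about.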
At this point, we have a 10-node Cloudera Hadoop cluster, but we want to put these new nodes to work just like nodes 1 and 2. So, to accomplish that:
- Click Cloudera Manager (Free Edition) at the top left of the web UI. This will bring you back to the services page.
- Click the upside-down triangle next to each service, then click Instances.
- Click Add.
- In the Add Role Instances view, check the same boxes for hosts 3-10 that are checked for hosts 1 and 2.
- In the case of HBase, this would be the ‘region server’ column; for HDFS, it would be the DataNode column.
- Wait for the commands to complete.
- Note: Some services may not utilize nodes 1 and 2, in which case you can safely leave out nodes 3-10 as well. For example, the Hue service is only hosted on the manager VM and there are no settings for nodes 1 and 2. If you would like to make manager fault tolerant, you will want to follow all the steps in this blog post to create a second manager node that is identical to the existing manager node.
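The rule used in the steps above (give hosts 3-10 every role that hosts 1 and 2 already have, and skip services that do not use them) can be written down as a small planning function. This is a sketch for reasoning about the checkboxes, not a call into Cloudera Manager. The function name is hypothetical, and the role assignments in the example are illustrative assumptions rather than values read from the template.

```python
def plan_role_additions(assignments, reference_hosts, new_hosts):
    """Work out which roles the new hosts should receive.

    assignments maps service -> role -> list of hosts that run the role.
    A role is copied to the new hosts only if every reference host has it.
    """
    reference = set(reference_hosts)
    plan = {}
    for service, roles in assignments.items():
        for role, hosts in roles.items():
            if reference <= set(hosts):
                missing = [h for h in new_hosts if h not in hosts]
                if missing:
                    plan.setdefault(service, {})[role] = missing
    return plan


# Illustrative layout only; check the Instances page for the real one.
example_assignments = {
    "hdfs": {"NAMENODE": ["manager"], "DATANODE": ["host-1", "host-2"]},
    "hbase": {"MASTER": ["manager"], "REGIONSERVER": ["host-1", "host-2"]},
    "hue": {"HUE_SERVER": ["manager"]},
}

new_hosts = [f"host-{n}" for n in range(3, 11)]
plan = plan_role_additions(example_assignments, ["host-1", "host-2"], new_hosts)
```

With this example layout, the plan lists hosts 3-10 under the HDFS DataNode and HBase RegionServer roles and leaves Hue out entirely, which mirrors the note about services that only run on the manager VM.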
And there you have it—a 10-node Cloudera Hadoop cluster.
Matt is the Manager/Developer of Public Templates at Skytap. He started building content for NetIQ on its Operations Manager product (later bought by Microsoft and renamed MOM) as well as its AppManager product. He has also worn many hats while working for iConclude/Opsware/HP and FullArmor.