How-to: Install a Virtual Apache Hadoop Cluster with Vagrant and Cloudera Manager

It’s been a while since we provided a how-to for this purpose. Thanks, Daan Debie (@DaanDebie), for allowing us to re-publish the instructions below (for CDH 5)!

I recently started as a Big Data Engineer at The New Motion. While researching our best options for running an Apache Hadoop cluster, I wanted to try out some of the features available in the newest version of Cloudera’s Hadoop distribution: CDH 5. Of course I could’ve downloaded the QuickStart VM, but I rather wanted to run a virtual cluster, making use of the 16GB of RAM my shiny new 15″ Retina Macbook Pro has ;)

Vagrant

There are some tutorials, and repositories available for installing a local virtualized cluster, but none of them did what I wanted to do: install the bare cluster using Vagrant, and install the Hadoop stack using the Cloudera Manager. So I created a simple Vagrant setup myself. You can find it here.

Setting up the Virtual Machines

As per the instructions from the Gitub repo:

Depending on the hardware of your computer, installation will probably take between 15 and 25 minutes.

First install VirtualBox and Vagrant.

Install the Vagrant Hostmanager plugin.

$ vagrant plugin install vagrant-hostmanager

 

Clone this repository.

$ git clone https://github.com/DandyDev/virtual-hadoop-cluster.git

 

Provision the bare cluster. It will ask you to enter your password, so it can modify your /etc/hosts file for easy access in your browser. It uses the

$ cd virtual-hadoop-cluster
$ vagrant up

 

Now we can install the Hadoop stack.

Installing Hadoop and Related Components

  1. Surf to: http://vm-cluster-node1:7180.
  2. Login with admin/admin.
  3. Select Cloudera Express and click Continue twice.
  4. On the page where you have to specifiy hosts, enter the following: vm-cluster-node[1-4] and click Search. Four nodes should pop up and be selected. Click Continue.
  5. On the next page (“Cluster Installation > Select Repository”), leave everything as is and click Continue.
  6. On the next page (“Cluster Installation > Configure Java Encryption”) I’d advise to tick the box, but only if your country allows it. Click Continue.
  7. On this page do the following:
    • Login To All Hosts As: Another user -> enter vagrant
    • In the two password fields enter: vagrant
    • Click Continue.
  8. Wait for Cloudera Manager to install the prerequisites… and click Continue.
  9. Wait for Cloudera Manager to download and distribute the CDH packages… and click Continue.
  10. Wait while the installer is inspecting the hosts, and Run Again if you encounter any (serious) errors (I got some that went away the second time). After this, click Finish.
  11. For now, we’ll install everything but HBase. You can add HBase later, but it’s quite taxing for the virtual cluster. So on the “Cluster Setup” page, choose “Custom Services” and select the following: HDFS, Hive, Hue, Impala, Oozie, Solr, Spark, Sqoop2, YARN and ZooKeeper. Click Continue.
  12. On the next page, you can select what services end up on what nodes. Usually Cloudera Manager chooses the best configuration here, but you can change it if you want. For now, click Continue.
  13. On the “Database Setup” page, leave it on “Use Embedded Database.” Click Test Connection (it says it will skip this step) and click Continue.
  14. Click Continue on the “Review Changes” step. Cloudera Manager will now try to configure and start all services.

And you’re Done!. Have fun experimenting with Hadoop!

5 Responses

Leave a comment


× six = 18