How-to: Install a Virtual Apache Hadoop Cluster with Vagrant and Cloudera Manager
It’s been a while since we provided a how-to for this purpose. Thanks, Daan Debie (@DaanDebie), for allowing us to re-publish the instructions below (for CDH 5)!
I recently started as a Big Data Engineer at The New Motion. While researching our best options for running an Apache Hadoop cluster, I wanted to try out some of the features available in the newest version of Cloudera’s Hadoop distribution: CDH 5. Of course I could’ve downloaded the QuickStart VM, but I preferred to run a virtual cluster, making use of the 16GB of RAM in my shiny new 15″ Retina MacBook Pro ;)
There are some tutorials and repositories available for installing a local virtualized cluster, but none of them did what I wanted to do: install the bare cluster using Vagrant, and install the Hadoop stack using Cloudera Manager. So I created a simple Vagrant setup myself. You can find it here.
Setting up the Virtual Machines
As per the instructions from the GitHub repo:
Depending on the hardware of your computer, installation will probably take between 15 and 25 minutes.
Install the Vagrant Hostmanager plugin.
$ vagrant plugin install vagrant-hostmanager
Clone this repository.
$ git clone https://github.com/DandyDev/virtual-hadoop-cluster.git
Provision the bare cluster. It will ask you to enter your password, so it can modify your
/etc/hosts file for easy access in your browser. It uses the
$ cd virtual-hadoop-cluster
$ vagrant up
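Once provisioning finishes, it can be worth confirming that every node actually received an entry in /etc/hosts before opening the browser. The following is a minimal sketch, not part of the original repository: it assumes the four hostnames vm-cluster-node1 through vm-cluster-node4 (taken from the host pattern used in the installation steps), and the function names cluster_nodes and missing_hosts are hypothetical helpers made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not from the repository) for checking hosts entries.

# Print the hostnames vm-cluster-node1 .. vm-cluster-nodeN, one per line.
# Assumption: the cluster has 4 nodes unless a different count is given.
cluster_nodes() {
    count="${1:-4}"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "vm-cluster-node$i"
        i=$((i + 1))
    done
}

# Print every expected node name that does NOT appear in the hosts file.
# $1: hosts file to inspect (default /etc/hosts), $2: node count (default 4).
missing_hosts() {
    hosts_file="${1:-/etc/hosts}"
    count="${2:-4}"
    for node in $(cluster_nodes "$count"); do
        # -w: whole-word match, so vm-cluster-node1 does not match vm-cluster-node10
        # -F: treat the name as a literal string, not a regular expression
        grep -qwF -- "$node" "$hosts_file" || echo "$node"
    done
}
```

Running missing_hosts with no arguments prints nothing when all four names are present in /etc/hosts; any name it does print is a node whose entry is missing. Running vagrant status from the repository directory additionally shows the state of each virtual machine.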
Now we can install the Hadoop stack.
Installing Hadoop and Related Components
- Surf to: http://vm-cluster-node1:7180.
- Login with
- Select Cloudera Express and click Continue twice.
- On the page where you have to specify hosts, enter the following:
vm-cluster-node[1-4] and click Search. Four nodes should pop up and be selected. Click Continue.
- On the next page (“Cluster Installation > Select Repository”), leave everything as is and click Continue.
- On the next page (“Cluster Installation > Configure Java Encryption”) I’d advise ticking the box, but only if your country allows it. Click Continue.
- On this page do the following:
- Login To All Hosts As: Another user -> enter
- In the two password fields enter:
- Click Continue.
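The host pattern entered in the search step, vm-cluster-node[1-4], stands for four individual hostnames. As an illustration only, the sketch below expands such a pattern in plain shell. It assumes the bracket holds a simple numeric range, which is what the four search results suggest; it does not claim to reproduce how Cloudera Manager parses patterns, and expand_host_pattern is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: expand "prefix[first-last]suffix" into one name per line.
# Assumption: the bracket contains a single ascending numeric range.
expand_host_pattern() {
    pattern="$1"
    case "$pattern" in
        *\[*-*\]*) ;;                      # contains a [a-b] range: expand it
        *) echo "$pattern"; return 0 ;;    # no range: print the name unchanged
    esac
    prefix="${pattern%%\[*}"   # text before the opening bracket
    rest="${pattern#*\[}"      # text after the opening bracket
    range="${rest%%\]*}"       # "first-last"
    suffix="${rest#*\]}"       # text after the closing bracket
    first="${range%%-*}"
    last="${range#*-}"
    i="$first"
    while [ "$i" -le "$last" ]; do
        echo "${prefix}${i}${suffix}"
        i=$((i + 1))
    done
}
```

Calling expand_host_pattern 'vm-cluster-node[1-4]' prints vm-cluster-node1 through vm-cluster-node4, one per line, which can be handy for looping over the nodes in other scripts.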
And you’re done! Have fun experimenting with Hadoop!