How-to: Install a Virtual Apache Hadoop Cluster with Vagrant and Cloudera Manager

It’s been a while since we provided a how-to for this purpose. Thanks, Daan Debie (@DaanDebie), for allowing us to re-publish the instructions below (for CDH 5)!

I recently started as a Big Data Engineer at The New Motion. While researching our best options for running an Apache Hadoop cluster, I wanted to try out some of the features available in the newest version of Cloudera’s Hadoop distribution: CDH 5. Of course I could’ve downloaded the QuickStart VM, but I wanted to run a virtual cluster instead, making use of the 16GB of RAM in my shiny new 15″ Retina MacBook Pro ;)


There are some tutorials and repositories available for installing a local virtualized cluster, but none of them did what I wanted: install the bare cluster using Vagrant, and then install the Hadoop stack using Cloudera Manager. So I created a simple Vagrant setup myself. You can find it here.

Setting up the Virtual Machines

As per the instructions from the GitHub repo:

Depending on the hardware of your computer, installation will probably take between 15 and 25 minutes.

First install VirtualBox and Vagrant.

Install the Vagrant Hostmanager plugin.


Clone this repository.


Provision the bare cluster. It will ask you to enter your password, so it can modify your /etc/hosts file for easy access in your browser. It uses the


Now we can install the Hadoop stack.
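For reference, the setup steps above might look roughly like this in a terminal. This is a sketch under assumptions, not a copy of the repository's instructions: `REPO_URL` and `REPO_DIR` are placeholders you have to fill in yourself, and the `run` and `setup_cluster` helpers are made up for illustration. By default it only prints the commands so you can review them; set `DRY_RUN=0` to actually execute them.

```shell
#!/bin/sh
# Sketch of the setup steps above. Prints the commands by default
# (DRY_RUN=1); set DRY_RUN=0 to run them for real.
# REPO_URL and REPO_DIR are placeholders, not values from the article.

DRY_RUN="${DRY_RUN:-1}"
REPO_URL="${REPO_URL:-REPLACE-WITH-REPOSITORY-URL}"
REPO_DIR="${REPO_DIR:-vagrant-cluster}"

# Print a command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_cluster() {
  # Install the Hostmanager plugin, which manages the /etc/hosts entries.
  run vagrant plugin install vagrant-hostmanager || return 1

  # Fetch the Vagrant setup into REPO_DIR.
  run git clone "$REPO_URL" "$REPO_DIR" || return 1

  # Provision the bare cluster. VAGRANT_CWD tells Vagrant which directory
  # holds the Vagrantfile, so no cd is needed. Expect a password prompt
  # here when /etc/hosts is modified.
  run env VAGRANT_CWD="$REPO_DIR" vagrant up
}

setup_cluster
```

VirtualBox and Vagrant themselves still need to be installed beforehand; how you do that depends on your operating system.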

Installing Hadoop and Related Components

  1. Surf to: http://vm-cluster-node1:7180.
  2. Log in with admin/admin.
  3. Select Cloudera Express and click Continue twice.
  4. On the page where you have to specify hosts, enter the following: vm-cluster-node[1-4] and click Search. Four nodes should pop up and be selected. Click Continue.
  5. On the next page (“Cluster Installation > Select Repository”), leave everything as is and click Continue.
  6. On the next page (“Cluster Installation > Configure Java Encryption”) I’d advise ticking the box, but only if your country allows it. Click Continue.
  7. On this page do the following:
    • Login To All Hosts As: Another user -> enter vagrant
    • In the two password fields enter: vagrant
    • Click Continue.
  8. Wait for Cloudera Manager to install the prerequisites… and click Continue.
  9. Wait for Cloudera Manager to download and distribute the CDH packages… and click Continue.
  10. Wait while the installer is inspecting the hosts, and Run Again if you encounter any (serious) errors (I got some that went away the second time). After this, click Finish.
  11. For now, we’ll install everything but HBase. You can add HBase later, but it’s quite taxing for the virtual cluster. So on the “Cluster Setup” page, choose “Custom Services” and select the following: HDFS, Hive, Hue, Impala, Oozie, Solr, Spark, Sqoop2, YARN and ZooKeeper. Click Continue.
  12. On the next page, you can select which services end up on which nodes. Usually Cloudera Manager chooses the best configuration here, but you can change it if you want. For now, click Continue.
  13. On the “Database Setup” page, leave it on “Use Embedded Database.” Click Test Connection (it says it will skip this step) and click Continue.
  14. Click Continue on the “Review Changes” step. Cloudera Manager will now try to configure and start all services.
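If step 1 or step 4 misbehaves (the page does not load, or fewer than four nodes show up), a quick sanity check from the host machine can narrow things down. The sketch below assumes the host names were written to /etc/hosts during provisioning and that `curl` is installed; the function names and the `HOSTS_FILE` variable are illustrative, not part of the repository.

```shell
#!/bin/sh
# Sketch: sanity checks for the browser steps above.
# Host names and port 7180 come from the article; HOSTS_FILE and the
# function names are illustrative.

HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

# Print prefix<first> .. prefix<last>, one per line. This is what the
# pattern vm-cluster-node[1-4] in step 4 stands for.
expand_hosts() {
  prefix="$1"
  i="$2"
  last="$3"
  while [ "$i" -le "$last" ]; do
    echo "${prefix}${i}"
    i=$((i + 1))
  done
}

# Print every expected host name that has no entry in HOSTS_FILE.
missing_hosts() {
  for host in $(expand_hosts "$@"); do
    grep -qw -- "$host" "$HOSTS_FILE" || echo "$host"
  done
}

# Succeed if something answers HTTP on the Cloudera Manager port of host $1.
cm_reachable() {
  curl --silent --output /dev/null --max-time 5 "http://$1:7180"
}

# Example usage:
#   missing_hosts vm-cluster-node 1 4
#   cm_reachable vm-cluster-node1 && echo "Cloudera Manager is answering"
```

If `missing_hosts` prints anything, the /etc/hosts update probably did not go through. If the names are there but `cm_reachable` fails, Cloudera Manager may simply still be starting up.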

And you’re done! Have fun experimenting with Hadoop!

7 Responses
  • Daan Debie / June 20, 2014 / 4:05 PM

    Originally published here :) :

  • Marek Obuchowicz / July 18, 2014 / 3:07 AM

    Awesome :) I really like how quickly I can set up a sandbox VM using this. Keep goin’ on, Cloudera.

  • Buntu / July 21, 2014 / 10:59 AM

    Thanks a ton for saving everyone the time!!

    I’ve installed the VM with CDH 5.0.3 and under Parcels I do not see CDH 5.1.0. How do I upgrade to CDH 5.1.0 released last week?

  • Buntu / July 25, 2014 / 11:34 AM

    Thanks Justin, actually after restarting vagrant I was able to see the cdh 5.1.0 parcel. But good to know the alternative.

  • Raj / March 18, 2015 / 9:43 AM

    Can you post the steps for Windows?

  • Raj / March 18, 2015 / 1:28 PM

    ok…my Windows laptop has only 8GB so I chose only 2 machines in step 4. Now my zookeeper is failing in step 14 -

    Initializing ZooKeeper Service

    Completed 1 steps successfully.

    Starting ZooKeeper Service

    Failed to execute command Start on service ZooKeeper
