Docker is the New QuickStart Option for Apache Hadoop and Cloudera

Categories: CDH, Ops and DevOps, QuickStart VM, Testing

Now there’s an even quicker “QuickStart” option for getting hands-on with the Apache Hadoop ecosystem and Cloudera’s platform: a new Docker image.

You might already be familiar with Cloudera’s popular QuickStart VM, a virtual image containing our distributed data processing platform. Originally intended as a demo environment, the QuickStart VM quickly evolved over time into quite a useful general-purpose environment for developers, customers, and partners. Today, the QuickStart VM is commonly utilized as:

  • A way to ramp-up on and self-learn new CDH features and components
  • An easy-to-deploy Hadoop training environment for newcomers
  • An appliance for continuous integration/API testing
  • A sandbox to prototype new ideas and applications
  • A platform for demonstrating your own software product

The QuickStart VM has long been available for a number of virtualization platforms: VMware, VirtualBox, and as a disk image usable by KVM and others. However, with the emergence of new container technology such as Docker, many maintainers of development and test environments have sought ways to simplify deployment through new and exciting alternatives to the traditional VM image.

Therefore today, we’re pleased to announce the availability of a Cloudera QuickStart Docker image! If you or your organization is using Docker, this image may provide the ideal lightweight, disposable environment for learning and exploring new technology, playing with new ideas, and doing continuous integration before testing at scale. (However, Cloudera recommends using a more realistic test environment before moving to production.)

Docker is different from other platforms you may have used: it works with Linux containers. While “virtual machine” software typically simulates or isolates access to hardware so a guest operating system can run, a “container” is really just a partition of the host operating system. Each container has its own view of the filesystem and its own set of resources, but it’s really running on the same Linux kernel as the rest of the system. This approach is similar to that of BSD jails or Solaris zones.
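
You can see the kernel sharing for yourself by comparing versions on the host and inside a container. A quick sketch, assuming Docker on a Linux host and an image you have already pulled (the cloudera/quickstart image works, but so does anything smaller):

    # Kernel version reported by the host
    uname -r
    # Kernel version reported inside a throwaway container -- identical,
    # because the container uses the host's kernel rather than booting its own
    docker run --rm cloudera/quickstart uname -r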

Getting Started

Just like the QuickStart VM, this Docker image (currently a beta) includes all of CDH, and you can optionally add on the free edition of Cloudera Manager or even a 60-day trial of Cloudera Enterprise. Have Docker map port 80 to your host and hit it with your browser, and you’ll also find an end-to-end tutorial with sample data included in the image.

If you have Docker installed, you can download the image and run a container right now:
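
For example, something along these lines (a sketch; check Docker Hub for the current image tag and recommended options):

    # Pull the beta image from Docker Hub
    docker pull cloudera/quickstart:latest
    # Run a container and start the CDH services; -p 80:80 exposes the
    # built-in tutorial on the host, and -p 8888:8888 exposes Hue
    docker run --hostname=quickstart.cloudera --privileged=true -t -i \
        -p 80:80 -p 8888:8888 \
        cloudera/quickstart:latest /usr/bin/docker-quickstart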

You can find the image and full documentation on Docker Hub.

A World of Choices

With the availability of the QuickStart Docker image, you now have three different options for exploring Apache Hadoop and Cloudera’s platform to suit your needs: conventional desktop VM, Docker image, or AWS-based demo cluster (Cloudera Live). The choice is yours!

If you run into any problems or have improvement ideas, please share with us on our Community Forum. As always, we’re excited to hear about how you use it!

Sean Mackrory is a Software Engineer at Cloudera, and a member of the Apache Bigtop PMC.


20 responses on “Docker is the New QuickStart Option for Apache Hadoop and Cloudera”

  1. Erik

    Very nice!
    Only problem with this is that HBase won’t start. I do not have time to investigate further, but manually starting HBase after boot works fine, e.g.:
    hbase master start &
    hbase regionserver start &

  2. Sudhir Nallagangu

    I have 2 questions
    1) I did not take the /docker-quickstart route, which seems to start all the required services. Instead I opted for /home/cloudera/cloudera-manager, which seems to kill all the “hadoop” services and start only cloudera-scm-server and cloudera-scm-agent. The attempt to access http://quickstart.cloudera:7180/ failed on Windows, so I added a route (route add 172.17.0.0/16 192.168.99.100); docker ps on the container showed its address as 172.17.0.2, so port forwarding seems to be legit. On further debugging, I logged in to the Oracle VirtualBox VM that is part of the Docker setup and tried “curl 172.17.0.2:7180”, which failed. I connected to the container itself and found cloudera-scm-agent (pid 1368) is running…
    cloudera-scm-server dead but pid file exists. What are all the daemons/services “cloudera manager” needs? Where would I find the logs for cloudera-scm-server?

    2) When can we expect a multi-node cluster Docker image akin to what Hortonworks has from sequenceiq/cloudbreak?

    Sudhir

    1. Justin Kestelyn Post author

      Sudhir, answering in order:

      1. See the comments thread here; I think it will address your issue. Also, for future issues, post questions here.

      2. I think Cloudera Director (free) is what you’re looking for; it’s been around for about a year.

  3. Caio Quirino da Silva

    Hi! I am the maintainer of the caioquirino/docker-cloudera-quickstart image.
    I’m glad you have launched the official quickstart image. I put considerable work into mine, and I am really interested to see the image’s Dockerfile and git repository. I think I could merge my work with yours and do a great job on this image. Please open the build repository for everyone :D

  4. Luis Silveira

    Hi,
    instead of running a single container with the whole Hadoop stack, is it possible to isolate the DataNodes and NameNodes, each in its own container, so that many containers run on the host system?
    Luis

  5. Snehil Jain

    Hi, I’m taking a course on big data analytics and have downloaded the VirtualBox tool as well as the QuickStart VM. The one major problem with it is that it requires me to have an 8GB RAM machine. Can the Docker image run on a 4GB RAM machine? Also, does it work on Windows?

  6. Jorge

    Once the container is installed and running, is it possible to use it in order to follow the Cloudera Live tutorial? If that is true, how do I start it?
    Thanks!

    1. Jorge

      Well, I got it, using the phrase above (“Have Docker map port 80 to your host and hit it with your browser, and you’ll also find an end-to-end tutorial with sample data included in the image”) and doing a bit of searching on Google about the SimpleHTTPServer that I saw the ‘docker run’ command is using:

      1. Launch the container with the ‘-p 80’ option: sudo docker run --privileged=true --hostname=quickstart.cloudera -t -i -p 80 YOUR_HASH /usr/bin/docker-quickstart

      2. When the previous command finishes, it drops you into a Bash shell. From this shell, execute the following command to see the container’s IP address: hostname -I

      3. Put this IP in the browser and it works!

  7. Abhishek

    The image and container got installed and are running correctly, but for the life of me I cannot access the Hue UI from the host machine.

    I tried this

    docker run --privileged=true --hostname=quickstart.cloudera -p 7180 -p 8888 -t -i 9f3ab06c7554 /usr/bin/docker-quickstart

    but still when I do http://localhost:8888 or http://localhost:7180 the browser fails to connect.
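
    With -p 7180 and -p 8888 alone, Docker publishes those container ports on random host ports (docker ps shows which), so the localhost URLs won’t match. A likely fix, assuming nothing else on the host is already using 7180 or 8888, is to map them explicitly with -p host:container:

        docker run --privileged=true --hostname=quickstart.cloudera \
            -p 7180:7180 -p 8888:8888 -t -i 9f3ab06c7554 /usr/bin/docker-quickstart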

  8. sai

    I am trying to create some test data from the base container and am not sure how I can inject my test data into the container. I tried creating an image off of the base container:
    FROM cloudera/quickstart:latest
    #Load Sample data in HDFS
    COPY mydata.txt /
    USER root
    #RUN Container and perform hdfs commands to populate the data
    RUN hostname=quickstart.cloudera /usr/bin/docker-quickstart && \
    hdfs dfs -mkdir /user/mydata && \
    hdfs dfs -put mydata.txt /user/mydata/
    CMD ["/usr/bin/docker-quickstart"]

    Also, let me know if there is any other way that I can inject my test data. I want to use the container with some test data.
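
    An alternative sketch that avoids running services at image-build time: start the container as usual, then copy the file in and load it into HDFS from outside. YOUR_CONTAINER below is a placeholder for the container name or ID shown by docker ps:

        # Copy the test file from the host into the running container
        docker cp mydata.txt YOUR_CONTAINER:/tmp/mydata.txt
        # Once the services are up, create the HDFS directory and upload the file
        docker exec YOUR_CONTAINER hdfs dfs -mkdir -p /user/mydata
        docker exec YOUR_CONTAINER hdfs dfs -put /tmp/mydata.txt /user/mydata/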

  9. Tudor-Lucian Lapusan

    Hi,
    I can’t exit from the container.
    After I execute the command “docker run --privileged=true --hostname=quickstart.cloudera -t -i ${HASH} /usr/bin/docker-quickstart”, the CDH services start correctly, BUT when I want to get out of the container and type “exit”, the shell is blocked (I waited more than 15 minutes and nothing happened).
    Do you have any ideas?
    Thanks

  10. Wirawan Purwanto

    This Docker-based image is a nice option for the CDH quickstart. However, I have one concern regarding its security: why does it have to be run as a fully privileged container? Can’t we limit it to only the privileges it requires, by adding only the needed capabilities to the container?

    Wirawan

  11. Sam

    Hi,
    I have Docker installed, up and running. How can I run the word count (MapReduce) example or run my own Hadoop jar? I am getting a command prompt like [root@quickstart /]#
    I tried some hadoop commands but am not getting anything; sometimes it says command not found or simply returns no results.
    I would really appreciate it if you could let me know how I can run MapReduce jobs in Docker.

    Thanks,
    Sam

    1. Srik

      Hi,

      I am also facing the same problem regarding how to run a MapReduce program in the Cloudera Docker image.
      I am using a high-spec Linux (elementary OS) laptop: 24GB RAM, i7 processor.
      I was able to install the Cloudera Docker image, run it, and also do the following without issues:
      1. See the # prompt and run all HDFS commands. In fact, I uploaded a word count text file into HDFS using those commands.
      2. Access the Hue editor.
      3. Run Cloudera Manager and start all services (everything).
      4. In my local environment (not inside the Docker container), I created a WordCount MapReduce program (jar) and downloaded all the Maven dependencies for it.
      Now my question is:
      How do I submit this WordCount JAR to the running Docker container?
      How do I run this MapReduce program/job (WordCount) against the text file uploaded to HDFS?
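
      One way that should work, sketched here with placeholder names (YOUR_CONTAINER is the container name or ID from docker ps; the jar name, main class, and HDFS paths are only examples): copy the jar into the running container and launch the job with docker exec.

          # Copy the jar built on the host into the running container
          docker cp wordcount.jar YOUR_CONTAINER:/tmp/wordcount.jar
          # Run the job against the text file already uploaded to HDFS
          docker exec -it YOUR_CONTAINER hadoop jar /tmp/wordcount.jar \
              com.example.WordCount /user/cloudera/input.txt /user/cloudera/wordcount-out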