Getting hands-on with a multi-node cluster for self-learning or testing is now even easier.
Last December, we introduced the Cloudera QuickStart Docker image to make it easier than ever before to explore Cloudera’s distributed data processing platform, including tools such as Apache Impala (incubating), Apache Spark, and Apache Solr. While the single-node getting-started image was well-received, we noted a large number of requests from the community for a multi-node CDH deployment via Docker.
Now there’s an even quicker “QuickStart” option for getting hands-on with the Apache Hadoop ecosystem and Cloudera’s platform: a new Docker image.
You might already be familiar with Cloudera’s popular QuickStart VM, a virtual image containing our distributed data processing platform. Originally intended as a demo environment, the QuickStart VM quickly evolved into quite a useful general-purpose environment for developers, customers,
Use the scripts and screenshots below to configure a Kerberized cluster in minutes.
Kerberos is the foundation of securing your Apache Hadoop cluster. With Kerberos enabled, user authentication is required. Once users are authenticated, you can use projects like Apache Sentry (incubating) for role-based access control via GRANT/REVOKE statements.
Taming the three-headed dog that guards the gates of Hades is challenging, so Cloudera has put significant effort into making this process easier in Hadoop-based enterprise data hubs.
(Editor’s note [Aug. 2, 2016]: A multi-cluster option for Docker-based deployment is now available for CDH 5.8 and later.)
Thanks to Christian Javet for his permission to republish his blog post below!
I wanted to get familiar with the big data world, and decided to test Hadoop. Initially, I used Cloudera’s pre-built virtual machine with its full Apache Hadoop suite pre-configured (called Cloudera QuickStart VM),
The Cloudera QuickStart VM is an important platform for teaching and learning any Hadoop-related curriculum.
In the Fall 2013 semester, more than 30 NYU graduate students completed the Real-time and Big Data Analytics course at the NYU Courant Institute of Mathematical Sciences, for which I served as instructor.
In this introductory analytics course, students learn the architectures of the Apache Hadoop storage and compute systems (HDFS and MapReduce, respectively).