NYU, Analytics, and Cloudera’s QuickStart VM

The Cloudera QuickStart VM is a valuable platform for teaching and learning any Hadoop-related curriculum.

In the Fall 2013 semester, more than 30 NYU graduate students completed the Real-time and Big Data Analytics course at the NYU Courant Institute of Mathematical Sciences, for which I served as instructor.

In this introductory analytics course, students learn the architectures of the Apache Hadoop storage and compute systems (HDFS and MapReduce, respectively). The early part of the course is dedicated to gaining hands-on experience with Hadoop and the Hadoop ecosystem projects. With this foundational knowledge in hand, students complete programming assignments in MapReduce, Apache Pig, Apache Hive, and more. All of this is groundwork for the analytics project that each team must research, define, and develop.

With so much ground to cover, it is critical that students invest their time in activities directly related to their final projects, not in setting up infrastructure. So, as a course requirement, I asked students to download and install the Cloudera QuickStart VM. It minimizes the time spent configuring a Hadoop environment and maximizes the time spent developing analytics, whether the user is a graduate student enrolled in a course or a developer building a proof of concept. (This is the second cohort of students to take the course, and the first to use the QuickStart VM as their Hadoop learning platform.)

Within minutes, students were exploring HDFS commands. This is in direct contrast to the first cohort's experience, in which each student performed the time-consuming chores of downloading, installing, and configuring Hadoop, then Pig, Hive, and so on. This process often required several iterations and many hours before a stable Hadoop environment was achieved. Using the QuickStart VM greatly reduced these administrative chores, enabling students to focus on analytics rather than software installation.

Cloudera’s QuickStart VM provides another advantage: it facilitates exploration of topics that extend beyond the material presented in the course. With a full complement of Hadoop ecosystem projects already installed, configured, and available on their desktops, students were able to identify the best tools for the job at hand, experiment with them, and employ them in their analytics projects.

It was great to see students taking advantage of Cloudera Impala, Apache HBase, Apache Mahout, and other Hadoop ecosystem projects in their final solutions. Best of all, they did so without the distraction of locating, downloading, installing, and configuring additional software: Cloudera’s QuickStart VM already had it all!

Suzanne McIntosh is a Solutions Consultant for Cloudera.

Editor’s Note: The Cloudera Academic Partnership (CAP) is another option for colleges and universities that would like to incorporate Big Data and Hadoop into their computer science or data analytics curricula. Cloudera offers free course materials, discounted classroom training, and certification for professors, as well as a complimentary University License for more robust software features. Learn more about becoming a member school at http://university.cloudera.com/cap.
