Hadoop Administrator Training Gets Hands-On
- by David Goldsmith
- October 01, 2013
I’ve always held a strong bias that education is most effective when the student learns by doing. As a developer of technical curricula, my goal is to have training participants engage with real and relevant problems as much as possible through hands-on exercises. The high rate at which Apache Hadoop is changing, both as a technology and as an ecosystem, makes developing Cloudera training courses not only demanding but also seriously fun and rewarding.
I recently undertook the challenge of upgrading the Cloudera Administrator Training for Apache Hadoop. I more than quadrupled the number of hands-on exercises from the previous version, adding a full day to the course. At four days, it's now the most thorough training for Hadoop administrators and truly the best way to start building expertise.
While developing the course, I collaborated with some of the most knowledgeable Hadoop administrators I could find, including Eric Sammer, Amandeep Khurana, Kathleen Ting, Romain Rigaux, and many other smart folks at Cloudera. The upgrades to the curriculum and exercises are based on the best practices we use to resolve our customers' biggest problems. Those insights run throughout the course, including our conclusion that administrators should learn installation, configuration, maintenance, monitoring, and troubleshooting using the standard Hadoop tools. Although we certainly hope that Hadoop users take advantage of Cloudera Manager to simplify and streamline many of these tasks, we believe that every good administrator first needs to look under the hood and tinker with Hadoop's guts. There's no substitute for get-your-hands-dirty experience when it comes to building expertise.
Cluster in the Cloud
In addition to making changes to the course materials and adding a bunch of exercises, I really got my geek on and built a new training environment. Students in the Cloudera Administrator Training class still start by working in a pseudo-distributed environment — a fully operational Hadoop cluster running on a single machine — but very quickly graduate to working with a full, four-instance cluster. Each student gets his or her own cluster to build up, configure, mess up, fix, mess up again, and explore.
The new class environment is really cool. Students are given four Amazon EC2 instances to connect to when they start their exercises. The EC2 instances are launched from public AMIs, so students can recreate them in their own AWS accounts and work through the exercises again after the class is over. A local environment with four virtual machines is also available for classrooms where it is not possible to connect to the cloud. The local environment operates identically to the cloud-based version.
Combined with a catch-up script (a powerful tool that lets students automatically set their clusters to the starting point of any exercise in the course), the environment encourages students to perform exercises on their own, at their own pace, and as many times as they wish. The catch-up script is very helpful for students who get called out of class to handle pressing issues at work, or who simply feel rushed trying to get exercises working during class time. Every student has the option to go through the hands-on exercises again later, at his or her own pace, to gain a deeper understanding through repetition and to focus on the exercises that are most relevant, regardless of where they fall in the stack.
The ability to perform hands-on exercises at one's own pace, both during and after class, really personalizes the learning experience. My goal was to convey that training does not end when the class ends, and that the course is "just in time and just for you."
Hands On, Hands On, Hands On
I firmly believe hands-on training is the best first step toward achieving the efficiencies and unlocking the opportunities Hadoop offers. Over the course of four days, there are lots of highlights in the new Cloudera Administrator Training. Students get to:
- Install CDH on the cluster’s four instances
- Get Hive and Impala working and see if Impala queries can run as fast as Cloudera says they can
- Install Hue and configure it so that users in different roles have access to different Hadoop functionality in the Hue UI
- Configure Apache Hadoop for HDFS high availability
- Configure the Hadoop Fair Scheduler
As an added bonus, the course concludes with a troubleshooting challenge in which students test their newly acquired skills by diagnosing and then attempting to fix a messy, but all too common, misconfiguration.
It’s rewarding being a curriculum developer at Cloudera because so many administrators from different types of companies rely on the curriculum to learn how to effectively store, manage, and access their Big Data. It’s fun, too, because Cloudera’s customers and engineers are advancing the Hadoop platform and ecosystem so rapidly that I have a whole new set of challenges in front of me every day. I’ll certainly be updating the Cloudera Administrator Training exercises and course content to work with CDH 4.4, which was recently released. And I’ve recently started taking a look at some exciting new technology, so we’ll see where that takes me!
If you want to learn more about the updated Cloudera Administrator Training course, watch this free on-demand webinar to get a sense of the prerequisites, intended audience, and outline of the live training. Ian Wrigley, Cloudera's Senior Curriculum Manager and an extraordinary instructor as well, presents two sections of the course: an overview of HDFS high availability, and settings for some of Hadoop's more advanced configuration options. I highly recommend this webinar for anyone considering a move into Hadoop administration.
David Goldsmith is Senior Curriculum Developer at Cloudera.