The Hadoop Community is an endlessly fascinating world. After all, as Clouderan ATM put it in a past blog post, the user group meetups are adorably called “HUGs.” Just as the Cloudera blog has introduced you to some of the engineers, projects, and applications that serve as the head, heart, and hands of the Hadoop Community, we’re proud to add the circulatory system (to extend the metaphor), made up of Cloudera’s expert trainers and curriculum developers who bring Hadoop to new practitioners around the world every week.
Welcome to the first installment of our “Meet the Instructor” series, in which we briefly introduce you to some of the individuals endeavoring to teach Hadoop far and wide. Today, we speak to Jesse Anderson (@jessetanderson)!
What is your role at Cloudera?
I joined Cloudera about a year ago as a curriculum developer and instructor. I get the best of both worlds in educational services: I create new curriculum and improve existing courses, such as the Cloudera Manager series, and I travel to teach them.
What do you enjoy most about training and/or curriculum development?
I enjoy passing on my knowledge about Hadoop and development. I spent the previous 15 years as an engineer and programmer, and now I want to share the benefits of my experience. As far as development goes, I’ve seen it all, from tiny startups to multinational corporations, working on projects ranging from distributed file systems to client/server applications, from enterprise services to the social/mobile web (where I was, unfortunately, required to wear skinny jeans).
I bring my accumulated knowledge and experience to the classroom to demonstrate effective development methods in the clearest, most comprehensive way I can. During class, I tell stories from my years in the developer trenches to illustrate how certain techniques are optimally or sub-optimally applied. Although I continue to code, I thankfully no longer get that 2am call when there’s a bug that needs fixing. Instead, I get calls and emails from other Hadoop professionals around the world who are eager to discuss the state of the art. Now I focus on instructional code and even think of PowerPoint as my integrated development environment.
In a sense, my job is to learn as much as I can about Hadoop as an effective and dynamic data platform and pass that insight on to others. I’ve been thinking about things like that in my free time for years, but now I’m expected to do it full time. How cool is that?
Describe an interesting application you’ve seen or heard about for Apache Hadoop.
It’s interesting to see the wide array of production-worthy programs in the community. At Strata + Hadoop World 2012, some folks from LinkedIn presented their Hadoop workflow, including a visualization of all its jobs and dependencies. It was a tangled, interconnected vision of enterprisey goodness! One of the artifacts from this workflow is LinkedIn’s People You May Know feature, which uses graph theory to find people with whom each member would like to connect due to some organic association, but who are not necessarily closely linked by degrees of separation. I use that feature all the time, so it was really cool to see how Hadoop enables it.
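The core of a People You May Know-style computation is a friend-of-friend traversal: candidates reached through mutual connections, ranked by how many mutuals they share. LinkedIn runs this at scale as Hadoop jobs over member data; here is a minimal single-machine sketch of the idea, with a hypothetical toy graph and names that are purely illustrative.

```python
from collections import Counter

# Hypothetical toy connection graph; the real feature computes this
# over an enormous member graph via distributed Hadoop jobs.
connections = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}

def people_you_may_know(member):
    """Suggest non-connections, ranked by count of shared mutual connections."""
    candidates = Counter()
    for friend in connections[member]:
        for friend_of_friend in connections[friend]:
            # Skip the member themselves and anyone already connected.
            if friend_of_friend != member and friend_of_friend not in connections[member]:
                candidates[friend_of_friend] += 1
    return [name for name, _ in candidates.most_common()]

print(people_you_may_know("alice"))  # dave first (two mutuals), then erin (one)
```

In a MapReduce formulation, each mapper would emit candidate pairs from one member’s adjacency list, and reducers would sum the mutual-connection counts per pair.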
I’m also partial to my own Million Monkeys Project. You may have heard of the Infinite Monkey Theorem or the saying “a million monkeys on a million typewriters will eventually recreate Shakespeare” (or maybe you’re a fan of The Simpsons). Using Hadoop, a program I wrote, some basic rules, and time, my virtual monkeys recreated the entirety of Shakespeare’s oeuvre. This project shows that Big Data isn’t always a massive file sitting in HDFS—sometimes it’s just a computationally intensive application that requires a cluster’s horsepower to finish.
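The monkeys’ generate-and-match loop is easy to sketch on one machine; the project’s point is that, at scale, the same loop becomes a compute-bound Hadoop job. Below is a minimal illustrative sketch under my own assumptions (a tiny target phrase, short two-character chunks, a fixed random seed), not the project’s actual code, which used full works and far more "keystrokes."

```python
import random
import string

# Hypothetical miniature target; the actual project matched random output
# against the complete works of Shakespeare, parallelized across a cluster.
TARGET = "to be or not to be"
ALPHABET = string.ascii_lowercase + " "

def monkey_types(length, rng):
    """One 'monkey' bangs out a random string of the given length."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def run_monkeys(chunk_len=2, attempts=10_000, seed=42):
    """Count how many distinct chunk_len-character chunks of TARGET
    the monkeys manage to reproduce in the given number of attempts."""
    rng = random.Random(seed)
    chunks = {TARGET[i:i + chunk_len] for i in range(len(TARGET) - chunk_len + 1)}
    found = set()
    for _ in range(attempts):
        attempt = monkey_types(chunk_len, rng)
        if attempt in chunks:
            found.add(attempt)
    return len(found), len(chunks)

found, total = run_monkeys()
print(f"matched {found} of {total} two-character chunks")
```

Note there is almost no input data here: the work is all CPU, which is exactly the "computationally intensive application" case described above, as opposed to a massive file in HDFS.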
What advice would you give to an engineer, system administrator, or analyst who wants to learn more about Big Data?
My two favorite quotes on the subject are:
“An investment in knowledge pays the best interest.” – Benjamin Franklin
“I have never let my schooling interfere with my education.” – Mark Twain
Get in the habit of investing in yourself with education and knowledge. Don’t let a college degree encumber continuous learning—a degree is only the foundational intelligence on which wisdom is built.
Those working in technical fields tend to already be in the habit of learning, so getting up to speed on Big Data is pretty easy. However, although Hadoop is an open-source project, training is by far the quickest way to move from the conceptual to the productive—Cloudera University courses are intensive and are meant to propel participants towards necessary expertise.
How did you become involved in technical training and Hadoop?
I’ve spent the past few years running the Reno-area developers group, and Pragmatic Programmers recently published a series of my screencasts. These exercises were a way of proverbially dipping my toe in the training waters. I had been looking for a means to continue learning and coding, but untether myself from my workstation. I love to travel, teach, and meet end-users face-to-face, so training is an excellent way to get out on the road while keeping up my technical chops.
I became involved with Hadoop when I was doing the Million Monkeys project. I’ve spent a lot of time writing and working with distributed systems and did some scalability and ROI research with Hadoop. I like to tell people I have a visual representation in graph form for why I’m using Hadoop!
What’s one interesting fact or story about you that a training participant would be surprised to learn?
I used to moonlight as a karaoke D.J. Between amateur renditions of “Gin and Juice” and “Stayin’ Alive,” I liked to play my own painfully tempo- and pitch-shifted versions of “True” by Spandau Ballet as bumper music. It’s nice to have an ace up my sleeve in case my Hadoop classes get out of line!