This installment of “Meet the Engineers” features Todd Lipcon (@tlipcon), a PMC member and committer for the Hadoop, HBase, and Thrift projects.
What do you do at Cloudera, and in which Apache projects are you involved?
I’m a software engineer in Cloudera’s Platform Engineering group, more specifically on the HDFS team. In my time at Cloudera I’ve also done a significant amount of work on other components of CDH including Hue, HBase, and MapReduce. I spend most of my time developing new features for these open source components – recently I’ve been primarily focused on designing and implementing High Availability for the HDFS NameNode. I’m also a performance nut, and have spent a lot of time over the last several years improving the performance of the Hadoop stack.
Why do you enjoy your job?
Getting paid to work on open source is pretty great. Almost all of my time goes to working with the open source community, whether it’s developing new code, reviewing patches from new contributors, or speaking at events like Hadoop user groups and conferences. From an engineering perspective, Hadoop is quite interesting and challenging to work on. It runs at enormous scale and in critical production workloads at some of the world’s biggest companies, so you really have to think through all the corner cases to make a robust design and implementation. (If you’re interested in learning more about the Cloudera work environment, I recently wrote a more lengthy post on this.)
Systems programming is particularly interesting to me since it encourages a “full-stack” perspective. To make Hadoop efficient, you have to really understand all the layers of the system, from the Linux kernel to the JVM to TCP/IP up through distributed consensus protocols. Jumping between these layers to solve bugs or improve performance keeps every day fresh and interesting.
What is your favorite thing about Hadoop?
From an engineer’s perspective, I like working on Hadoop because it’s very challenging. There are very few open source systems that operate at this kind of scale or are making this big an impact. From a user’s perspective, I think Hadoop is exciting because it levels the playing field between technical powerhouses like Google, who have built this kind of technology in-house, and more traditional enterprises. I imagine that working on Hadoop today is very much like what it was like to work on Linux in the mid to late 90s.
What is your advice for someone who is interested in participating in any open source project for the first time?
Walk before you run. One mistake I’ve seen new contributors make is that they try to start off with a huge chunk of work at the core of the system. Instead, learn your way around the source code by doing small improvements, bug fixes, etc. Then, when you want to propose a larger change, the rest of the community will feel more comfortable accepting it. One great way to build karma in the community is to look at recently failing unit tests, file bugs, and fix them up.
At what age did you become interested in programming, and why?
I started out with Apple Basic and LOGO on an Apple IIc when I was 5 or 6 years old, probably because there weren’t that many exciting games to play on the machine, and drawing spirographs and making “guess-the-number” games was pretty cool. We even had some kind of adapter to hook up to our TV and display color! I progressed from there through various other beginner languages until Perl and C++ when I was 14 or so. By that point, I’d say I was interested because it was more challenging than working at a grocery store and paid a bit better too!
Look for our next “Meet the Engineers” profile in a week or two. See you then!