Today’s interview features Todd Lipcon, a software engineer at Cloudera. Todd will be presenting “Optimizing MapReduce Job Performance” at Hadoop Summit.
Question: Tell us about your current role and how you interact with Apache Hadoop.
Todd: I’m a software engineer on Cloudera’s platform engineering team, where I spend most of my time contributing code to open source projects like Apache Hadoop and Apache HBase. Most recently I’ve been implementing the automatic HA failover feature in Hadoop 2.0, but I’ve also spent a lot of time understanding and improving the performance of the Hadoop stack.
Question: Tell us about your Hadoop Summit presentation.
Todd: At this year’s summit, I will be presenting on the internals of MapReduce and how you can tune your MapReduce jobs for optimal performance. A lot of developers see MapReduce as a black box, but looking inside that box can help you understand where you might have bottlenecks, or where there are easy opportunities to improve performance by changing a few configuration parameters.
Question: What do you expect will be the key takeaway for folks attending your session?
Todd: I hope attendees will walk away with a better understanding of each of the phases of MapReduce task execution, and a few key configuration parameters they can play with to get better performance without changing their code.
Question: What other presentations are you most looking forward to attending?
Todd: I’m really looking forward to Josh Wills’ talk, “BranchReduce: Distributed Branch-and-Bound on YARN.” There are a lot of optimization problems that can be solved with branch-and-bound approaches, and only recently, with the introduction of YARN, has it become possible to build these types of algorithms efficiently on Hadoop. Not only is this a fresh topic, but Josh is also an entertaining speaker!