An important part of making sure Apache Hadoop works well for all users is developing and maintaining strong relationships with the folks who run Hadoop day in and day out. Edward Capriolo keeps About.com’s Hadoop cluster happy, and we frequently chew the fat with Ed on issues ranging from administrative best practices to monitoring. Ed’s been an invaluable resource as we beta test our distribution and chase down bugs before our official releases. Today’s article looks at some of Ed’s tricks for monitoring Hadoop with Cacti through JMX.
The distributed nature of MapReduce programs makes debugging a challenge. Attaching a debugger to a remote process is cumbersome, and the lack of a single console makes it difficult to inspect what is occurring when several distributed copies of a mapper or reducer are running concurrently. Furthermore, operations that work on small amounts of input (e.g., saving the inputs to a reducer in an array) fail when running at scale, causing out-of-memory exceptions or other unintended effects.
Apache Hadoop moves fast. Users often find that they need to upgrade after just a few months. Upgrading can be a daunting task, especially if you are several versions behind. We’ve been working with Rackspace for a while now, and they recently embarked on an upgrade from Hadoop 0.15.3 to Cloudera’s Distribution for Hadoop based on 0.18.3. Stu Hood, Search Team Technical Lead at Rackspace, was kind enough to document their experience, and we’re happy to share it with you here. Read more
Yesterday, Chris Goffinet from Digg made a great blog post about LZO and Hadoop. Many users have been frustrated because LZO has been removed from Apache Hadoop’s core, and Chris highlights a great way to mitigate this while the project identifies an alternative with a compatible license. We liked the post so much, we asked Chris to share it with our audience. Thanks Chris! -Christophe
So at Digg,
On June 10th, more than 750 people from around the world descended on the Santa Clara Marriott to share their love for a little stuffed elephant named Hadoop. It was a good week to be part of this exploding community, and I want to extend Cloudera’s heartfelt thanks to everyone who made it possible, especially our friends at Yahoo! who organized this Summit. Most importantly, I want to thank all of you who were able to participate.