Configuring Eclipse for Apache Hadoop Development (a screencast)
Update (added 5/15/2013): The information below is a bit dated; see this post for current instructions about configuring Eclipse for Hadoop contributions.
One of the perks of using Java is the availability of functional, cross-platform IDEs. I use
vim for my daily editing needs, but when it comes to navigating, debugging, and coding large Java projects, I fire up Eclipse.
Typically, when you’re developing MapReduce applications, you simply point Eclipse at the Apache Hadoop jar file, and you’re good to go. (Cloudera’s Hadoop training VM has a fully configured example.) However, when you want to dig deeper to explore (and modify) Hadoop’s internals, you’ll want to configure Eclipse to build Hadoop itself. Because the build involves generated code and a complicated
build.xml file, this takes some tinkering. Now that I have the full Hadoop Eclipse experience going (it took me a few tries), I’ve prepared a screencast that will help guide you through it, from downloading Eclipse to debugging one of its unit tests. You’ll also want to reference the EclipseEnvironment Hadoop wiki page, which has more details.
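While the screencast covers the Eclipse setup itself, it helps to keep in mind what a minimal MapReduce application boils down to. Here is the word-count “map” step sketched in plain Java; to stay self-contained it avoids the Hadoop API entirely, so the class and method names are illustrative only (a real job would implement Hadoop’s Mapper interface instead):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

// Sketch of the word-count "map" step: turn one line of input into
// (word, 1) pairs. Names are illustrative; a real Hadoop job would
// implement the Mapper interface from the Hadoop jar instead.
public class WordCountMapSketch {
    public static List<Entry<String, Integer>> map(String line) {
        List<Entry<String, Integer>> pairs = new ArrayList<Entry<String, Integer>>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<String, Integer>(token, 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (Entry<String, Integer> e : map("Hadoop on Eclipse on Hadoop")) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Once Eclipse is building Hadoop’s own source, you can set a breakpoint inside the real framework code that drives logic like this and step through it from a unit test, which is exactly what the end of the screencast demonstrates.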
We’re interested in your feedback! Use the comments below to share your Eclipse experiences.
P.S.: I “filmed” the screencast on Linux, but the same steps work on Mac OS X. Since Eclipse runs under Java 1.5 by default on OS X, be sure to check that “Preferences…Installed JREs” references a Java 1.6 JVM. Eclipse can be finicky; you may have to run “Refresh” and “Build Project” several times. The key bindings are a bit different, too: substitute Command (⌘) for Control.
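If you’re unsure which JVM Eclipse is actually using to run your code, a one-line program settles it. This is just an illustrative sanity check, not a step from the screencast:

```java
// Print the version of the JVM this program was launched with.
// Run it from Eclipse: on OS X, once Installed JREs is configured
// correctly, this should report a 1.6.x version rather than 1.5.
public class JvmCheck {
    public static void main(String[] args) {
        System.out.println(System.getProperty("java.version"));
    }
}
```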