Configuring Eclipse for Apache Hadoop Development (a screencast)

Update (added 5/15/2013): The information below is dated; see this post for current instructions about configuring Eclipse for Hadoop contributions.

One of the perks of using Java is the availability of functional, cross-platform IDEs.  I use vim for my daily editing needs, but when it comes to navigating, debugging, and coding large Java projects, I fire up Eclipse.

Typically, when you’re developing Map-Reduce applications, you simply point Eclipse at the Apache Hadoop jar file, and you’re good to go.  (Cloudera’s Hadoop training VM has a fully-configured example.) However, when you want to dig deeper to explore and modify Hadoop’s internals themselves, you’ll want to configure Eclipse to build Hadoop.  Because there’s generated code and a complicated ant build.xml file, this takes some tinkering.  Now that I have the full Hadoop Eclipse experience going (it took me a few tries), I’ve prepared a screencast that will help guide you through it, from downloading Eclipse to debugging one of Hadoop’s unit tests.  You’ll also want to reference the EclipseEnvironment Hadoop wiki page, which has more details.
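
For a sense of what that first, simpler setup looks like, here is a minimal sketch of a Map-Reduce application that needs only the Hadoop jar on its build path, written against the 0.20-era org.apache.hadoop.mapred API (the class is illustrative, not taken from the screencast):

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class WordCount {
      // Emits (word, 1) for every token in a line of input.
      public static class Map extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> out, Reporter reporter)
            throws IOException {
          StringTokenizer tok = new StringTokenizer(value.toString());
          while (tok.hasMoreTokens()) {
            word.set(tok.nextToken());
            out.collect(word, ONE);
          }
        }
      }

      // Sums the counts emitted for each word.
      public static class Reduce extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> out, Reporter reporter)
            throws IOException {
          int sum = 0;
          while (values.hasNext()) {
            sum += values.next().get();
          }
          out.collect(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);  // submits the job and waits for it to finish
      }
    }

With that, Eclipse’s navigation and debugging work out of the box; it’s only when you want to step into and change Hadoop’s own source that the heavier setup in the screencast becomes necessary.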


[Screencast: Eclipse for Hadoop Development, from Cloudera (http://vimeo.com/4193623)]

We’re interested in your feedback!  Use the comments below to share your Eclipse experiences.

P.S.: I “filmed” the screencast on Linux, but the same steps work on Mac OS X. Since Eclipse will be running under Java 1.5 in OS X, be sure to check that “Preferences…Installed JREs” references JVM 1.6. Sometimes Eclipse is finicky; you might have to do “Refresh” and “Build Project” several times. The key bindings are a bit different too: substitute Command (⌘) for Control.

36 Responses
  • Abhishek Verma / April 20, 2009 / 2:16 PM

    Thanks a lot for the screencast. I would love to see a screencast which explains how to run Hadoop from within Eclipse with some profiling on (hprof or YourKit).
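
    For the hprof part, a rough sketch of the kind of thing I mean, using the 0.20-era JobConf profiling hooks (the helper class is purely illustrative):

        import org.apache.hadoop.mapred.JobConf;

        public class Profiling {
          // Ask the framework to run hprof in a few of the task JVMs.
          public static JobConf enableHprof(JobConf conf) {
            conf.setProfileEnabled(true);
            // %s is replaced by the framework with the per-task profile output file.
            conf.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,depth=6,"
                + "force=n,thread=y,verbose=n,file=%s");
            conf.setProfileTaskRange(true, "0-2");   // profile the first three map tasks...
            conf.setProfileTaskRange(false, "0-2");  // ...and the first three reduce tasks
            return conf;
          }
        }

    The profile output for those tasks then ends up alongside the task logs.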

  • George Pang / May 06, 2009 / 6:40 PM

    Dear Philip,

    How do I get this done if my Hadoop is on VMware? What additional configuration would I need?

    Thank you!

    George

  • George Pang / May 06, 2009 / 6:53 PM

    Is it a good idea to configure Eclipse Europa, as in the Yahoo tutorial at http://public.yahoo.com/gogate/hadoop-tutorial/html/module3.html? It seems to have some problems.

    George

  • Philip Zeyliger / May 06, 2009 / 7:20 PM

    George,

    What problem are you experiencing?

    The Eclipse plug-in in Hadoop’s contrib directory does not work with Eclipse 3.4 (see http://issues.apache.org/jira/browse/HADOOP-3744), but that’s not really the subject of this screencast; here we’re setting up an environment to work on Hadoop itself.

    Nothing should be special about working in a VM. In fact, I recorded most of the screencast in a VM running Ubuntu within my OS X environment.

    – Philip

  • George Pang / May 06, 2009 / 11:05 PM

    in the Yahoo tutorial it goes about creating new DFS Location:

    “…Next, click on the “Advanced” tab. There are two settings here which must be changed.

    Scroll down to hadoop.job.ugi. It contains your current Windows login credentials. Highlight the first comma-separated value in this list (your username) and replace it with hadoop-user…”

    I can’t find this attribute (hadoop.job.ugi) in the Advanced list from “Define Hadoop Location” in Eclipse (Europa), and I don’t think there is one.

    I know some other people came across the same problem.

    But now I am going to try your configuration with my Eclipse Ganymede. Will there be a MapReduce perspective or view after setting it up?

    Also, where can I find more resources or tutorials on Hadoop and web programming with Hadoop? For example, how do I integrate my servlets/JSPs with Hadoop programs?

    Thank you,

    George
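
    For the last question, the kind of integration I have in mind looks roughly like this, using the 0.20-era JobClient API (class names and request parameters are made up, and a real deployment would submit the job asynchronously instead of blocking the request thread):

        import java.io.IOException;

        import javax.servlet.ServletException;
        import javax.servlet.http.HttpServlet;
        import javax.servlet.http.HttpServletRequest;
        import javax.servlet.http.HttpServletResponse;

        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapred.FileInputFormat;
        import org.apache.hadoop.mapred.FileOutputFormat;
        import org.apache.hadoop.mapred.JobClient;
        import org.apache.hadoop.mapred.JobConf;
        import org.apache.hadoop.mapred.RunningJob;

        public class WordCountServlet extends HttpServlet {
          protected void doPost(HttpServletRequest req, HttpServletResponse resp)
              throws ServletException, IOException {
            // Configure the job exactly as a standalone driver would.
            JobConf conf = new JobConf(WordCountServlet.class);
            conf.setJobName("wordcount-from-servlet");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            conf.setMapperClass(WordCount.Map.class);     // the WordCount sketch in the post body
            conf.setReducerClass(WordCount.Reduce.class);
            FileInputFormat.setInputPaths(conf, new Path(req.getParameter("input")));
            FileOutputFormat.setOutputPath(conf, new Path(req.getParameter("output")));

            RunningJob job = JobClient.runJob(conf);  // note: blocks until the job finishes
            resp.getWriter().println("Job " + job.getID() + " succeeded: " + job.isSuccessful());
          }
        }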

  • George Pang / May 06, 2009 / 11:09 PM

    Philip,

    What do you mean by setting up Eclipse to work on Hadoop itself? Is it much different from using the contrib Eclipse plug-in?

    Thank you,
    George

  • George Pang / May 07, 2009 / 2:23 PM

    Hi Philip,

    I found only the SVNKit 1.2.2 and 1.3.0 implementations for the Subversive SVN connector. Which one should I choose? Or, which SVN should I use?
    Thank you.

    George

    • Philip Zeyliger / May 12, 2009 / 10:54 AM

      You can use either SVN version. –Philip

  • Philip Zeyliger / May 07, 2009 / 2:32 PM

    George,

    This screencast will make sense for folks who want to work on the core Hadoop library itself. It sounds like you’re just using Hadoop, not modifying it, so this may not be the right thing for you. Specifically, this screencast does not address the Hadoop plug-in that you’re trying to get working.

    – Philip

  • George Pang / May 07, 2009 / 4:19 PM

    Got it, thank you!

    George

  • Van / June 04, 2009 / 7:42 AM

    Philip,
    I followed your steps but still got errors. The build succeeds, but there are a lot of errors in the Problems tab. It seems that Eclipse can’t recognize packages and classes. What should I do?
    Thank you.
    Van

  • Iman / June 04, 2009 / 6:13 PM

    Thank you so much, Philip.
    I followed your instructions but I keep getting this error when I do the project build.
    BUILD FAILED
    /home/iman/workspace/HadoopProject20/build.xml:497: The following error occurred while executing this line:
    /home/iman/workspace/HadoopProject20/src/contrib/build.xml:30: The following error occurred while executing this line:
    /home/iman/workspace/HadoopProject20/src/contrib/eclipse-plugin/build.xml:61: Compile failed; see the compiler error output for details.

    The lines referenced in the error are Ant XML task invocations; the markup was stripped when I posted this, and the only fragment that survived is deprecation=”${javac.deprecation}”> from the third line.

    I am checking out one of the releases, not trunk. I am also using Eclipse 3.4.2, which should be OK as far as I understand.

    Do you have any suggestions on how to fix this?
    Thank you very much.

  • Philip Zeyliger / June 17, 2009 / 11:47 AM

    Iman,

    The first thing I would try is to build on the command line with “ant” and see if that works. You should also try to see if the build works with trunk. There have been some patches (including HADOOP-5658) that fixed up some Eclipse settings that are only in trunk.

    – Philip

  • Dennis Kubes / June 25, 2009 / 6:49 AM

    Great post. I recently tested this with the newest Eclipse version, 3.5, and the recently split Hadoop projects for Core, HDFS, and MapReduce, and it works great.

  • Bo Liu / July 16, 2009 / 1:55 PM

    Thanks Philip. This screencast is very helpful.

    One problem that I ran into is that Eclipse keeps crashing when I run the WordCount application from MapReduce/Examples. Here is the error message:

    # An unexpected error has been detected by Java Runtime Environment:
    #
    # SIGSEGV (0xb) at pc=0x0625665c, pid=22830, tid=2219604880
    #
    # Java VM: Java HotSpot(TM) Server VM (10.0-b22 mixed mode linux-x86)
    # Problematic frame:
    # V [libjvm.so+0x25665c]

    ….

    ————— S Y S T E M —————

    OS:Red Hat Enterprise Linux Client release 5 (Tikanga)

    uname:Linux 2.6.18-8.el5 #1 SMP Fri Jan 26 14:15:21 EST 2007 i686
    libc:glibc 2.5 NPTL 2.5
    rlimit: STACK 10240k, CORE infinity, NPROC 32635, NOFILE 1024, AS infinity
    load average:0.16 0.41 0.50

    CPU:total 2 (2 cores per cpu, 1 threads per core) family 6 model 15 stepping 6, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3

    Memory: 4k page, physical 2067332k(131608k free), swap 2031608k(826240k free)

    vm_info: Java HotSpot(TM) Server VM (10.0-b22) for linux-x86 JRE (1.6.0_06-b02), built on Mar 25 2008 00:26:44 by “java_re” with gcc 3.2.1-7a (J2SE release)

    time: Thu Jul 16 15:13:53 2009
    elapsed time: 565 seconds

  • Philip Zeyliger / July 16, 2009 / 3:55 PM

    Bo,

    That looks like your JVM is crashing. “1.6.0_06-b02” is not the latest version; perhaps try upgrading?

    – Philip

  • Bo Liu / July 20, 2009 / 1:57 PM

    Thanks Philip! Updating the JVM fixed the problem.

    Another problem: somehow Eclipse can’t find the main class when I create my own WordCount Java project and run it as a Java application. It works fine if I run the application as an Ant build. I added WordCount to the classpath and set it as the default classpath.

    Bo

  • Bo Liu / July 20, 2009 / 3:21 PM

    It worked after adding all the jars in the Hadoop lib directory as external jars to the build path.

    Thanks anyway!
    Bo

  • jane liu / November 26, 2009 / 6:51 AM

    Why can’t I find the screencast?

  • Philip Zeyliger / December 15, 2009 / 2:05 PM

    Jane,

    Can you go to http://vimeo.com/4193623 directly?

  • Linh Ly / January 21, 2010 / 11:39 AM

    The hadoop svn repository used in this screencast is no longer available. Which repository should be used in its place?

  • Bhaskar B / April 07, 2010 / 10:33 PM

    Thanks Philip, this is good stuff. However, I am running into a host of issues while trying to build the latest hadoop-common. I’m using Eclipse Galileo (Build id: 20090621-0832). The first issue was that the JAR ivy-2.1.0-rc1.jar that gets downloaded through the ivy-download task would always end up corrupt. So I manually downloaded it from http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0-rc1/ivy-2.1.0-rc1.jar and placed it within the hadoop-common-trunk/ivy folder, which resolved it. Unfortunately the problem now remains at the “ivy-resolve-common” target, where I’m getting checksum mismatches (this happens every time on every machine I’ve tried, three so far: RHEL and Ubuntu boxes). This also occurs when I run the build script from the shell. I’ve cut and pasted the error log from Eclipse below.

    [ivy:resolve] :: problems summary ::
    [ivy:resolve] :::: WARNINGS
    [ivy:resolve] problem while downloading module descriptor: http://repo1.maven.org/maven2/commons-logging/commons-logging/1.1.1/commons-logging-1.1.1.pom: invalid sha1: expected=

    Ideally I want to make this work without modifying the hadoop build scripts if possible.

    Any ideas?

  • Aysan / December 13, 2010 / 12:33 PM

    Hi,

    Thank you very much, Philip. Your screencast helped me a lot.

    I want to implement a new scheduling algorithm for Hadoop. So far I have been able to check out and work with the Hadoop SVN repository in Eclipse. As I understand it, in the new version of Hadoop (0.21.0) it is possible to check out one of the MapReduce, HDFS, or Common projects. I checked out the MapReduce one, and I could build it and everything was fine. However, when I want to run my Hadoop project in cluster mode, I have a problem, because my project does not include the bin/hadoop file. Could you please help me with this?

    Best Regards,
    Aysan

  • cwl / January 21, 2011 / 2:32 PM

    Thanks Phil!

    This cast was extremely helpful.

    What song is playing in the background?

  • jiucai / June 22, 2011 / 2:20 AM

    Using the Maven plugin for Eclipse to develop Hadoop may be a good way; just configure pom.xml:

        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-core</artifactId>
          <version>0.20.203.0</version>
        </dependency>

    Then you will automatically have the required jar files.
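
    For completeness, a minimal sketch of the whole pom.xml being described here (the project coordinates are placeholders):

        <project xmlns="http://maven.apache.org/POM/4.0.0">
          <modelVersion>4.0.0</modelVersion>
          <groupId>com.example</groupId>
          <artifactId>hadoop-app</artifactId>
          <version>1.0-SNAPSHOT</version>
          <dependencies>
            <dependency>
              <groupId>org.apache.hadoop</groupId>
              <artifactId>hadoop-core</artifactId>
              <version>0.20.203.0</version>
            </dependency>
          </dependencies>
        </project>

    With m2eclipse, or by running mvn eclipse:eclipse, Eclipse then picks up the jars automatically.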

  • Adam Yee / August 26, 2011 / 5:23 PM

    Can this work with hadoop-0.20.203.0 instead of trunk? For instance, obtaining hadoop-0.20.203.0 source, then creating an Eclipse project from that source.

    Thanks,
    Adam

  • Arun Kumar / September 05, 2011 / 3:37 AM

    The hadoop svn repository used in this screencast is no longer available.
    Hadoop versions are checked out with
    svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-X.Y.Z/ hadoop-common-X.Y.Z

    1) We tried with hadoop-0.20.2:
    -> Build was successful
    -> Couldn’t find MetricsServlet.java & TestMetricsServlet.java

    2) We tried with hadoop-0.20.203:
    -> Build was successful
    -> Couldn’t find TestMetricsServlet.java

    3) We tried with hadoop-0.21:
    -> Built 3 components successfully
    -> Couldn’t find MetricsServlet.java & DfsServlet.java

    * So we couldn’t run and debug in these three cases.
    Are there any other ways to run or debug in these cases?

    Regards,
    Arun Kumar

  • Arun Kumar / September 05, 2011 / 9:00 AM

    Hi !

    I have followed this.
    The hadoop svn repository used in this screencast is no longer available.
    I have checked out Hadoop versions from
    http://svn.apache.org/repos/asf/hadoop/common/tags/release-X.Y.Z/ hadoop-common-X.Y.Z

    1) I tried with hadoop-0.20.2:
    -> Build was successful
    -> Couldn’t find MetricsServlet.java & TestMetricsServlet.java

    2) I tried with hadoop-0.20.203:
    -> Build was successful
    -> Couldn’t find TestMetricsServlet.java

    3) I tried with hadoop-0.21:
    -> Built 3 components successfully
    -> Couldn’t find any jars in the Java build path
    -> Couldn’t find MetricsServlet.java & DfsServlet.java

    * So I couldn’t run and debug in these three cases.

    Which version of Hadoop can be used?
    Are there any other ways/files to run or debug any of these Hadoop versions?

    Regards,
    Arun

  • Sujit / October 08, 2011 / 11:52 AM

    The svn link

    http://svn.apache.org/repos/asf/hadoop/core/trunk

    is no longer working. Any suggestions?

  • Anyonymous / October 30, 2011 / 11:49 AM

    http://wiki.apache.org/hadoop/HowToContribute

    Try this for the SVN link:

    http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-trunk

  • Newb / May 03, 2012 / 10:20 AM

    Good day

    I can use: http://svn.apache.org/repos/asf/hadoop/common/trunk/

    but not with the hadoop-trunk at the end.

    I have an issue where I don’t get the build.xml…why isn’t this in the hadoop-trunk?

    How can I move forward?

    thank you

    • Jon Zuanich / May 07, 2012 / 10:18 AM

      Recent versions of Hadoop have moved to Maven. “mvn eclipse:eclipse” could be a good starting point.
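
      The rough flow, assuming the trunk URL from the comments above:

          svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-trunk
          cd hadoop-trunk
          mvn install -DskipTests
          mvn eclipse:eclipse

      Then import the generated projects via File > Import > Existing Projects into Workspace.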

  • Liga / May 10, 2012 / 10:00 AM

    It didn’t work for me for hadoop-0.20.2

  • Seth Meyerson / September 02, 2012 / 5:37 PM

    Will this guide work with Juno, or will the configuration be different?

  • Pavi / September 06, 2012 / 5:07 AM

    Hi, I am new to Hadoop and want to configure the code in Eclipse. You used Linux; will it be configurable on a Windows machine? I have a Hadoop cluster installed on my current working domain.

    Thanks.
