Configuring Eclipse for Apache Hadoop Development (a screencast)

Categories: Data Ingestion, General, HDFS, Training

Update (added 5/15/2013): The information below is dated; see this post for current instructions about configuring Eclipse for Hadoop contributions.

One of the perks of using Java is the availability of functional, cross-platform IDEs.  I use vim for my daily editing needs, but when it comes to navigating, debugging, and coding large Java projects, I fire up Eclipse.

Typically, when you’re developing Map-Reduce applications, you simply point Eclipse at the Apache Hadoop jar file, and you’re good to go.  (Cloudera’s Hadoop training VM has a fully-configured example.) However, when you want to dig deeper to explore—and modify—Hadoop’s internals themselves, you’ll want to configure Eclipse to build Hadoop.  Because there’s generated code and a complicated ant build.xml file, this takes some tinkering.  Now that I have the full Hadoop Eclipse experience going (it took me a few tries), I’ve prepared a screencast that will help guide you through it, from downloading Eclipse to debugging one of its unit tests.  You’ll also want to reference the EclipseEnvironment Hadoop wiki page, which has more details.
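For reference, the overall flow the screencast walks through looks roughly like this. Treat it as a sketch only: the repository URL and the Ant target name below are assumptions, and both changed across releases, so check the EclipseEnvironment wiki page for the values that match your checkout.

```shell
# Sketch only -- the URL and target names varied by release (see the wiki page).
svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk hadoop-trunk
cd hadoop-trunk
ant                 # run the full Ant build once on the command line first
ant eclipse-files   # then generate the .classpath/.project metadata Eclipse needs
```

Building once on the command line before opening the project in Eclipse makes sure the generated code exists, which is what trips up most first attempts.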

Eclipse for Hadoop Development from Cloudera.

We’re interested in your feedback!  Use the comments below to share your Eclipse experiences.

P.S.: I “filmed” the screencast on Linux, but the same steps work on Mac OS X. Since Eclipse will be running under Java 1.5 in OS X, be sure to check that “Preferences…Installed JREs” references a 1.6 JVM. Sometimes Eclipse is finicky; you might have to do “Refresh” and “Build Project” several times. The key bindings are a bit different too: substitute Command (⌘) for Control.


36 responses on “Configuring Eclipse for Apache Hadoop Development (a screencast)”

  1. Abhishek Verma

    Thanks a lot for the screencast. I would love to see a screencast explaining how to run Hadoop from within Eclipse with some profiling turned on (hprof or YourKit).

  2. George Pang

    Dear Philip,

    How do I get this done if my Hadoop is on VMware? What additional configuration would I need?

    Thank you!


  3. Philip Zeyliger Post author


    What problem are you experiencing?

    The Eclipse plug-in in Hadoop’s contrib directory does not work with Eclipse 3.4, but that’s not really the subject of this screencast; here we’re setting up an environment to work on Hadoop itself.

    Nothing should be special about working in a VM. In fact, I recorded most of the screencast in a VM running Ubuntu within my OS X environment.

    — Philip

  4. George Pang

    The Yahoo tutorial describes creating a new DFS Location:

    “…..Next, click on the “Advanced” tab. There are two settings here which must be changed.

    Scroll down to hadoop.job.ugi. It contains your current Windows login credentials. Highlight the first comma-separated value in this list (your username) and replace it with hadoop-user….”

    I can’t find this attribute (hadoop.job.ugi) in the Advanced list under “Define Hadoop location” in Eclipse (Europa), and I don’t think there is one.

    I know some other people came across the same problem.

    But now I am going to try your configuration with my Eclipse Ganymede. Will there be a Map-Reduce perspective or view after setting it up?

    Also, where can I find more resources or tutorials on Hadoop and web programming with Hadoop? For example, how do I integrate my servlets/JSPs with Hadoop programs?

    Thank you,


  5. George Pang


    What do you mean by setting up Eclipse to work on Hadoop itself? Is it much different from using the contrib Eclipse plug-in?

    Thank you,

  6. George Pang

    Hi Philip,

    I found only the SVNKit 1.2.2 and 1.3.0 implementations for the Subversive SVN Connector. Which one should I choose? Or, which SVN should I use?
    Thank you.


  7. Philip Zeyliger Post author


    This screencast will make sense for folks who want to work on the core Hadoop library itself. It sounds like you’re just using Hadoop, not modifying it, so this may not be the right thing for you. Specifically, this screencast does not address the Hadoop plug-in that you’re trying to get working.

    — Philip

  8. Van

    I followed your steps but still got errors. The build succeeds, but there are a lot of errors in the Problems tab. It seems that Eclipse can’t recognize packages and classes. What should I do?
    Thank you.

  9. Iman

    Thank you so much, Philip.
    I followed your instructions but I keep getting this error when I do the project build.
    /home/iman/workspace/HadoopProject20/build.xml:497: The following error occurred while executing this line:
    /home/iman/workspace/HadoopProject20/src/contrib/build.xml:30: The following error occurred while executing this line:
    /home/iman/workspace/HadoopProject20/src/contrib/eclipse-plugin/build.xml:61: Compile failed; see the compiler error output for details.

    The lines referenced in the error are <javac> compile tasks in those build files, e.g.:

    deprecation=”${javac.deprecation}”>

    I am checking out one of the releases, not trunk. I am also using Eclipse 3.4.2, which should be OK as far as I understand.

    Do you have any suggestions on how to fix this?
    Thank you very much.

  10. Philip Zeyliger Post author


    The first thing I would try is to build on the command line with “ant” and see if that works. You should also check whether the build works with trunk. There have been some patches (including HADOOP-5658) that fixed up some Eclipse settings that are only in trunk.

    — Philip

  11. Dennis Kubes

    Great post. I recently tested this with the newest Eclipse version (3.5) and the recently split Hadoop projects for Common, HDFS, and MapReduce, and it works great.

  12. Bo Liu

    Thanks Philip. This screencast is very helpful.

    One problem I ran into is that Eclipse keeps crashing when I run the WordCount application from MapReduce/Examples. Here is the error message:

    # An unexpected error has been detected by Java Runtime Environment:
    # SIGSEGV (0xb) at pc=0x0625665c, pid=22830, tid=2219604880
    # Java VM: Java HotSpot(TM) Server VM (10.0-b22 mixed mode linux-x86)
    # Problematic frame:
    # V []


    ————— S Y S T E M —————

    OS:Red Hat Enterprise Linux Client release 5 (Tikanga)

    uname:Linux 2.6.18-8.el5 #1 SMP Fri Jan 26 14:15:21 EST 2007 i686
    libc:glibc 2.5 NPTL 2.5
    rlimit: STACK 10240k, CORE infinity, NPROC 32635, NOFILE 1024, AS infinity
    load average:0.16 0.41 0.50

    CPU:total 2 (2 cores per cpu, 1 threads per core) family 6 model 15 stepping 6, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3

    Memory: 4k page, physical 2067332k(131608k free), swap 2031608k(826240k free)

    vm_info: Java HotSpot(TM) Server VM (10.0-b22) for linux-x86 JRE (1.6.0_06-b02), built on Mar 25 2008 00:26:44 by “java_re” with gcc 3.2.1-7a (J2SE release)

    time: Thu Jul 16 15:13:53 2009
    elapsed time: 565 seconds

  13. Philip Zeyliger Post author


    That looks like your JVM is crashing. “1.6.0_06-b02” is not the latest version; perhaps try upgrading?

    — Philip

  14. Bo Liu

    Thanks Philip! Updating the JVM fixed the problem.

    Another problem: somehow Eclipse can’t find the main class if I create my own WordCount Java project and run it as a Java application. It works fine if I run the application as an Ant build. I added WordCount to the classpath and set it as the default classpath.


  15. Bo Liu

    It worked after adding all the JARs in the Hadoop lib directory as external JARs to the build path.

    Thanks anyway!

  16. Linh Ly

    The hadoop svn repository used in this screencast is no longer available. Which repository should be used in its place?

  17. Bhaskar B

    Thanks Philip, this is good stuff. However, I am running into a host of issues while trying to build the latest hadoop-common. I’m using Eclipse Galileo (Build id: 20090621-0832). The first issue was that the JAR ivy-2.1.0-rc1.jar that gets downloaded through the ivy-download task would always end up corrupt. So I manually downloaded it and placed it within the hadoop-common-trunk/ivy folder, which resolved that. Unfortunately the problem now lies at the “ivy-resolve-common” target, where I’m getting checksum mismatches (this happens every time on every machine I’ve tried, 3 so far, both RHEL and Ubuntu boxes). It also occurs when I run the build script from the shell. I’ve pasted the Eclipse error log below.

    [ivy:resolve] :: problems summary ::
    [ivy:resolve] :::: WARNINGS
    [ivy:resolve] problem while downloading module descriptor: invalid sha1: expected=

    Ideally I want to make this work without modifying the hadoop build scripts if possible.

    Any ideas?

  18. Aysan


    Thank you very much, Philip. Your screencast helped me a lot.

    I want to implement a new scheduling algorithm for Hadoop. So far I have been able to check out and work with the Hadoop SVN repository in Eclipse. As I understand it, in the new version of Hadoop (0.21.0) it is possible to check out the MapReduce, HDFS, or Common projects individually. I checked out the MapReduce one, I could build it, and everything was fine. However, when I want to run my Hadoop project in cluster mode, I have a problem, because my project does not include the bin/hadoop file. Could you please help me with this?

    Best Regards,

  19. jiucai

    Using the Maven plugin for Eclipse to develop Hadoop may be a good way; just configure the pom.xml.

    Then you will automatically have the required JAR files.

  20. Adam Yee

    Can this work with hadoop- instead of trunk? For instance, obtaining hadoop- source, then creating an Eclipse project from that source.


  21. Arun Kumar

    The Hadoop SVN repository used in this screencast is no longer available.
    Hadoop versions are checked out with:
    svn checkout hadoop-common-X.Y.Z

    1) We tried with Hadoop 0.20.2:
    -> Build was successful
    -> Couldn’t find &

    2) We tried with Hadoop 0.20.203:
    -> Build was successful
    -> Couldn’t find

    3) We tried with Hadoop 0.21:
    -> Built 3 components successfully
    -> Couldn’t find &

    * So we couldn’t run and debug in these three cases.
    Are there any other ways/things to run or debug in these cases?

    Arun Kumar

  22. Arun Kumar

    Hi!

    I have followed this.
    The Hadoop SVN repository used in this screencast is no longer available.
    I have checked out Hadoop versions from hadoop-common-X.Y.Z

    1) I tried with Hadoop 0.20.2:
    -> Build was successful
    -> Couldn’t find &

    2) I tried with Hadoop 0.20.203:
    -> Build was successful
    -> Couldn’t find

    3) I tried with Hadoop 0.21:
    -> Built 3 components successfully
    -> Couldn’t find any JARs in the Java build path
    -> Couldn’t find &

    * So I couldn’t run and debug in these three cases.

    Which version of Hadoop can be used?
    Are there any other ways/files to run or debug any of these Hadoop versions?


    1. Jon Zuanich

      Recent versions of Hadoop have moved to Maven. “mvn eclipse:eclipse” could be a good starting point.
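      For a Mavenized tree, a minimal sketch of that flow (run from the source root; the import step in Eclipse is described from memory and may differ slightly between Eclipse releases):

      ```shell
      # Generate Eclipse .project/.classpath files for every Maven module,
      # skipping the (long) test runs during dependency resolution.
      mvn eclipse:eclipse -DskipTests
      # Then in Eclipse: File > Import > Existing Projects into Workspace,
      # pointing at the source root.
      ```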

  23. Pavi

    Hi, I am new to Hadoop and want to configure the code in Eclipse. You have used Linux, so will it be configurable on a Windows machine? I have a Hadoop cluster installed on the current working domain.