Shopzilla’s Apache Hadoop Hackathon: Learning To Contribute

This is a guest repost from Shopzilla’s Tech Blog, written by Andrew Look, a Software Engineer at Shopzilla.com. Andrew is responsible for building and maintaining SEM systems that manage keyword-based marketing operations. He also has a strong background in highly concurrent web applications and Service Oriented Architectures.

Having gained a strong interest in Hadoop/NoSQL after prototyping a workflow based on MapReduce/Pig, he is now co-organizer of the Los Angeles Hadoop Users’ Group, evangelizing use of the Hadoop project within the Southern California software community.

With the objective of recruiting new contributors to the Hadoop ecosystem, the most recent LA-HUG meetup included a 5-hour hackathon in which attendees learned to set up a development environment; check out, build, and test the Hadoop codebase; find issues to work on in the Apache JIRA system; and understand the community review process.

Aaron T. Myers and Eric Sammer of Cloudera expertly led the session of roughly 15 developers, describing how developers can contribute to Hadoop Common, MapReduce, and HDFS. While the learning curve was steep, a number of developers were able to submit patches during the session. The following patches are currently pending community review for inclusion in the Hadoop 0.23.0 release:

  • HADOOP-7418 support for multiple slashes in the path separator
  • HDFS-1322 DistributedFileSystem.mkdirs(dir, dirPermission) doesn’t set the permissions of created dir to dirPermission
  • HDFS-1314 dfs.block.size accepts only absolute value
  • HDFS-1321 If service port and main port are the same, there is no clear log message explaining the issue.

For those interested in contributing to projects in the Hadoop ecosystem, Aaron has prepared a very helpful document that explains how to pull down the source code, set up a development environment, run the tests, and find JIRA issues to work on.

As a final note, Eric made an exciting announcement: several independent open-source projects related to Hadoop, which Cloudera has been maintaining, have been accepted into the Apache Incubator. They are actively recruiting committers; your help will make a big difference, whether you’re a Java developer, build/release/scripting ninja, documentation writer, web designer, bug reporter, or even an end user who pulls down the code and runs it! The list of projects is as follows:

Thank you to everyone who attended! We hope to see more Hadoop enthusiasts give back to this ever-growing community.



1 Response
  • Nag / July 29, 2011 / 1:47 PM

    Hi,
    Very nice post.

    Can you please help in setting up similar workshops in Bayarea ?

    I was watching the Hadoop Common Jira for quite some time, however it is very tough to get started. It would be of great help if you could help on this.

    Thanks to Carl Steinbach for setting up the Hive contributors but wish this targets newbies.

    Thanks, Nag
