Introducing Hadoop Development Status

We’re happy to announce a new tool we have been developing here at Cloudera: Hadoop Development Status. Hadoop Development Status aims to help the Hadoop community understand its direction, health, and participants. The project currently monitors the most active contributors according to mailing list traffic, the most watched JIRA tickets, and aggregate traffic volumes on the Hadoop mailing lists.

The graph of messages per month on the Hadoop Core lists shows sustained growth in traffic. During this time, new sub-projects have been added to the Hadoop Top Level Project (HBase, ZooKeeper, Pig, Hive), but we haven’t created graphs for them yet. In fact, HBase was in Core as a contrib module until February this year, when it became a sub-project. The growth in traffic makes the Core lists difficult to follow, which is one of the reasons for the planned partitioning of Core into Core, HDFS, and MapReduce sub-projects, and the promotion of Hive into a sub-project.

Contributions to Hadoop take many forms, including writing code, answering the questions of other users, and creating documentation. The number of messages sent to the mailing list is just one measure of how active a contributor is: you can see such a graph here.
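To make the two mailing-list measures discussed so far (messages per month and messages per contributor) a little more concrete, here is a minimal Python sketch. It is not the code behind Hadoop Development Status. The function name `tally_messages` and the sample headers are hypothetical, and it assumes you have already extracted the `From` and `Date` headers from a list archive.

```python
from collections import Counter
from email.utils import parseaddr, parsedate_to_datetime


def tally_messages(headers):
    """Count messages per month and per sender.

    `headers` is an iterable of (from_header, date_header) string pairs,
    for example pulled from an mbox archive of a mailing list.
    """
    per_month = Counter()
    per_sender = Counter()
    for from_header, date_header in headers:
        try:
            sent = parsedate_to_datetime(date_header)
        except (TypeError, ValueError):
            continue  # skip messages with a missing or malformed Date header
        _, address = parseaddr(from_header)
        per_month[sent.strftime("%Y-%m")] += 1
        per_sender[address.lower()] += 1
    return per_month, per_sender


# Hypothetical sample data, not real list traffic.
sample = [
    ("Alice <alice@example.org>", "Mon, 02 Feb 2009 10:00:00 +0000"),
    ("Bob <bob@example.org>", "Tue, 03 Feb 2009 11:30:00 +0000"),
    ("alice@example.org", "Wed, 04 Mar 2009 09:15:00 +0000"),
    ("Carol <carol@example.org>", "not a date"),
]

per_month, per_sender = tally_messages(sample)
for address, count in per_sender.most_common():
    print(address, count)
```

Keying senders on the lowercased address rather than the display name is a design choice for this sketch: it avoids counting one person twice when their mail client changes how their name is shown. It would still miss people who post from several addresses.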

A perennial problem in open source projects is planning what’s in the next release. When folks are scratching their own itches, it’s difficult to predict what will be implemented, and when. JIRA has a few features that let users indicate their preferences, which can help planners prioritize work: folks can vote on an issue to say that they want the feature, or they can watch an issue to be informed of changes as development on it progresses. Voting is not widely used on the Hadoop JIRA, but watching is, and the number of watchers gives some indication of the level of interest in a new feature (at least among developers). On this basis, the ranked list of watched issues says that TFile, the Capacity Scheduler, X-Trace, MapReduce context objects, and processing multiple splits per mapper are the issues to watch.
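Ranking issues by watcher count is simple enough to sketch in a few lines of Python. The example below is only an illustration under stated assumptions: the `Issue` type, the `rank_by_watchers` function, the issue keys, and all of the counts are made up, and it does not talk to JIRA at all. It assumes the watcher and vote counts have already been collected by some other means.

```python
from typing import NamedTuple


class Issue(NamedTuple):
    key: str       # hypothetical issue key
    watchers: int  # number of users watching the issue
    votes: int     # number of votes on the issue


def rank_by_watchers(issues, top=5):
    """Return the `top` issues ordered by watchers, most watched first.

    Ties are broken by vote count (higher first) and then by key, so the
    ordering is deterministic.
    """
    ordered = sorted(issues, key=lambda i: (-i.watchers, -i.votes, i.key))
    return ordered[:top]


# Made-up data for illustration only.
issues = [
    Issue("EXAMPLE-1", watchers=12, votes=0),
    Issue("EXAMPLE-2", watchers=30, votes=2),
    Issue("EXAMPLE-3", watchers=12, votes=4),
    Issue("EXAMPLE-4", watchers=7, votes=9),
]

ranked = rank_by_watchers(issues, top=3)
for issue in ranked:
    print(issue.key, issue.watchers)
```

Votes are used here only as a tie-breaker, reflecting the observation above that watching is the more widely used signal on the Hadoop JIRA.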

We have plenty of other ideas that will help us all understand the Hadoop community better, and we plan to be very active in developing this tool. However, we definitely haven’t thought of every way to improve it, so please email us with suggestions and improvements.

–Alex, Jeff, and Tom