The Second Apache Hadoop HDFS and MapReduce Contributors Meeting

Categories: General

The second Apache Hadoop HDFS and MapReduce contributors meeting was held last Friday, May 28 at Cloudera’s offices in Palo Alto. Apache projects attract contributors from across the globe, and Hadoop is no exception, so the idea of holding face-to-face meetings may seem to run counter to the spirit of such a highly decentralized organization. However, the point of in-person meetings is not to make project decisions, but rather to start discussions that spur more in-depth, on-list decision making. Chris Douglas took excellent, detailed minutes of the meeting.

The general theme of the meetings has been project process: in particular, how the Hadoop development community can continue to move the platform forward while supporting the large user base that Hadoop has attracted. In this vein, Eli Collins presented his proposal for Hadoop to adopt a new mechanism for adding significant new features, analogous to Python’s PEP (Python Enhancement Proposal). A HEP (Hadoop Enhancement Proposal) would be required whenever a large new feature is being planned for Hadoop. By way of example, something the size of the backup namenode would need a HEP, but something like the pure Java CRC enhancement probably would not.

At heart, a HEP is a consensus-building process for Hadoop changes. The improvements that HEPs would bring include discoverability (today it’s too easy to miss umbrella JIRAs), buy-in on use cases (focusing first on the problem the HEP would solve, rather than diving straight into code), and completeness (each HEP would have to address a checklist of concerns, such as backwards compatibility). HEPs would be reviewed by the PMC, and approval would mean the authors could proceed with the implementation, although the resulting changes would still need the usual review and committer approval before being committed to the main line of development. Eli’s slides (along with Chris’s minutes) cover more of the details, and there’s some initial discussion on

We also talked about feature branches as a tool for large changes, and how the choice of using a feature branch or a patch might be made on a HEP-by-HEP basis.

Finally, we discussed contrib modules in Hadoop, and whether some of them might be worth spinning off to be hosted elsewhere, much as HBase recently did. This topic will be taken up on the lists at a future date.

The contributors meetings are open to anyone who contributes to the HDFS or MapReduce projects. So if you’re in the area, consider signing up for future meetings at