Apache HBase Pow-wow Summary 11/29/2011
San Francisco, Salesforce.com HQ - Recently, project contributors gathered in person at an Apache HBase Pow-wow to discuss the direction of future HBase releases. This group included a quorum of the core committers from Facebook, StumbleUpon, Salesforce, eBay, and Cloudera, as well as many contributors and users from other companies. This was an open discussion, and in compliance with Apache Software Foundation policies, the agenda and detailed minutes were shared with the community at large so that everyone can chime in before any final decisions are made.
We summarize some of the high-level discussion topics:
- What is HBase? (and what isn’t it)
- Releases, branching, and versioning
- Goals for operability
Stability is a main focus for future releases, and there was some discussion about test tools and methods to ensure it. One point of discussion was the availability of Nicholas Keywal’s new test categorization and test-speed improvements. This was followed by approaches to integration-level testing: Roman Shaposhnik from Cloudera explained Apache Bigtop’s goals (see a recent Bigtop presentation), and Mikhail Boutin presented a deck about the levels of testing that patches undergo internally at Facebook. This later spun off into a “testing mini-summit” between some of the Facebook, Cloudera, and StumbleUpon folks, focused on system and integration testing.
The next major discussion was about what HBase is trying to be. After some discussion, the consensus seemed to be simply: “HBase is a super-fast, reliable big data store.” In effect, this means focusing on tightening up the core and encouraging additions (specific coprocessors, for example) to remain separate projects.
With a 0.92 release candidate just out, versions and branches for 0.94 were the next topic of discussion. The first half of that discussion boiled down to a debate between time-based and feature-based releases. The consensus was not to branch 0.94 immediately, to be stricter with trunk, to use feature branches for larger contributions, and to set a maximum elapsed time between releases so that there can be multiple releases per year. The precondition for branching 0.94 would be either the merging of a potentially destabilizing change or some minimum elapsed time. A follow-on discussion was about which release to name 1.0. We generally agreed that 1.0 would be reserved for when we achieve BigTable parity, and that it would come at least a release or so after 0.94.
Operability and training new HBase contributors and operators were the next focus. Today, it seems that most shops have some folks in a developer role and others in an operational, DBA-like role. The goal of this effort is to improve tools so that someone with only black-box knowledge can operate and repair an HBase system. This means improving tools like hbck and improving logging to provide more actionable messages.
Next, Nicholas Spiegelberg from Facebook pitched Phabricator and Arc, tools for core developers that simplify the patch review and submission process.
The scheduled discussions wrapped up with some wishful blue-sky discussion, and then we broke out into smaller discussions accompanied by pizza and brews.
Thanks to Lars Hofhansl, Chris Trezzo, and the Salesforce.com crew for hosting!