New Additions to the Apache HBase Team

StumbleUpon (SU) and Cloudera have signed a technology collaboration agreement. Cloudera will support the SU clusters and, in exchange, will have access to a variety of production deployments on which to study and try out beta software.

As part of the agreement, the StumbleUpon Apache HBase+Apache Hadoop team — Jean-Daniel Cryans, Elliott Clark and I — have joined Cloudera. From our new perch up in the Cloudera San Francisco office — 10 blocks north and 11 floors up — we will continue as first-level support for SU clusters, tending and optimizing them as we have always done. The rest of our time will be spent helping develop Apache HBase as the newest additions to Cloudera’s HBase team.

We do not foresee this transition disrupting our roles as contributors to HBase. If anything, we look forward to contributing even more than in the past.

As we see it, our job at SU was effectively done. We had put in place a stable, scalable data store used both for low-latency serving of the SU front end and by the bigger back-end batch clusters, where scientists and analysts run all kinds of processing and reporting MapReduce jobs. The front-end clusters are set up so they replicate to the batch and backup clusters across datacenters. All clusters are multi-tenant, serving a variety of schemas and features and a wide breadth of access patterns. As the SU dataset and processing demands continue to grow, all they need to do to scale their data store is add boxes.

While we once furiously made HBase customizations to facilitate new SU features, there has been less of that of late, and most of our SU-specific work has long since been pushed upstream. We reached a state in which software updates to the SU HBase+Hadoop stack, apart from the odd bug fix, came whenever the HBase project put up a new release candidate for the community to try. At SU we tested the release candidate by rolling it up through the SU clusters from dev, through batch, and eventually out to the front-end-serving cluster if all proved stable.

We therefore were spending most of our time working in open source on the HBase project helping releases along, with the remainder spent operating our HBase+Hadoop clusters and educating folks a bit about how best to use the datastore (and Hadoop). It became apparent after a while that if we could hand off operations, there was little to tie us to SU in particular.

Cloudera has a wide variety of customer HBase installs. It also already has a strong HBase team that, to tell you the truth, was barely keeping up with growth in the customer base. We wanted to give them a helping hand. We also wanted to be able to take on some larger features and fix-ups (snapshots and backups, to mention but a few), projects that take more than a single developer and a weekend to finish. Being part of a larger team, we will be able to do this. At Cloudera, we will also be better positioned to improve HBase integration with the rest of the Hadoop stack. Finally, Cloudera is in the support business, so we could set up an arrangement whereby we continue to look after SU.

This move strikes us as a natural progression – “coming home,” as Mike Olson calls it. We are super-psyched to be joining Cloudera even if we are going to miss our old haunt, the super supportive SU, which generously sponsored and pioneered HBase these last three years.

— Michael Stack