January 2012 Bay Area HBase User Group meetup summary + HBaseCon announcement

More than 150 people attended the San Francisco Bay Area HBase User Group meetup last Thursday, January 19th, at eBay headquarters in San Jose, California.  Presenters from StumbleUpon, Facebook, eBay and MapR shared a wealth of information about Apache HBase operations and optimizations, gleaned from their experience running HBase in production environments.

One special item of note: Michael Stack announced HBaseCon 2012, taking place this spring in the Bay Area.  This inaugural conference will focus on the growth and education of the HBase community.  While details of the event are not yet published, the call for speakers is currently open.  Submit your abstract here.

Many of the talks focused on HBase operations.  Here’s a summary of those presentations:

Aravind Gottipati discussed the HBase deployments at StumbleUpon, reflecting on hardware, requirements, configuration, and monitoring tools. Aravind also pointed out some operational challenges StumbleUpon has faced, and suggested some improvements for future HBase versions.  [slides]

Next, Paul Tuckfield presented on HBase operations at Facebook. He shared interesting facts about their deployment, such as how their clusters span multiple racks to avoid network uplinks as a single point of failure, and how their clusters are as slow as their slowest region server.  [slides]

eBay’s Swati Agarwal and Thomas Pan gave a talk on eBay’s HBase deployments, sharing many statistics about their pre-production deployment and discussing their need for well-distributed keys and its impact on their rowkey schema. They also talked about their HBase-related challenges, including a need for more stability and the significant downtime that upgrades incur.  [slides]
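For readers unfamiliar with why key distribution shapes a rowkey schema: HBase stores rows sorted by rowkey, so keys that arrive in sequential order tend to land on the same region. One common workaround is to prefix each key with a hash-derived "salt" bucket. The sketch below is only an illustration of that general idea, not the schema eBay presented; the function name, the bucket count, and the `|` separator are all hypothetical choices.

```python
import hashlib

# Hypothetical number of salt buckets; in practice this would be chosen
# with the number of regions or region servers in mind.
NUM_BUCKETS = 16


def salted_rowkey(natural_key: str, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix a natural key with a deterministic, hash-derived bucket.

    The same natural key always maps to the same bucket, so point lookups
    can recompute the prefix, while sequential keys are spread out.
    """
    key_bytes = natural_key.encode("utf-8")
    digest = hashlib.md5(key_bytes).digest()
    bucket = digest[0] % num_buckets
    # Zero-padded bucket so that prefixes sort consistently as bytes.
    return f"{bucket:02d}|".encode("utf-8") + key_bytes


if __name__ == "__main__":
    # Sequential keys (for example, timestamps) end up under different prefixes.
    for ts in range(1327000000, 1327000005):
        print(salted_rowkey(str(ts)))
```

The trade-off is that a range scan over the natural key order now has to fan out across every bucket, which is one way key distribution ends up affecting the schema as a whole.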

By now, the meeting was running a bit behind schedule, so J.D. Cryans gave a quick presentation about some experiments he did at StumbleUpon involving different caching configurations and datasets. He showed numbers from several runs based on a snapshot of the upcoming CDH3u3 release from Cloudera, which is currently in production at StumbleUpon.  The runs were with no block cache, with short-circuited reads, and with 100% block cache. The main takeaway was that it is very important to understand how much data needs to be read for your specific use case, and how this data fits into HBase.  [slides]

In addition to the above talks, Tomer Shiran from MapR gave an overview of MapR’s product, and Mikhail Bautin from Facebook concluded the meetup with some slides about the various optimizations that Facebook has contributed back to the HBase community in the area of scanner performance.

Slides for all presentations are available here, and the link to the meetup web page is here.

Thanks to eBay for inviting the HBase User Group to their building, and for providing free pizza and beer.  See you at HBaseCon 2012!