Cloudera Blog · HBase Posts

HBaseCon 2013: "Operations" Track Preview

As you have probably learned by now, HBaseCon 2013 sessions are organized into four tracks: Operations, Internals, Ecosystem, and Case Studies. In combination, they offer a 360-degree view of Apache HBase that is invaluable for experts and aspiring experts alike. In the next few posts leading up to the conference (June 13 in San Francisco – register now while there’s still room), we’ll offer sneak previews of what each track has to offer.

First up is the Operations track, which will be hosted by Facebook’s Liyin Tang (HBase PMC Member and HBaseCon keynote speaker):

Customer Spotlight: Gravity Creates Personalized Web Experience, 300-400% Higher Click-through

According to Jim Benedetto, Gravity’s co-founder and CTO, there have been two paradigm shifts that have transformed consumers’ web experience to date:

Fresh and Hot: HBaseCon 2013 Schedule Finalized!

The schedule/agenda grid for HBaseCon 2013 (rapidly approaching: June 13 in San Francisco) is a thing of beauty.

If you lacked motivation to register up until this point, we think that this session line-up will convince you otherwise. We repeat: whether you’re an HBase committer or just getting started (or at any level in between), HBaseCon is simply an event that you can’t afford to miss – and with an entry fee of just $350, it’s also one you can easily afford.

Top 5 Reasons to Attend HBaseCon 2013

HBaseCon 2013 is approaching fast – June 13 in San Francisco. If you’re on the fence about attending – or perhaps your manager is on the fence about approving your participation – here are a few things that you/they need to know (in no particular order):

  1. HBaseCon is the annual rallying point for the HBase community. If you’ve ever had a desire to learn how to get involved in the community as a contributor, or just want to ask a committer or PMC member why things are done (or not done) a certain way, this is your opportunity – because this is where those people are. Participating in a mailing list thread is never quite the same once you’ve met the people behind it. 
     
  2. HBaseCon is a one-stop shop for learning about the HBase roadmap, as well as other projects across the ecosystem. Current HBase users should be particularly interested in learning about which JIRAs will have the most impact on the user experience – and once again, most of the committers working on those JIRAs will either be leading sessions or otherwise present. Plus, you can learn about how new complementary projects like Impala, Kiji, Phoenix, and Honeycomb are transforming the use cases for HBase and helping to expand its footprint across the enterprise.
     
  3. HBaseCon is a feast of real-world experiences and use cases. Sure, maybe you’ve read about the HBase-backed applications used by companies like Facebook, Salesforce.com, eBay, Pinterest, and Yahoo!. But wouldn’t it be helpful to hear technical details and best practices directly from the people who built and run them? I’ll bet it would. And you really can’t do that anywhere else — in the whole world. (Plus, you can take advantage of formal training right before the conference, at a discount.)
     
  4. HBaseCon is a pageant of engineer rock-stars. If your company is an HBase user and hungry for talent, there’s no better place to find it: HBaseCon is literally the world’s biggest gathering of HBase experts under one roof.
     
  5. HBaseCon is a heck of a blast. Come for the deep-dives and advice, stay for the after-event party. The libations will be extensive!

If you have any interest in HBase whatsoever, whether as a user or prospective user, missing HBaseCon is almost unthinkable.

Metrics2: The New Hotness for Apache HBase Metrics

The post below was originally published at blogs.apache.org/hbase. We re-publish it here for your convenience.

Apache HBase is a distributed big data store modeled after Google’s Bigtable paper. As with all distributed systems, knowing what’s happening at a given time can help spot problems before they arise, debug ongoing issues, evaluate new usage patterns, and provide insight into capacity planning.

Since October 2008, version 0.19.0 (HBASE-625), HBase has been using Apache Hadoop’s metrics system to export metrics to JMX, Ganglia, and other metrics sinks. As the code base grew, more and more metrics were added by different developers. New features got metrics. When users needed more data on issues, they added more metrics. These new metrics were not always consistently named, and some were not well documented.

How Scaling Really Works in Apache HBase

This post was originally published via blogs.apache.org; we republish it here in a slightly modified form for your convenience:

At first glance, the Apache HBase architecture appears to follow a master/slave model where the master receives all the requests but the real work is done by the slaves. This is not actually the case, and in this article I will describe what tasks are in fact handled by the master and the slaves.

Regions and Region Servers

HBase is the Hadoop storage manager that provides low-latency random reads and writes on top of HDFS, and it can handle petabytes of data. One of the interesting capabilities in HBase is auto-sharding, which simply means that tables are dynamically distributed by the system when they become too large.
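To make the auto-sharding idea concrete, here is a minimal toy sketch in Python. It is not HBase code: the `ToyTable` class, its row-count split threshold, and the midpoint split rule are all assumptions made for illustration (real HBase decides when to split based on region size and its configured split policy). The sketch only shows the general shape: a table is a sorted list of contiguous row-key ranges, each row key belongs to exactly one range, and a range that grows too large is split in two.

```python
from bisect import bisect_right


class ToyTable:
    """Toy model of a table split into contiguous row-key ranges ("regions").

    Illustrative only: the row-count threshold and midpoint split are
    assumptions, not how HBase actually decides when or where to split.
    """

    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region  # hypothetical split threshold
        self.start_keys = [""]               # the first region starts at the empty key
        self.regions = [[]]                  # sorted row keys held by each region

    def region_index(self, row_key):
        # The owning region is the last one whose start key is <= row_key.
        return bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key):
        i = self.region_index(row_key)
        rows = self.regions[i]
        if row_key not in rows:
            rows.append(row_key)
            rows.sort()
        if len(rows) > self.max_rows:
            self._split(i)

    def _split(self, i):
        # Split the oversized region at its middle row key.
        rows = self.regions[i]
        mid = len(rows) // 2
        left, right = rows[:mid], rows[mid:]
        self.regions[i] = left
        self.regions.insert(i + 1, right)
        self.start_keys.insert(i + 1, right[0])


if __name__ == "__main__":
    table = ToyTable(max_rows_per_region=4)
    for key in "abcdefgh":
        table.put(key)
    print(table.start_keys)
    print(table.regions)
```

In this model, finding the region for a row key is a binary search over the region start keys, which is why keeping the ranges sorted and non-overlapping matters.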

HBaseCon 2013 Speakers, Tracks, and Sessions Announced

Thanks to a dazzling array of excellent proposals from across the Apache HBase community, the HBaseCon 2013 Program Committee has cooked up a great list of sessions.

HBaseCon (hosted by Cloudera), now in its second year, is THE community event for Apache HBase contributors, developers, admins, and users. There is no better place to dive head-first into use cases, best practices, internals, and futures as well as to meet the rest of the community. 

How-to: Use the Apache HBase REST Interface, Part 2

This how-to is the second in a series that explores the use of the Apache HBase REST interface. Part 1 covered HBase REST fundamentals, some Python caveats, and table administration. Part 2 below will show you how to insert multiple rows at once using XML and JSON. The full code samples can be found on GitHub.

Adding Rows With XML

The REST interface would be useless without the ability to add and update row values. The interface gives us this ability with the POST verb. Posting a row adds it if its row key is new, or updates the existing row if that row key already exists.

First, let’s step through how to do this using the XML and JSON data formats. Let’s start with XML.
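As a rough sketch of what a multi-row payload might look like, the Python below builds both an XML and a JSON body from a plain dictionary. It assumes the payload shape described in the HBase REST documentation: a `CellSet` containing `Row` elements with `Cell` children, where row keys, column names, and values are base64-encoded. The helper names (`b64`, `rows_to_xml`, `rows_to_json`) and the sample table data are hypothetical, and the code only constructs the request body; it does not contact a server.

```python
import base64
import json
import xml.etree.ElementTree as ET


def b64(text):
    """Base64-encode a string, as the REST payload is assumed to require."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def rows_to_xml(rows):
    """Build a CellSet XML body from {row_key: {column: value}}."""
    cell_set = ET.Element("CellSet")
    for row_key, cells in rows.items():
        row = ET.SubElement(cell_set, "Row", key=b64(row_key))
        for column, value in cells.items():
            cell = ET.SubElement(row, "Cell", column=b64(column))
            cell.text = b64(value)
    return ET.tostring(cell_set, encoding="unicode")


def rows_to_json(rows):
    """Build the equivalent JSON body; "$" holds the encoded cell value."""
    payload = {
        "Row": [
            {
                "key": b64(row_key),
                "Cell": [
                    {"column": b64(column), "$": b64(value)}
                    for column, value in cells.items()
                ],
            }
            for row_key, cells in rows.items()
        ]
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Hypothetical sample data: two rows in a column family named "cf".
    sample = {
        "row1": {"cf:name": "alice"},
        "row2": {"cf:name": "bob"},
    }
    print(rows_to_xml(sample))
    print(rows_to_json(sample))
```

Either string would then be sent as the body of a POST request with a matching `Content-Type` header (`text/xml` or `application/json`). The exact URL depends on your REST server and table, so it is left out here.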

Where to Find Cloudera Tech Talks Through June 2013

It’s time for me to give you a quarterly update (here’s the one for Q1) about where to find tech talks by Cloudera employees in 2013. Committers, contributors, and other engineers will travel to meetups and conferences near and far to do their part in the community to make Apache Hadoop a household word!

(Remember, we’re always ready to assist your meetup by providing speakers, sponsorships, and schwag.)

A couple of highlights:

Meet the HBaseCon 2013 Program Committee

With HBaseCon 2013 (Early Bird registration now open!) preparations in full swing, you may be interested in learning a bit about the personalities behind the Program Committee, who are tasked with formulating a compelling, community-focused agenda. 

Recently I had a chance to ask committee members Gary Helmling (Twitter), Lars Hofhansl (Salesforce.com), Jon Hsieh (Cloudera), Doug Meil (Explorys), Andrew Purtell (Intel), Enis Söztutar (Hortonworks), Michael Stack (Cloudera), and Liyin Tang (Facebook) a few questions:

How did you get involved in the HBase community?
