Cloudera Developer Blog · HBase Posts
The post below was originally published at blogs.apache.org/hbase. We re-publish it here for your convenience.
Apache HBase is a distributed big data store modeled after Google’s Bigtable paper. As with all distributed systems, knowing what’s happening at a given time can help spot problems before they arise, debug on-going issues, evaluate new usage patterns, and provide insight into capacity planning.
Since October 2008, version 0.19.0 (HBASE-625), HBase has been using Apache Hadoop’s metrics system to export metrics to JMX, Ganglia, and other metrics sinks. As the code base grew, more and more metrics were added by different developers. New features got metrics. When users needed more data on issues, they added more metrics. These new metrics were not always consistently named, and some were not well documented.
This post was originally published via blogs.apache.org, we republish it here in a slightly modified form for your convenience:
At first glance, the Apache HBase architecture appears to follow a master/slave model where the master receives all the requests but the real work is done by the slaves. This is not actually the case, and in this article I will describe what tasks are in fact handled by the master and the slaves.
Regions and Region Servers
HBase is the Hadoop storage manager that provides low-latency random reads and writes on top of HDFS, and it can handle petabytes of data. One of the interesting capabilities in HBase is auto-sharding, which simply means that tables are dynamically distributed by the system when they become too large.
HBaseCon (hosted by Cloudera), now in its second year, is THE community event for Apache HBase contributors, developers, admins, and users. There is no better place to dive head-first into use cases, best practices, internals, and futures as well as to meet the rest of the community.
This how-to is the second in a series that explores the use of the Apache HBase REST interface. Part 1 covered HBase REST fundamentals, some Python caveats, and table administration. Part 2 below will show you how to insert multiple rows at once using XML and JSON. The full code samples can be found on GitHub.
Adding Rows With XML
The REST interface would be useless without the ability to add and update row values. The interface gives us this ability with the
POST verb. By posting new rows, we can add new rows or update existing rows using the same row key.
First, let’s step through how to do this using the XML and JSON data formats. Let’s start with XML.
It’s time for me to give you a quarterly update (here’s the one for Q1) about where to find tech talks by Cloudera employees in 2013. Committers, contributors, and other engineers will travel to meetups and conferences near and far to do their part in the community to make Apache Hadoop a household word!
(Remember, we’re always ready to assist your meetup by providing speakers, sponsorships, and schwag.)
A couple highlights:
With HBaseCon 2013 (Early Bird registration now open!) preparations in full swing, you may be interested in learning a bit about the personalities behind the Program Committee, who are tasked with formulating a compelling, community-focused agenda.
Recently I had a chance to ask committee members Gary Helmling (Twitter), Lars Hofhansl (Salesforce.com), Jon Hsieh (Cloudera), Doug Meil (Explorys), Andrew Purtell (Intel), Enis Söztutar (Hortonworks), Michael Stack (Cloudera), and Liyin Tang (Facebook) a few questions:
How did you get involved in the HBase community?
The following FAQ is provided by James Taylor of Salesforce, which recently open-sourced its Phoenix client-embedded JDBC driver for low-latency queries over HBase. Thanks, James!
What is this new Phoenix thing I’ve been hearing about?
Phoenix is an open source SQL skin for HBase. You use the standard JDBC APIs instead of the regular HBase client APIs to create tables, insert data, and query your HBase data.
Doesn’t putting an extra layer between my application and HBase just slow things down?
Actually, no. Phoenix achieves as good or likely better performance than if you hand-coded it yourself (not to mention with a heck of a lot less code) by:
Hadoop Summit Europe is coming up in Amsterdam next week, so this is an appropriate time to make you aware of the Cloudera speaker program there (all three talks on Thursday, March 21):
The following guest post is provided by Aaron Kimball, CTO of WibiData.
The Kiji ecosystem has grown with the addition of a new module, KijiMR. The Kiji framework is a collection of components that offer developers a handle on building Big Data Applications. In addition to the first release, KijiSchema, we are now proud to announce the availability of a second component: KijiMR. KijiMR allows KijiSchema users to use MapReduce techniques including machine-learning algorithms and complex analytics to develop many kinds of applications using data in KijiSchema. Read on to learn more about the major features included in KijiMR and how you can use them.
KijiMR offers developers a set of new processing primitives explicitly designed for interacting with complex table-oriented data. The low-level batch interfaces available in MapReduce include basic InputFormat and OutputFormat implementations. The raw APIs are designed for processing key-value pairs stored in flat files in HDFS. Integrating MapReduce with HBase via InputFormat and OutputFormat APIs is hard to do from scratch in every algorithm. In KijiMR, we have extended the available MapReduce APIs to include:
There are various ways to access and interact with Apache HBase. The Java API provides the most functionality, but many people want to use HBase without Java.
There are two main approaches for doing that: One is the Thrift interface, which is the faster and more lightweight of the two options. The other way to access HBase is using the REST interface, which uses HTTP verbs to perform an action, giving developers a wide choice of languages and programs to use.
This series of how-to’s will discuss the REST interface and provide Python code samples for accessing it. The first post will cover HBase REST, some Python caveats, and table administration. The second post will explain how to insert multiple rows at a time using XML and JSON. The third post will show how to get multiples rows using XML and JSON. The full code samples can be found on my GitHub account.