Hello, Starbase: A Python Wrapper for the HBase REST API

The following guest post is provided by Artur Barseghyan, a web developer currently employed by Goldmund, Wyldebeast & Wunderliebe in The Netherlands.

Python is my personal (and primary) programming language of choice and also happens to be the primary programming language at my company. So, when starting to work with a new technology, I prefer to use a clean and easy (Pythonic!) API.

After studying tons of articles on the web, reading (and writing) white papers, and doing basic performance tests (sometimes hard if you’re on a tight schedule), my company recently selected Cloudera for our Big Data platform (including using Apache HBase as our data store for Apache Hadoop), with Cloudera Manager serving a role as “one console to rule them all.”

However, I was surprised shortly thereafter to learn about the absence of a working Python wrapper around the REST API for HBase (aka Stargate). I decided to write one in my free time, and the result, ladies and gentlemen, was Starbase (GPL).

In this post, I will provide some code samples and briefly explain what work has been done on Starbase. I assume that the reader already has a basic understanding of HBase (that is, of tables, column families, qualifiers, and so on).

Installation

Next, I’ll show you some frequently used commands and use cases. But first, install the current version of Starbase from the Cheese Shop (PyPI).

 

Do required imports:

 

…and create a connection instance. Starbase defaults to 127.0.0.1:8000; if your settings are different, specify them here.
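As a minimal sketch of these three steps: the package can be installed with `pip install starbase`, after which the import and connection might look like the following. The helper name `open_connection` is hypothetical, and the `host`/`port` keyword arguments are written as I recall them from the Starbase README, so check them against the version you have installed.

```python
def open_connection(host="127.0.0.1", port=8000):
    """Return a Starbase connection to the HBase REST (Stargate) server.

    The defaults mirror the ones Starbase itself is described as using.
    """
    # Imported lazily so this helper can be defined even where Starbase
    # is not installed yet (third-party package: pip install starbase).
    from starbase import Connection

    # Assumption: Connection accepts host and port keyword arguments.
    return Connection(host=host, port=port)
```

Calling `c = open_connection()` would then give the connection object used in the remaining examples; pass `host` and `port` explicitly if your REST server listens elsewhere.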

 

Use Cases and Examples

Show Tables

Assuming that there are two existing tables named table1 and table2, the following would be printed out.
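A small sketch of that call, assuming the connection object exposes a `tables()` method (as I recall from the Starbase README). The wrapper name `table_names` is hypothetical and only exists to make the snippet self-contained.

```python
def table_names(connection):
    """Return the names of all tables visible through the REST server."""
    # Assumption: connection.tables() returns an iterable of table names.
    return list(connection.tables())
```

With the two tables mentioned above, `print(table_names(c))` should list `table1` and `table2`.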

 

Table Schema Operations

Whenever you need to operate on a table, you need to create a table instance first.

Create a table instance (note that no table is created at this step):

 

Create a new table with columns ‘column1’, ‘column2’, ‘column3’ (here the table is actually created):

 

Check if table exists:

 

Show table columns:

 

Add columns to the table, given (‘column4’, ‘column5’, ‘column6’, ‘column7’):

 

Drop columns from table, given (‘column6’, ‘column7’):

 

Drop entire table schema:
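The schema steps in this section, from creating the table instance to dropping the schema, can be strung together roughly as follows. This is a sketch, not a definitive reference: the method names (`table`, `create`, `exists`, `add_columns`, `drop_columns`, `columns`, `drop`) are as I recall them from the Starbase README, and the table name `table3` and the function name `schema_walkthrough` are placeholders chosen for illustration.

```python
def schema_walkthrough(connection, name="table3"):
    """Create a table, alter its columns, then drop it again.

    Returns whether the table existed after creation and the column
    names that were left just before the schema was dropped.
    """
    table = connection.table(name)  # only a table instance; nothing created yet

    table.create("column1", "column2", "column3")  # the table is created here
    existed = table.exists()

    table.add_columns("column4", "column5", "column6", "column7")
    table.drop_columns("column6", "column7")
    remaining = table.columns()

    table.drop()  # removes the entire table schema
    return existed, remaining
```

Run against a scratch table only, since the last step deletes it.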

 

Table Data Operations

Insert data into a single row:

 

Note that you may also use the “native” means of naming the columns and cells (qualifiers). The result of the following would be equal to the result of the previous example.

 

Update row data:
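To make the insert and update steps concrete, here is a sketch under a few assumptions: `insert(row_key, data)` and `update(row_key, data)` are the method names as I recall them from the Starbase README, while the row keys, qualifiers, and values are made up. The `flatten` helper is not part of Starbase; it only demonstrates how the nested form maps onto the native `family:qualifier` form.

```python
# Nested form: column family -> {qualifier: value}
NESTED_ROW = {
    "column1": {"key11": "value 11", "key12": "value 12"},
    "column2": {"key21": "value 21"},
}

# Native form: "family:qualifier" -> value (same data as above)
NATIVE_ROW = {
    "column1:key11": "value 11",
    "column1:key12": "value 12",
    "column2:key21": "value 21",
}


def flatten(nested):
    """Convert the nested form into the native 'family:qualifier' form."""
    return {
        f"{family}:{qualifier}": value
        for family, cells in nested.items()
        for qualifier, value in cells.items()
    }


def write_rows(table):
    """Insert the same data in both notations, then update one cell."""
    table.insert("my-key-1", NESTED_ROW)
    table.insert("my-key-1a", NATIVE_ROW)
    table.update("my-key-1", {"column2": {"key21": "new value 21"}})
```

`flatten(NESTED_ROW) == NATIVE_ROW` holds, which is the equivalence the two insert styles rely on.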

 

Remove a row cell (qualifier):

 

Remove a row column (column family):

 

Remove an entire row:
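The three removal granularities above could be sketched as one helper. The `remove(row_key, column, qualifier)` call shape is as I recall it from the Starbase README; the row key, `column4`, and `key41` are illustrative placeholders.

```python
def remove_examples(table, row_key="my-key-1"):
    """Remove progressively larger parts of a row."""
    table.remove(row_key, "column4", "key41")  # a single cell (qualifier)
    table.remove(row_key, "column4")           # a whole column (column family)
    table.remove(row_key)                      # the entire row
```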

 

Fetch a single row with all columns:

 

Fetch a single row with selected columns (limit to ‘column1’ and ‘column2’ columns):

 

Narrow the result set even more (limit to cells ‘key1’ and ‘key2’ of column ‘column1’ and cell ‘key32’ of column ‘column3’):

 

Note that you may also use the native means of naming the columns and cells (qualifiers). The example below does exactly the same thing as the example above.

 

If you set the perfect_dict argument to False, you’ll get the native data structure:
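The fetch variants described in this section can be collected in one sketch. It assumes `fetch(row_key, columns, perfect_dict=...)` as I recall it from the Starbase README; the row key and the function name `fetch_examples` are placeholders.

```python
def fetch_examples(table, row_key="my-key-1"):
    """Run each fetch variant and return the results keyed by description."""
    native_columns = ["column1:key1", "column1:key2", "column3:key32"]
    return {
        # Whole row, all columns
        "all": table.fetch(row_key),
        # Only two column families
        "two_columns": table.fetch(row_key, ["column1", "column2"]),
        # Selected cells, nested notation
        "cells_nested": table.fetch(
            row_key, {"column1": ["key1", "key2"], "column3": ["key32"]}
        ),
        # The same selection, native 'family:qualifier' notation
        "cells_native": table.fetch(row_key, native_columns),
        # Same selection, but returned in the native data structure
        "native_result": table.fetch(
            row_key, native_columns, perfect_dict=False
        ),
    }
```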

 

Batch Operations with Table Data

Batch operations (insert and update) work similarly to routine insert and update, but are done in a batch. You are advised to operate in batch as much as possible.

In the example below, we will insert 5,000 records in a batch:  

 

In the example below, we will update 5,000 records in a batch:

 

Note: The table batch method accepts an optional size argument (int). If set, an auto-commit is fired each time the stack is full.
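A sketch of both batch cases, assuming the batch object mirrors the table's `insert`/`update` methods and is flushed with `commit(finalize=True)`, as I recall from the Starbase README. The helper name `batch_write` and its parameters are hypothetical.

```python
def batch_write(table, rows, update=False, size=None):
    """Write (row_key, data) pairs through one batch and return the count.

    rows   -- iterable of (row_key, data) tuples
    update -- use batch.update instead of batch.insert
    size   -- optional auto-commit threshold passed to table.batch()
    """
    batch = table.batch() if size is None else table.batch(size=size)
    write = batch.update if update else batch.insert

    count = 0
    for row_key, data in rows:
        write(row_key, data)
        count += 1

    # Send whatever is still stacked up and close the batch.
    batch.commit(finalize=True)
    return count
```

For the 5,000-record case, `rows` could be a generator such as `(("my-key-%s" % i, data) for i in range(5000))`, which avoids building the whole list in memory first.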

Table Data Search (Row Scanning)

A table scanning feature is in development. At the moment it’s only possible to fetch all rows from a table. The result set returned is a generator.
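Because the result is a generator, rows can be consumed lazily instead of loading the whole table at once. A minimal sketch, assuming the method is called `fetch_all_rows()` (as I recall from the Starbase README); the helper name `first_rows` is hypothetical.

```python
from itertools import islice


def first_rows(table, limit=10):
    """Return at most `limit` rows without exhausting the generator."""
    # islice stops pulling from the generator once `limit` rows are read.
    return list(islice(table.fetch_all_rows(), limit))
```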

 

Conclusion

I hope you learned a little about Starbase here and will put it to good use. You are welcome to report any issues via the project’s issue tracker.

Editor’s note: This post should not be taken as an indication that Starbase is recommended for production or will be supported in CDH. We just thought you might be interested.

3 Responses
  • Paul Eddie / December 03, 2013 / 12:07 PM

    Firstly, is there a way to insert a binary file? I would like to store tiff files in HBase. Secondly, will there be a way to retrieve it via REST?

  • Artur Barseghyan / December 05, 2013 / 1:15 PM

    Hey Paul,

    Yes, you can.

    Check the test method test_25_insert_binary_file near line 1052 in the test module (https://github.com/barseghyanartur/starbase/blob/master/src/starbase/client/tests.py).

    What basically happens there is that you first download a file (a JPG image) from the internet, read its contents, and then write them into an HBase table row. Then you fetch the binary data you have just inserted and compare it to the original. It matches. I even wrote the file contents fetched from HBase into a JPG file and then opened it. All went well.

    I hope it helps.

    Best regards,

  • Wouter Bolsterlee / February 01, 2014 / 11:51 AM

    An alternative, faster, and very feature-rich library for accessing HBase from Python is HappyBase (https://happybase.readthedocs.org/). It does not use the Stargate REST server, but the Thrift server included with HBase.
