Hello, Starbase: A Python Wrapper for the HBase REST API

The following guest post is provided by Artur Barseghyan, a web developer currently employed by Goldmund, Wyldebeast & Wunderliebe in The Netherlands.

Python is my personal (and primary) programming language of choice and also happens to be the primary programming language at my company. So, when starting to work with a new technology, I prefer to use a clean and easy (Pythonic!) API.

After studying tons of articles on the web, reading (and writing) white papers, and doing basic performance tests (sometimes hard if you’re on a tight schedule), my company recently selected Cloudera for our Big Data platform (including using Apache HBase as our data store for Apache Hadoop), with Cloudera Manager serving a role as “one console to rule them all.”

However, I was surprised shortly thereafter to learn about the absence of a working Python wrapper around the REST API for HBase (aka Stargate). I decided to write one in my free time, and the result, ladies and gentlemen, was Starbase (GPL).

In this post, I will provide some code samples and briefly explain what work has been done on Starbase. I assume that the reader already has a basic understanding of HBase (that is, of tables, column families, qualifiers, and so on).

Installation

Next, I’ll show you some frequently used commands and use cases. But first, install the current version of Starbase from the Cheese Shop (PyPI).

$ pip install starbase

 

Do the required imports:

>>> from starbase import Connection

 

…and create a connection instance. Starbase defaults to 127.0.0.1:8000; if your settings are different, specify them here.

>>> c = Connection()

 

Use Cases and Examples

Show Tables

Assuming that there are two existing tables named table1 and table2, the following would be printed out.

>>> c.tables()
['table1', 'table2']

 

Table Schema Operations

Whenever you need to operate on a table, you need to create a table instance first.

Create a table instance (note that no table is created at this step):

>>> t = c.table('table3')

 

Create a new table with columns ‘column1’, ‘column2’, and ‘column3’ (here the table is actually created):

>>> t.create('column1', 'column2', 'column3')
201

 

Check if table exists:

>>> t.exists()
True

 

Show table columns:

>>> t.columns()
['column1', 'column2', 'column3']

 

Add columns to the table, given (‘column4’, ‘column5’, ‘column6’, ‘column7’):

>>> t.add_columns('column4', 'column5', 'column6', 'column7')
200

 

Drop columns from table, given (‘column6’, ‘column7’):

>>> t.drop_columns('column6', 'column7')
201

 

Drop entire table schema:

>>> t.drop()
200

 

Table Data Operations

Insert data into a single row:

>>> t.insert(
...     'my-key-1',
...     {
...         'column1': {'key11': 'value 11', 'key12': 'value 12', 'key13': 'value 13'},
...         'column2': {'key21': 'value 21', 'key22': 'value 22'},
...         'column3': {'key31': 'value 31', 'key32': 'value 32'}
...     }
... )
200

 

Note that you may also use the “native” means of naming the columns and cells (qualifiers). The result of the following would be equal to the result of the previous example.

>>> t.insert(
...     'my-key-1a',
...     {
...         'column1:key11': 'value 11', 'column1:key12': 'value 12', 'column1:key13': 'value 13',
...         'column2:key21': 'value 21', 'column2:key22': 'value 22',
...         'column3:key31': 'value 31', 'column3:key32': 'value 32'
...     }
... )
200
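To make the relationship between the two notations concrete, here is a minimal sketch of how a nested dictionary maps onto the native ‘family:qualifier’ form. The helper name flatten_row is made up for this illustration and is not part of Starbase.

```python
def flatten_row(nested):
    """Convert {'family': {'qualifier': value}} into {'family:qualifier': value}.

    Hypothetical helper for illustration only; not a Starbase function.
    """
    flat = {}
    for family, cells in nested.items():
        for qualifier, value in cells.items():
            flat['%s:%s' % (family, qualifier)] = value
    return flat


nested = {
    'column1': {'key11': 'value 11', 'key12': 'value 12'},
    'column2': {'key21': 'value 21'},
}
flat = flatten_row(nested)
print(flat)
```

Both forms carry the same cells; the nested one is simply grouped by column family.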

 

Update row data:

>>> t.update(
...     'my-key-1',
...     {'column4': {'key41': 'value 41', 'key42': 'value 42'}}
... )
200

 

Remove a row cell (qualifier):

>>> t.remove('my-key-1', 'column4', 'key41')
200

 

Remove a row column (column family):

>>> t.remove('my-key-1', 'column4')
200

 

Remove an entire row:

>>> t.remove('my-key-1')
200

 

Fetch a single row with all columns:

>>> t.fetch('my-key-1')
  {
      'column1': {'key11': 'value 11', 'key12': 'value 12', 'key13': 'value 13'},
      'column2': {'key21': 'value 21', 'key22': 'value 22'},
      'column3': {'key31': 'value 31', 'key32': 'value 32'}
  }

 

Fetch a single row with selected columns (limit to the ‘column1’ and ‘column2’ columns):

>>> t.fetch('my-key-1', ['column1', 'column2'])
  {
      'column1': {'key11': 'value 11', 'key12': 'value 12', 'key13': 'value 13'},
      'column2': {'key21': 'value 21', 'key22': 'value 22'},
  }

 

Narrow the result set even more (limit to cells ‘key11’ and ‘key13’ of column ‘column1’ and cell ‘key32’ of column ‘column3’):

>>> t.fetch('my-key-1', {'column1': ['key11', 'key13'], 'column3': ['key32']})
  {
      'column1': {'key11': 'value 11', 'key13': 'value 13'},
      'column3': {'key32': 'value 32'}
  }

 

Note that you may also use the native means of naming the columns and cells (qualifiers). The example below does exactly the same thing as the example above.

>>> t.fetch('my-key-1', ['column1:key11', 'column1:key13', 'column3:key32'])
  {
      'column1': {'key11': 'value 11', 'key13': 'value 13'},
      'column3': {'key32': 'value 32'}
  }

 

If you set the perfect_dict argument to False, you’ll get the native data structure:

>>> t.fetch('my-key-1', ['column1:key11', 'column1:key13', 'column3:key32'], perfect_dict=False)
{
    'column1:key11': 'value 11', 'column1:key13': 'value 13',
    'column3:key32': 'value 32'
}
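As a rough illustration of how the two result shapes relate, the sketch below regroups a native result into the nested form. The function name to_perfect_dict is an assumption made for this example, and this is not Starbase’s actual implementation.

```python
def to_perfect_dict(flat):
    """Group {'family:qualifier': value} into {'family': {'qualifier': value}}.

    Hypothetical helper for illustration only; not a Starbase function.
    """
    nested = {}
    for column, value in flat.items():
        # Split on the first colon only, so a qualifier may itself contain colons.
        family, _, qualifier = column.partition(':')
        nested.setdefault(family, {})[qualifier] = value
    return nested


native = {
    'column1:key11': 'value 11', 'column1:key13': 'value 13',
    'column3:key32': 'value 32',
}
perfect = to_perfect_dict(native)
print(perfect)
```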

 

Batch Operations with Table Data

Batch operations (insert and update) work similarly to routine insert and update, but are done in a batch. You are advised to operate in batch as much as possible.

In the example below, we will insert 5,000 records in a batch:  

>>> data = {
...     'column1': {'key11': 'value 11', 'key12': 'value 12', 'key13': 'value 13'},
...     'column2': {'key21': 'value 21', 'key22': 'value 22'},
... }
>>> b = t.batch()
>>> for i in range(0, 5000):
...     b.insert('my-key-%s' % i, data)
...
>>> b.commit(finalize=True)
{'method': 'PUT', 'response': [200], 'url': 'table3/bXkta2V5LTA='}
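The url value in the response above ends in Base64 text. The HBase REST interface Base64-encodes row keys in its JSON payloads, and decoding the suffix shown here yields the first key of the batch. The snippet below only decodes the sample output; it is an observation about that output, not a documented Starbase guarantee.

```python
import base64

# The 'url' value copied from the sample commit response.
url = 'table3/bXkta2V5LTA='

# Everything before the first slash is the table name; the rest is the encoded key.
table_name, encoded_key = url.split('/', 1)
row_key = base64.b64decode(encoded_key).decode('utf-8')

print(table_name)
print(row_key)
```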

 

In the example below, we will update 5,000 records in a batch:

>>> data = {
...     'column3': {'key31': 'value 31', 'key32': 'value 32'},
... }
>>> b = t.batch()
>>> for i in range(0, 5000):
...     b.update('my-key-%s' % i, data)
...
>>> b.commit(finalize=True)
{'method': 'POST', 'response': [200], 'url': 'table3/bXkta2V5LTA='}

 

Note: The table batch method accepts an optional size argument (int). If set, an auto-commit is fired each time the stack is full.
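The auto-commit idea can be sketched in a few lines of plain Python. The BufferedBatch class and its send callback below are hypothetical stand-ins written for this illustration; they show the general pattern of flushing a stack once it reaches a given size, not how Starbase implements it internally.

```python
class BufferedBatch(object):
    """Hypothetical stand-in for a batch object; not Starbase's implementation."""

    def __init__(self, send, size=None):
        self._send = send    # callable that receives a list of queued rows
        self._size = size    # auto-commit threshold; None disables auto-commit
        self._stack = []

    def insert(self, key, data):
        self._stack.append((key, data))
        # Flush automatically as soon as the stack is full.
        if self._size is not None and len(self._stack) >= self._size:
            self.commit()

    def commit(self):
        # Send whatever is queued, then start with an empty stack.
        if self._stack:
            self._send(list(self._stack))
            self._stack = []


sent = []  # records how many rows each flush carried
b = BufferedBatch(lambda rows: sent.append(len(rows)), size=1000)
for i in range(0, 2500):
    b.insert('my-key-%s' % i, {'column1': {'key11': 'value 11'}})
b.commit()  # flush the rows left over after the last auto-commit
print(sent)  # [1000, 1000, 500]
```

A final explicit commit is still needed here, because the last chunk is usually smaller than the threshold.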

Table Data Search (Row Scanning)

A table scanning feature is in development. At the moment it’s only possible to fetch all rows from a table. The result set returned is a generator.

>>> t.fetch_all_rows()
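Because the result is a generator, rows are produced only as you iterate, so you can take a few of them without loading the whole table into memory. The sketch below uses a made-up fake_rows generator in place of t.fetch_all_rows(), and the row shape it yields is an assumption for illustration only.

```python
from itertools import islice


def fake_rows(total):
    """Stand-in generator; in real use this would be t.fetch_all_rows()."""
    for i in range(total):
        yield {'column1': {'key11': 'value %s' % i}}


rows = fake_rows(5000)

# Take only the first three rows; nothing beyond them has been produced yet.
first_three = list(islice(rows, 3))

# Continue iterating from where islice stopped.
remaining = sum(1 for _ in rows)

print(first_three[0])
print(remaining)
```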

 

Conclusion

I hope you learned a little about Starbase here and will put it to good use. You are welcome to report any issues via the project’s issue tracker.

Editor’s note: This post should not be taken as an indication that Starbase is recommended for production or will be supported in CDH. We just thought you might be interested.

3 Responses
  • Paul Eddie / December 03, 2013 / 12:07 PM

    Firstly, is there a way to insert a binary file? I would like to store tiff files in HBase. Secondly, will there be a way to retrieve it via REST?

  • Artur Barseghyan / December 05, 2013 / 1:15 PM

    Hey Paul,

    Yes, you can.

    Check the test method `test_25_insert_binary_file` near line 1052 in the test suite (https://github.com/barseghyanartur/starbase/blob/master/src/starbase/client/tests.py).

    What basically happens there is that you first download a file (a JPG image) from the internet, read its contents, and then write them into an HBase table row. Then you fetch the binary data you have just inserted and compare it to the original. It matches. I even wrote the file contents fetched from HBase into a JPG file and then opened it. All went well.

    I hope it helps.

    Best regards,

  • Wouter Bolsterlee / February 01, 2014 / 11:51 AM

    An alternative, faster, and very feature-rich library to access HBase from Python is HappyBase (https://happybase.readthedocs.org/). It does not use the Stargate REST server, but the Thrift server included with HBase.
