How-to: Use the Apache HBase REST Interface, Part 3

by Jesse Anderson

Posted in Technical | July 08, 2013

This how-to is the third in a series that explores the use of the Apache HBase REST interface. Part 1 covered HBase REST fundamentals, some Python caveats, and table administration. Part 2 showed you how to insert multiple rows simultaneously using XML and JSON. Part 3 below will show how to get multiple rows using XML and JSON.

Getting Rows with XML

Using the GET verb, you can retrieve a single row or a group of rows based on their row keys. (You can read more about the multiple value URL format here.) Here we are going to use the asterisk (*) wildcard to get all rows whose keys start with a specific string. In this example, we can load every line of Shakespeare’s comedies with “shakespeare-comedies-*”. This requires that the row keys be laid out as “AUTHOR-WORK-LINENUMBER”.
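As a rough illustration of the key layout and the wildcard URL, the sketch below builds a row key and the matching prefix URL. The helper names (make_row_key, prefix_url), the base URL, and the table name are assumptions made for this example rather than code from the series, and it is written for Python 3.

```python
from urllib.parse import quote


def make_row_key(author, work, line_number):
    """Join the parts into an AUTHOR-WORK-LINENUMBER row key."""
    return "{}-{}-{}".format(author, work, line_number)


def prefix_url(baseurl, tablename, author, work):
    """Build a URL matching every row key that starts with AUTHOR-WORK-."""
    prefix = "{}-{}-".format(author, work)
    # Percent-encode the table name and prefix, then append the literal
    # trailing asterisk that acts as the wildcard.
    return "{}/{}/{}*".format(baseurl, quote(tablename, safe=""), quote(prefix, safe=""))


# Hypothetical values, chosen to mirror the example in the text.
example_key = make_row_key("shakespeare", "comedies", 1)
example_url = prefix_url("http://localhost:8070", "tablename", "shakespeare", "comedies")
print(example_key)
print(example_url)
```

Keeping the author and work at the front of the key is what lets a single trailing asterisk select a whole work.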

Here is the code for getting and working with the XML output:

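import base64
from xml.etree.ElementTree import fromstring

import requests

# baseurl, tablename, cfname, the *column variables, and decode() are
# assumed to be defined as in the earlier parts of this series.
request = requests.get(baseurl + "/" + tablename + "/shakespeare-comedies-*", headers={"Accept": "text/xml"})

root = fromstring(request.text)

# Go through every row passed back
for row in root:
    message = ''
    linenumber = 0
    username = ''

    # Go through every cell in the row
    for cell in row:
        # Column names are base64-encoded; decode them to text for comparison
        columnname = base64.b64decode(cell.get('column')).decode('utf-8')

        # Skip cells that have no value
        if cell.text is None:
            continue

        if columnname == cfname + ":" + messagecolumn:
            message = base64.b64decode(cell.text).decode('utf-8')
        elif columnname == cfname + ":" + linenumbercolumn:
            linenumber = decode(cell.text)
        elif columnname == cfname + ":" + usernamecolumn:
            username = base64.b64decode(cell.text).decode('utf-8')

    # The row key is base64-encoded as well
    rowKey = base64.b64decode(row.get('key')).decode('utf-8')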

We start off the code with a GET request that returns all lines in Shakespeare’s comedies. The rows come back as XML because the Accept header is set to text/xml.

Then we take the XML returned by the request and parse it into an XML element tree. Each row from HBase is in a separate row element. We’ll use a for loop to go through every row.

Each cell in the row is a separate XML element. We’ll use another for loop to go through all of these cells. (This block of code could be made simpler by using XPath to find the correct elements.) As each column is found, its value is saved to a variable. (The decode method is discussed in Part 1 of this series.) All of the values coming back in XML are base64-encoded and need to be decoded before you use them.
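The XPath idea mentioned above might look something like the following sketch, which runs against a small hand-built sample instead of a live server. The column family and column names ("messages:message", "messages:linenumber"), the sample data, and the helpers b64, decode_int, and find_cell are all assumptions for illustration. In particular, decode_int is only a stand-in for the decode method from Part 1 and assumes the line number was stored as a 4-byte big-endian integer.

```python
import base64
import struct
from xml.etree.ElementTree import fromstring


def b64(raw):
    """Base64-encode bytes and return ASCII text, as it appears in the XML."""
    return base64.b64encode(raw).decode("ascii")


def decode_int(text):
    # Hypothetical stand-in for decode() from Part 1; assumes a 4-byte
    # big-endian unsigned integer was stored in the cell.
    return struct.unpack(">I", base64.b64decode(text))[0]


# Hand-built sample shaped like the XML described in the text.
sample = (
    '<CellSet><Row key="{key}">'
    '<Cell column="{msg_col}">{msg}</Cell>'
    '<Cell column="{num_col}">{num}</Cell>'
    '</Row></CellSet>'
).format(
    key=b64(b"shakespeare-comedies-1"),
    msg_col=b64(b"messages:message"),
    msg=b64(b"If music be the food of love, play on"),
    num_col=b64(b"messages:linenumber"),
    num=b64(struct.pack(">I", 1)),
)


def find_cell(row, column):
    """Return the text of the cell with the given column name, or None."""
    # Encode the name once and match on the attribute, instead of
    # decoding every cell's column name inside a loop.
    cell = row.find("Cell[@column='{}']".format(b64(column.encode("utf-8"))))
    return None if cell is None else cell.text


results = []
for row in fromstring(sample).findall("Row"):
    row_key = base64.b64decode(row.get("key")).decode("utf-8")
    message = base64.b64decode(find_cell(row, "messages:message")).decode("utf-8")
    line_number = decode_int(find_cell(row, "messages:linenumber"))
    results.append((row_key, line_number, message))

print(results)
```

ElementTree only supports a subset of XPath, but the attribute predicate used here is part of that subset.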

Finally, the row key is retrieved and decoded.

Once all of the data is found and decoded, you can start using it. Your code would start after decoding the row. Keep in mind that some of these variables don’t need to be decoded; I’m doing all of them here for the sake of completeness.

Getting Rows with JSON

Working with JSON is just like working with XML: using the GET verb, you can retrieve a single row or a group of rows based on their row keys.

Here is the code for getting and working with the JSON output:

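import base64
import json

import requests

# baseurl, tablename, cfname, the *column variables, and decode() are
# assumed to be defined as in the earlier parts of this series.
request = requests.get(baseurl + "/" + tablename + "/shakespeare-comedies-*", headers={"Accept": "application/json"})

bleats = json.loads(request.text)

# Go through every row passed back
for row in bleats['Row']:
    message = ''
    linenumber = 0
    username = ''

    # Go through every cell in the row
    for cell in row['Cell']:
        # Column names are base64-encoded; decode them to text for comparison
        columnname = base64.b64decode(cell['column']).decode('utf-8')
        # The cell value is stored under the dollar sign ($) key
        value = cell.get('$')

        # Skip cells that have no value
        if value is None:
            continue

        if columnname == cfname + ":" + messagecolumn:
            message = base64.b64decode(value).decode('utf-8')
        elif columnname == cfname + ":" + linenumbercolumn:
            linenumber = decode(str(value))
        elif columnname == cfname + ":" + usernamecolumn:
            username = base64.b64decode(value).decode('utf-8')

    # The row key is base64-encoded as well
    rowKey = base64.b64decode(row['key']).decode('utf-8')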

We start off the code with a GET request that will return all lines in Shakespeare’s comedies. These rows will come back as JSON because the Accept header is set to application/json.

Then we take the JSON returned by the request and turn it into a JSON object. Each row from HBase is a separate entry in the Row array. We’ll use a for loop to go through every row.

Each cell in the row is a separate entry in the Cell array. We’ll use another for loop to go through all of these cells. As each column is found, its value is saved to a variable. All of the values coming back in JSON are base64-encoded and need to be decoded before you use them. (Again, the decode method is discussed in Part 1 of this series.) Note that the values come back in the dollar sign ($) entry.
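To make the JSON layout concrete, here is a minimal sketch that walks a hand-built sample with the Row, Cell, column, and $ entries described above. The column names, the sample data, the helpers b64 and row_to_dict, and the 4-byte big-endian encoding of the line number are assumptions for illustration and not output captured from a real table.

```python
import base64
import json
import struct


def b64(raw):
    """Base64-encode bytes and return ASCII text, as it appears in the JSON."""
    return base64.b64encode(raw).decode("ascii")


# Hand-built sample shaped like the JSON described in the text.
sample = json.dumps({
    "Row": [{
        "key": b64(b"shakespeare-comedies-1"),
        "Cell": [
            {"column": b64(b"messages:message"),
             "$": b64(b"If music be the food of love, play on")},
            {"column": b64(b"messages:linenumber"),
             "$": b64(struct.pack(">I", 1))},
        ],
    }]
})


def row_to_dict(row):
    """Map each decoded column name to the decoded bytes of its value."""
    cells = {}
    for cell in row["Cell"]:
        value = cell.get("$")
        if value is None:
            continue  # skip cells that have no value
        column = base64.b64decode(cell["column"]).decode("utf-8")
        cells[column] = base64.b64decode(value)
    return cells


parsed = []
for row in json.loads(sample)["Row"]:
    row_key = base64.b64decode(row["key"]).decode("utf-8")
    cells = row_to_dict(row)
    message = cells["messages:message"].decode("utf-8")
    # Assumes the line number was stored as a 4-byte big-endian integer.
    line_number = struct.unpack(">I", cells["messages:linenumber"])[0]
    parsed.append((row_key, line_number, message))

print(parsed)
```

Collecting the cells into a dictionary first is one way to avoid the if/elif chain when a row has many columns.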

Finally, the row key is retrieved and decoded.

Once all of the data is found and decoded, you can start using it.

Using curl

As shown in the REST interface documentation, you can use curl to output XML or JSON directly to the console. For example, you could do the same get as we just did using curl. The command is:

curl -H "Accept: text/xml" "http://localhost:8070/tablename/shakespeare-comedies-*"

That command would give you the XML output. To get the JSON output, the command is:

curl -H "Accept: application/json" "http://localhost:8070/tablename/shakespeare-comedies-*"

With commands like these, you can quickly see what’s coming back or what the data looks like. You can also use curl to see the status code of a REST call. The -I flag sends a HEAD request and prints only the response headers:

[user@localhost HBaseREST]$ curl -I -H "Accept: text/xml" http://localhost:8070/messagestable/shakespeare-comedies-*
HTTP/1.1 200 OK
Content-Length: 0
Content-Type: text/xml
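If you would rather check the status code from Python, a sketch along these lines sends the same kind of HEAD request using only the standard library. The function names build_head_request and fetch_status are hypothetical, the URL simply mirrors the curl example, and fetch_status is defined but not called here because it needs a running REST server.

```python
from urllib.request import Request, urlopen


def build_head_request(url, accept="text/xml"):
    """Create (but do not send) a HEAD request with the given Accept header."""
    return Request(url, headers={"Accept": accept}, method="HEAD")


def fetch_status(url, accept="text/xml"):
    """Send the HEAD request and return the HTTP status code."""
    # Requires a reachable server; urlopen raises HTTPError on 4xx/5xx.
    with urlopen(build_head_request(url, accept)) as response:
        return response.status


# Build the request only, so this example runs without a server.
req = build_head_request("http://localhost:8070/messagestable/shakespeare-comedies-*")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```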

Conclusion

The HBase REST interface is a good way to use HBase if you don’t want to use Java. It offers you a familiar REST interface that’s supported in many languages, as well as familiar data formats.

Hopefully, the code samples and explanations in this series will save you a lot of Googling when embarking on your RESTful HBase project.

Jesse Anderson is an instructor with Cloudera University.
