How-to: Test HBase Applications Using Popular Tools

While Apache HBase adoption for building end-user applications has skyrocketed, many of those applications (and many apps generally) have not been well tested. In this post, you’ll learn some of the ways to do this testing with little effort.

We will start with unit testing via JUnit, then move on to using Mockito and Apache MRUnit, and then to using an HBase mini-cluster for integration testing. (The HBase codebase itself is tested via a mini-cluster, so why not tap into that for upstream applications, as well?)

As a basis for discussion, let’s assume you have an HBase data access object (DAO) that performs a simple insert into HBase. The logic could be more complicated, of course, but for the sake of example, this does the job.


HBaseTestObj is a basic data object with getters and setters for rowkey, data1, and data2.

The insertRecord method inserts a record into the HBase table under the column family CF, with CQ-1 and CQ-2 as qualifiers. The createPut method simply populates a Put and returns it to the calling method.
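
To make that shape concrete, here is a minimal sketch rather than the actual DAO. So that it runs with nothing but the JDK, SketchPut and SketchTable are hypothetical stand-ins for the HBase client's Put and table handle (they are not HBase classes), and MyDaoSketch is an assumed class name. The comments note the real client calls each line stands in for.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Small demo entry point: builds the stand-in Put for one record and prints it. */
class DaoSketchDemo {
    public static void main(String[] args) {
        HBaseTestObj obj = new HBaseTestObj();
        obj.setRowKey("ROWKEY-1");
        obj.setData1("DATA-1");
        obj.setData2("DATA-2");
        SketchPut put = MyDaoSketch.createPut(obj);
        System.out.println(put.getRow() + " -> " + put.getCells());
    }
}

/** Basic data object with getters and setters for rowkey, data1, and data2. */
class HBaseTestObj {
    private String rowKey;
    private String data1;
    private String data2;

    public String getRowKey() { return rowKey; }
    public void setRowKey(String rowKey) { this.rowKey = rowKey; }
    public String getData1() { return data1; }
    public void setData1(String data1) { this.data1 = data1; }
    public String getData2() { return data2; }
    public void setData2(String data2) { this.data2 = data2; }
}

/** Hypothetical stand-in for HBase's Put: a row key plus cells keyed by "family:qualifier". */
class SketchPut {
    private final String row;
    private final Map<String, String> cells = new LinkedHashMap<String, String>();

    SketchPut(String row) { this.row = row; }

    SketchPut addColumn(String family, String qualifier, String value) {
        cells.put(family + ":" + qualifier, value);
        return this;
    }

    String getRow() { return row; }
    Map<String, String> getCells() { return cells; }
}

/** Hypothetical stand-in for an HBase table handle. */
interface SketchTable {
    void put(SketchPut put);
}

/** Assumed DAO shape: build the Put in one method, write it in another. */
class MyDaoSketch {
    static void insertRecord(SketchTable table, HBaseTestObj obj) {
        // With the real client this would be table.put(put) on an HBase table.
        table.put(createPut(obj));
    }

    static SketchPut createPut(HBaseTestObj obj) {
        // With the real client: new Put(Bytes.toBytes(obj.getRowKey()))
        SketchPut put = new SketchPut(obj.getRowKey());
        put.addColumn("CF", "CQ-1", obj.getData1());
        put.addColumn("CF", "CQ-2", obj.getData2());
        return put;
    }
}
```

Keeping createPut free of any I/O is what lets a plain unit test check the row key and both qualifiers without a running cluster.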

Using JUnit

JUnit, which is well known to most Java developers at this point, is easily applied to many HBase applications. First, add JUnit as a dependency to your pom:


Now, within the test class:


Here you ensured that your createPut method creates, populates, and returns a Put object with the expected values.

Using Mockito

So how do you go about unit testing the above insertRecord method? One very effective approach is to do so with Mockito.

First, add Mockito as a dependency to your pom:


Then, in the test class:


Here you have populated HBaseTestObj with “ROWKEY-1”, “DATA-1”, “DATA-2” as values. You then used the mocked table and the DAO to insert the record. You captured the Put that the DAO would have inserted and verified that the rowkey, data1, and data2 are what you expect them to be.

The key here is to manage the HTable pool and HTable instance creation outside the DAO. This allows you to mock them cleanly and test Puts as described above. Similarly, you can expand into all the other operations such as Get, Scan, Delete, and so on.
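
The value of that separation can be illustrated without any mocking library. The sketch below is JDK-only: a hand-written recording fake is swapped in for the Mockito mock and its argument capture, and PutLike, TableLike, RecordingTable, and RecordDao are hypothetical names, not HBase or Mockito API. It is meant only to show why passing the table into the DAO makes the written Put inspectable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Small demo entry point: inserts one record through the fake and prints what was captured. */
class InjectionSketchDemo {
    public static void main(String[] args) {
        RecordingTable table = new RecordingTable();
        RecordDao.insertRecord(table, "ROWKEY-1", "DATA-1", "DATA-2");
        PutLike captured = table.captured.get(0);
        System.out.println(captured.row + " " + captured.cells);
    }
}

/** Hypothetical stand-in for a Put: row key plus "family:qualifier" -> value. */
class PutLike {
    final String row;
    final Map<String, String> cells = new LinkedHashMap<String, String>();

    PutLike(String row) { this.row = row; }
}

/** Hypothetical stand-in for the table handle the DAO writes to. */
interface TableLike {
    void put(PutLike put);
}

/** Hand-written fake: records every Put instead of sending it anywhere. */
class RecordingTable implements TableLike {
    final List<PutLike> captured = new ArrayList<PutLike>();

    @Override
    public void put(PutLike put) {
        captured.add(put);
    }
}

/** The DAO receives its table from the caller and never creates one itself. */
class RecordDao {
    static void insertRecord(TableLike table, String rowKey, String data1, String data2) {
        PutLike put = new PutLike(rowKey);
        put.cells.put("CF:CQ-1", data1);
        put.cells.put("CF:CQ-2", data2);
        table.put(put);
    }
}
```

If the DAO created its own table internally, the test would have no seam at which to substitute a mock or a fake like RecordingTable.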

Using MRUnit

With regular data access unit testing covered, let’s turn toward MapReduce jobs that go against HBase tables.

Testing MR jobs that go against HBase is as straightforward as testing regular MapReduce jobs. MRUnit makes it easy to test both.

Imagine you have an MR job that writes to an HBase table, “MyTest”, which has one column family, “CF”. The reducer of such a job could look like:


Now how do you go about unit-testing the above reducer in MRUnit? First, add MRUnit as a dependency to your pom.



Then, within the test class, use the ReduceDriver that MRUnit provides as below:


After the processing in MyReducer, you verified that:


  • The output is what you expect.
  • The Put that is inserted in HBase has “RowKey-1” as the rowkey.
  • “DATADATA1DATA2” is the value for the CF column family and CQ column qualifier.
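
As a rough illustration of these checks, the sketch below pulls the reduce logic into a plain method and inspects its result. It is JDK-only and does not use MRUnit's ReduceDriver or HBase's reducer classes. ReducerSketch and EmittedPut are hypothetical names, and the input values "DATA", "DATA1", "DATA2" and the concatenation rule are assumptions inferred from the expected value listed above.

```java
import java.util.Arrays;

/** Small demo entry point: reduces one key and prints the emitted stand-in Put. */
class ReducerSketchDemo {
    public static void main(String[] args) {
        EmittedPut put = ReducerSketch.reduce("RowKey-1", Arrays.asList("DATA", "DATA1", "DATA2"));
        System.out.println(put.row + " " + put.family + ":" + put.qualifier + "=" + put.value);
    }
}

/** Hypothetical stand-in for the Put that the reducer would emit. */
class EmittedPut {
    final String row;
    final String family;
    final String qualifier;
    final String value;

    EmittedPut(String row, String family, String qualifier, String value) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.value = value;
    }
}

class ReducerSketch {
    /** Assumed rule: concatenate all values for a key, in order, and store the result under CF:CQ. */
    static EmittedPut reduce(String key, Iterable<String> values) {
        StringBuilder data = new StringBuilder();
        for (String value : values) {
            data.append(value);
        }
        return new EmittedPut(key, "CF", "CQ", data.toString());
    }
}
```

With MRUnit, the same three assertions would be made against the key/value pairs returned by the driver rather than against a direct method call.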


You can also test Mappers that get data from HBase in a similar manner using MapDriver, or test MR jobs that read from HBase, process data, and write to HDFS.

Using an HBase Mini-cluster

Now we’ll look at how to go about integration testing. HBase ships with HBaseTestingUtility, which makes writing integration tests with an HBase mini-cluster straightforward. In order to pull in the correct libraries, the following dependencies are required in your pom:


Now, let’s look at how to run through an integration test for the MyDAO insert described in the introduction:


Here you created an HBase mini-cluster and started it. You then created a table called “MyTest” with one column family, “CF”. You inserted a record using the DAO you needed to test, did a Get from the same table, and verified that the DAO inserted records correctly.

The same approach works for much more complicated use cases, including MR jobs like the ones described above. You can also access the HDFS and ZooKeeper mini-clusters created while creating the HBase one, run an MR job, output that to HBase, and verify the inserted records.

Just a quick note of caution: starting up a mini-cluster takes 20 to 30 seconds and cannot be done on Windows without Cygwin. However, because integration tests should only be run periodically, the longer run time should be acceptable.

You can find sample code for the above examples at Happy testing!

Sunil Sitaula is a Solutions Architect for Cloudera.


