How-to: Scan Salted Apache HBase Tables with Region-Specific Key Ranges in MapReduce

Categories: Guest HBase How-to

Thanks to Pengyu Wang, software developer at FINRA, for permission to republish this post.

Salted Apache HBase tables with pre-split is a proven effective HBase solution to provide uniform workload distribution across RegionServers and prevent hot spots during bulk writes. In this design, a row key is made with a logical key plus salt at the beginning. One way of generating salt is by calculating n (number of regions) modulo on the hash code of the logical row key (date, etc).

Salting Row Keys

For example, a table accepting data load on a daily basis might use logical row keys starting with a date, and we want to pre-split this table into 1,000 regions. In this case, we expect to generate 1,000 different salts. The salt can be generated, for example, as:

The output from hashCode() with modulo provides randomness for salt value from “000” to “999”. With this key transform, the table is pre-split on the salt boundaries as it’s created. This will make row volumes uniformly distributed while loading the HFiles with MapReduce bulkload. It guarantees that row keys with same salt fall into the same region.

In many use cases, like data archiving, you need to scan or copy the data over a particular logical key range (date range) using MapReduce job. Standard table MapReduce jobs are setup by providing the Scan instance with key range attributes.

However, setup of such a job becomes challenging for salted pre-splitted tables. Start and stop row keys will be different for each region because each has a unique salt. And we can’t specify multiple ranges to one Scan instance.

To solve this problem, we need to look into how table MapReduce works. Generally, the MapReduce framework creates one map task to read and process each input split. Each split is generated in InputFormat class base, by method getSplits().

In HBase table MapReduce job, TableInputFormat is used as InputFormat. Inside the implementation, the getSplits() method is overridden to retrieve the start and stop row keys from the Scan instance. As the start and stop row keys span across multiple regions, the range is divided by region boundaries and returns the list of TableSplit objects which covers the scan key range. Instead of being based by HDFS block, TableSplits are based on region. By overwriting the getSplits() method, we are able to control the TableSplit.

Building Custom TableInputFormat

To change the behavior of the getSplits() method, a custom class extending TableInputFormat is required. The purpose of getSplits() here is to cover the logical key range in each region, construct their row key range with their unique salt. The HTable class provides method getStartEndKeys() which returns start and end row keys for each region. From each start key, parse the corresponding salt for the region.

Job Configuration Passes Logical Key Range

TableInputFormat retrieves the start and stop key from Scan instance. Since we cannot use Scan in our MapReduce job, we could use Configuration instead to pass these two variables and only logical start and stop key is good enough (a variable could be a date or other business information). The getSplits() method has JobContext argument, The configuration instance can be read as context.getConfiguration().

In MapReduce driver:

In Custom TableInputFormat:

Reconstruct the Salted Key Range by Region

Now that we have the salt and logical start/stop key for each region, we can rebuild the actual row key range.

Creating a TableSplit for Each Region

With row key range, we can now initialize TableSplit instance for the region.

One more thing to look at is data locality. The framework uses location information in each input split to assign a map task in its local host. For our TableInputFormat, we use the method getTableRegionLocation() to retrieve the region location serving the row key.

This location is then passed to the TableSplit constructor. This will assure that the mapper processing the table split is on the same region server. One method, called DNS.reverseDns(), requires the address for the HBase name server. This attribute is stored in configuration “hbase.nameserver.address“.

A complete code of getSplits will look like this:

Use the Custom TableInoutFormat in the MapReduce Driver

Now we need to replace the TableInputFormat class with the custom build we used for table MapReduce job setup.

The approach of custom TableInputFormat provides an efficient and scalable scan capability for HBase tables which are designed to use salt for a balanced data load. Since the scan can bypass any unrelated row keys, regardless of how big the table is, the scan’s complexity is limited only to the size of the target data. In most use cases, this can guarantee relatively consistent processing time as the table grows.


7 responses on “How-to: Scan Salted Apache HBase Tables with Region-Specific Key Ranges in MapReduce

  1. Praneeth

    Thanks for the good explanation. I have a question though:

    When building the custom TableInputFormat class, you say ” ….From each start key, parse the corresponding salt for the region…”. How can we eliminate the possibility of one region having multiple salts? I understand since the salts are random they will be distributed on different regions, but it can also happen that sometimes 2 salts can go into the same region, (stored lexicographically within the region) ?


    1. Pengyu Wang

      Praneeth, thanks for your question. If you pre-split table while creating it by the salt boundary, you can guarantee that each region’s key space has unique salt. And the data will be stored in the corresponding region only. Even the auto split of the regions won’t break the uniqueness.

  2. Marko Dinic

    Could you please explain the “pre-split table while creating it by the salt boundary” part. How to do this? Could you please explain or point me to some good example? And about the “Even the auto split of the regions won’t break the uniqueness” part, is that due to the fact that even if a region is split, child regions will still have same salt?

  3. S Roy

    We are doing something similar but running into an issue.

    Our table has one column family which has a TTL of 15day. So rows fall at a consistent basis. We are seeing that the no of regions are going up. Some how the regions are not getting re-used. We are currently with over 41k regions of which 19k are empty. Any pointers as to why regions are not getting reused. Our row key design is similar to what you mentioned. 2 digit salt (0 to 63) followed by hashcode of reverse time stamp in seconds, followed by a service id and finally a counter.

    Our cluster is 41 nodes and we are writing rows at a rate of 4k to 6k Tps. The average row size is about 35kb.

  4. Michael Segel

    You never want to use a salt.
    Of course, this assumes that you know what meant by the term salt.

    A salt is a random number that is orthogonal to the rowkey.
    What most everyone who talks about a salt is pre-pending a hash or a truncated hash to the rowkey in order to get some randomness.

    Note: This means that its no longer possible to do scans() but only get() to get a specific row. You have to be careful because of this and at the same time understand that there are some use cases where ‘hotspotting’ isn’t an issue.