Introduction
The Apache HBase Medium Object Storage (MOB) feature was introduced by HBASE-11339. This feature improves low latency read and write access for moderately-sized values (ideally from 100K to 10MB based on our testing results), making it well-suited for storing documents, images, and other moderately-sized objects [1]. The Apache HBase MOB feature achieves this improvement by separating IO paths for file references and MOB objects, applying different compaction policies to MOBs and thus reducing write amplification created by HBase’s compactions. The MOB objects are stored in a special region, called the MOB region. MOB objects for one table are stored in the MOB region as MOB files, which means that there will be lots of MOB files in this region. Please see Figure 1 from [1] for Apache HBase MOB architecture.
Initially, MOB files are relatively small (less than 1 or 2 HDFS blocks). To improve Apache HDFS efficiency, MOB files are periodically merged into larger files via an operation called MOB compaction, which is independent of the normal compaction process. The initial version of MOB compaction rewrites the multiple MOB files from a particular day into larger MOB files for that day. Let’s use the example file listing below to make this clearer. Table t1 has two regions (r1, r2), it has one column family (f1), and MOB enabled. You can see that there are two prefixes; D279186428a75016b17e4df5ea43d080 corresponds to the hash value of the start key for region r1 and D41d8cd98f00b204e9800998ecf8427e to the hash value of the start key for region r2. For region r1, there are two MOB files each on 1/1/2016 and 1/2/2016 , and for region r2, there are 3 MOB files on 1/1/2016 under MOB region, which is /hbase/data/mobdir/data/default/t1/78e317a6e78a0fceb27b9fa0cb9dcf5b/f1.
>ls /hbase/data/mobdir/data/default/t1/78e317a6e78a0fceb27b9fa0cb9dcf5b/f1
D279186428a75016b17e4df5ea43d08020160101f9d9713ab2fb4a8b825485f6a8acfcd5
D279186428a75016b17e4df5ea43d08020160101af7713ab2fbf4a8abc5135f6a8467ca8
D279186428a75016b17e4df5ea43d080201601029013ab2fceda8b825485f6a8acfcd515
D279186428a75016b17e4df5ea43d080201601029a7978013ab2fceda8b825485f6a8acf
D41d8cd98f00b204e9800998ecf8427e20160101fc94af623c2345f1b241887721e32a48
D41d8cd98f00b204e9800998ecf8427e20160101d0954af623c2345f1b241887721e3259
D41d8cd98f00b204e9800998ecf8427e20160101439adf4af623c2345f1b241887721e32
After MOB compaction, two MOB files on 1/1/2016 and 1/2/2016 for region r1 are compacted into one file for each day. Three MOB files on 1/1/2016 for region r2 are compacted into one file.
D279186428a75016b17e4df5ea43d08020160101f49a9d9713ab2fb4a8b825485f6a8acf
D279186428a75016b17e4df5ea43d08020160102bc9176d09424e49a9d9065caf9713ab2
D41d8cd98f00b204e9800998ecf8427e20160101d9cb0954af623c2345f1b241887721e3
Since only MOB files from the same day for a region can be compacted together, the minimum bound of MOB files under the single MOB region directory for one specific family in one year will be 365 x number- of- regions. With 1000 regions, in 10 years, there will be 365 x 1000 x 10, 3.65 million files after MOB compaction, and it keeps growing! Unfortunately, Apache HDFS has memory-constrained limit for the number of files under one directory [2]. After number of MOB files exceeds this HDFS limit, the MOB table is not writable anymore. The default maximum number of files under one directory for Apache HDFS is 1 million. For 1,000 regions, it will reach this limit in about 3 years. With more regions, it will reach the limit faster.
HBASE-16981 introduces weekly and monthly MOB compaction partition aggregation policies to improve this MOB file count scaling issue by factors of 7 or ~30 respectively.
Design of weekly and monthly MOB compaction partition policies (HBASE-16981)
The basic idea of HBASE-16981 is to compact MOB files in one calendar week or one calendar month into fewer, larger files. The calendar week is defined by ISO 8601, it starts on Monday and ends on Sunday. Normally, with weekly policy, after MOB compaction there will be one file per week per region; with monthly policy, after MOB compaction there will be one file per month per region. The number of MOB files under MOB region directory for one specific family in one year will be at reduced to 52 x number- of- regions with weekly policy and 12 x number- of- regions with monthly policy. This greatly reduces number of MOB files after compaction.
The initial proposed approach
When MOB compaction happens, HBase master selects and aggregates MOB files within one calendar month or one calendar week into fewer, larger files. Depending on how often MOB compaction happens, it is possible that files are compacted multiple times. As an example, let’s say the MOB compaction operation happens daily with monthly aggregation policy. On day 1, MOB compaction compacts all files for day 1 into one file. On day 2, MOB compaction compacts the file from day 1 and files from day 2 into a new file; on day 3, MOB compaction will compact file from day 2 and files from day 3 into a new file, it keeps going until the last day of the month. In this case, files from day 1 are compacted more than 30 times and thus amplifies the amount of write IO by greater than 30x.
The design goal of Apache HBase MOB is to reduce write amplification created by MOB compaction. This naive approach defeats the design goal.
The final implemented approach
In order to overcome the deficiency of the initial proposed approach, staged MOB compaction is adopted for new weekly and monthly policies in HBASE-16981. Figure 2 shows how it works with monthly policy, it works similarly for weekly policy.
As Figure 2 shows, MOB compaction happens on 11/15/2016. Files in the current calendar week are compacted based on daily partition with configured MOB threshold. In Figure 2, files for 11/14/2016 are compacted together, and files for 11/15/2016 are compacted together. Files in the past calendar weeks of the current month are compacted based on weekly partition with weekly threshold (configured-MOB-threshold x 7). In Figure 2, files from 11/1/2016 to 11/6/2016 are compacted together and files from 11/7/2016 to 11/13/2016 are compacted together. Files in the past months are compacted based on monthly partition with monthly threshold (configured-MOB-threshold x 28). In Figure 2, files from 10/1/2016 to 10/31/2016 are compacted together. As one may notice, the first calendar week in November 2016 is from 10/31/2016 to 11/6/2016. Since 10/31/2016 is in the past month, files for that day are compacted based on monthly partition, this leaves only 6 days for the weekly partition (11/1/2016 ~ 11/6/2016). After compaction, there are 5 files if MOB compaction threshold and MOB compaction batch size are configured appropriately.
With this design, MOB files go through 2-stage or 3-stage compactions. At each stage, daily partition, weekly partition or monthly partition are applied with increasing MOB compaction threshold. MOB Files are compacted at most 3 times normally with monthly policy and at most 2 times normally with weekly policy in their lifetime.
For more details about the design, please see [3].
Usage
By default, MOB compaction partition policy is daily. To apply weekly or monthly policy, there is a new attribute MOB_COMPACT_PARTITION_POLICY added for MOB column family. User can set this attribute when creating a table from the HBase shell.
>create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 1000000, MOB_COMPACT_PARTITION_POLICY => 'weekly’}
User can also change the existing table’s MOB_COMPACT_PARTITION_POLICY from the HBase shell.
>alter 't1', {NAME => 'f1', MOB_COMPACT_PARTITION_POLICY => 'monthly'}
If the policy changes from daily to weekly or monthly, or from weekly to monthly, the next MOB compaction will recompact MOB files which have been compacted with the previous policy. If the policy changes from monthly or weekly to daily, or from monthly to weekly, the already compacted MOB files with the previous policy will not be recompacted with the new policy.
Conclusion
HBASE-16981 solves file number scaling issue with Apache HBase MOB. It will be available in Apache HBase 2.0.0 release. CDH supports Apache HBase MOB in CDH 5.4.0+. HBASE-16981 is backported and will be available in CDH 5.11.0.
Acknowledgments
Special thanks to Jingcheng Du and Anoop Sam John for help in design and review of HBASE-16981, Jonathan Hsieh and Sean Busbey for reviewing the blog.
References
[1] https://clouderatemp.wpengine.com/blog/2015/06/inside-apache-hbases-new-support-for-mobs/
[2] https://clouderatemp.wpengine.com/blog/2009/02/the-small-files-problem/
[3] https://issues.apache.org/jira/browse/HBASE-16981