Today the Apache HBase community has proudly released Apache HBase 0.92.0, a major new version of the scalable distributed data store inspired by Google’s BigTable. Over 670 issues were addressed, so in this post I’ll highlight some of the major features and enhancements and describe what they mean for HBase users, admins, and developers.
While the most visible change to the project is the new project logo, the most important changes for users are the performance and robustness improvements to HBase’s core functionality. On the performance side, there are a few major highlights:
- HFile v2, a new more efficient storage format
- Faster recovery via distributed log splitting
- Lower latency region-server operations via new multi-threaded and asynchronous implementations
HFile v2 is a series of patches that change the internal format HBase uses to store region data on HDFS. The key innovation is the implementation of multi-level file indexes, which improves the general performance of HBase reads by making I/O more granular. The mechanism reduces latency spikes and shrinks the memory footprint required when HBase’s internal data files are loaded, which in turn frees memory for more caching and allows for significantly larger region files.
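To see why a multi-level index shrinks the memory footprint, consider this conceptual sketch (plain Python, not HBase code): only a small root index stays resident, and leaf index blocks are loaded on demand, so a lookup touches one leaf block rather than requiring the entire index in memory.

```python
import bisect

# Conceptual two-level index, in the spirit of HFile v2's multi-level
# block index. Only the root index must stay in memory; leaf index
# blocks are fetched on demand (here, from a list standing in for disk).

class TwoLevelIndex:
    def __init__(self, leaf_blocks):
        # leaf_blocks: list of sorted (key, value) lists
        self.leaf_blocks = leaf_blocks                   # stand-in for on-disk blocks
        self.root = [blk[0][0] for blk in leaf_blocks]   # first key of each leaf
        self.loads = 0                                   # simulated leaf-block I/Os

    def _load_leaf(self, i):
        self.loads += 1                                  # in HBase, a block read
        return self.leaf_blocks[i]

    def get(self, key):
        # Root lookup: rightmost leaf whose first key is <= key.
        i = bisect.bisect_right(self.root, key) - 1
        if i < 0:
            return None
        for k, v in self._load_leaf(i):
            if k == key:
                return v
        return None

idx = TwoLevelIndex([[("a", 1), ("c", 2)], [("m", 3), ("q", 4)]])
print(idx.get("m"), idx.loads)  # one leaf block touched, not the whole index
```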
Although HBase is fault tolerant, some recovery scenarios take significant time. One recovery procedure, log splitting, occurs when regions are reassigned after a region server goes down. In HBase 0.90.x, the write-ahead logs (HLogs) from each downed region server would be processed serially by a single master. HBase 0.92.0 parallelizes this HLog processing by performing distributed log splitting concurrently on several region servers instead of on a single machine. The resulting speedup reduces recovery times and thus significantly reduces region unavailability time if many region servers fail simultaneously or if a cluster is restarted ungracefully.
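The speedup comes from the fact that each downed server’s HLog is an independent unit of work, so the split tasks fan out cleanly across many workers instead of draining serially through the master. A rough sketch of the idea (plain Python threads standing in for region servers, not the actual HBase implementation):

```python
from concurrent.futures import ThreadPoolExecutor

# Conceptual sketch: each HLog split is independent, so the work
# parallelizes cleanly across region servers (modeled here as threads).

def split_log(hlog):
    # In HBase this replays a write-ahead log into per-region edit files;
    # here we just pretend each log yields its recovered edits.
    return f"recovered edits from {hlog}"

hlogs = [f"hlog-{i}" for i in range(8)]

# 0.90.x behavior: the master processes the logs serially.
serial_results = [split_log(h) for h in hlogs]

# 0.92.0 behavior (in spirit): split tasks are distributed to many workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_results = list(pool.map(split_log, hlogs))

assert serial_results == parallel_results  # same recovery, less wall-clock time
```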
Finally, several operations such as compaction, table creation, and bulk loading have new multi-threaded or asynchronous implementations. By using thread pools and pipelining tasks, many of these occasional operations can now run concurrently and complete more quickly.
HBase 0.92.0 improves supportability and simplifies operations by introducing features that aid in diagnosing HBase’s state and in repairing a corrupted HBase. If HBase gets slow or seems stuck, the HBase 0.90.x releases unfortunately require shell access and developer involvement to diagnose even simple problems. In more serious cases, expert HBase developers would be required for diagnosis, repair, and recovery. HBase 0.92.0 exposes new features that help operators pinpoint and repair problems more quickly. Highlights include:
- An enhanced web UI that exposes more internal state
- Improved logging for identifying slow queries
- Improved corruption detection and repair tools
HBase 0.92.0’s enhanced master UI provides more detailed state information, which can hasten problem diagnosis. Specifically, this version now exposes regions-in-transition states and includes a centralized cluster debug page that displays detailed cluster state on a single web page. This can help identify problems and pinpoint which machines to focus on.
Sometimes a slow machine is a sign that it is on the verge of failing. HBase 0.92.0’s region servers now provide a ‘show processlist’-like JSON-encoded thread dump on their web interface. They were also augmented to expose and log slow-query metrics. This allows administrators to determine whether particular operations are inefficient, or to forensically correlate slow queries with region server operations such as flushes, compactions, GCs, or splits.
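As a sketch of how the slow-query logging might be tuned, a response-time threshold can be set in hbase-site.xml. The property name below is taken from HBase’s IPC warning configuration; verify it against the documentation for your version before relying on it:

```xml
<!-- hbase-site.xml: lower the slow-query logging threshold.
     Property name per HBase configuration docs; check your version. -->
<property>
  <name>hbase.ipc.warn.response.time</name>
  <value>1000</value> <!-- log operations slower than 1s as (responseTooSlow) -->
</property>
```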
Finally, HBase 0.92.0 includes improved utilities for dealing with the rare occasions when data corruption has affected the integrity and consistency of HBase’s internal data. Previously, repairs for these potentially disastrous situations required time-consuming manual analysis and intimate knowledge of HBase’s internals. HBase 0.92.0 improves hbck, a tool for detecting and fixing transient corruptions, by providing an accurate and detailed summary of region layout. This version also introduces another tool, OfflineMetaRepair, which enables administrators to rebuild the META table of a severely corrupted HBase.
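For flavor, both tools are invoked from the command line; the exact class path for OfflineMetaRepair below reflects the 0.92-era layout, so double-check it against your installation:

```shell
# Summarize region layout and check consistency with hbck:
$ ${HBASE_HOME}/bin/hbase hbck

# Rebuild META from the region data on HDFS (last resort; cluster offline):
$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
```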
Last but not least, HBase 0.92.0 adds several advanced features that improve HBase’s extensibility, flexibility, and performance. Highlights include:
- Build support for Hadoop 0.20.20x, 0.22, and 0.23
- Experimental: offheap slab cache and online table schema change
For developers, the biggest news is the coprocessor framework. Coprocessors are a powerful extension interface akin to database triggers or kernel modules. Custom coprocessors can be plugged into HBase operations such as data access (gets and puts), table modification, master transitions, and HLog operations. This version includes an HBase security coprocessor that implements mechanisms for column-level access control and authorization.
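The observer pattern behind coprocessors is easy to picture: user-supplied hooks run before and after core operations, and a pre-hook can veto the operation entirely, which is how trigger-like access control falls out. This is a conceptual sketch in plain Python, not the HBase Java coprocessor API; all class and method names here are invented for illustration:

```python
# Conceptual sketch of observer-style coprocessors: hooks run around
# core operations, like database triggers. Names are illustrative only.

class AccessDenied(Exception):
    pass

class RegionObserver:
    # Default hooks are no-ops; a coprocessor overrides what it needs.
    def pre_put(self, user, row, value): pass
    def post_put(self, user, row, value): pass

class AccessController(RegionObserver):
    # Toy stand-in for a security coprocessor: only listed writers may put.
    def __init__(self, writers):
        self.writers = set(writers)
    def pre_put(self, user, row, value):
        if user not in self.writers:
            raise AccessDenied(f"{user} may not write {row!r}")

class Region:
    def __init__(self, observers=()):
        self.observers = list(observers)
        self.store = {}
    def put(self, user, row, value):
        for obs in self.observers:
            obs.pre_put(user, row, value)    # a pre-hook may veto the write
        self.store[row] = value
        for obs in self.observers:
            obs.post_put(user, row, value)

region = Region([AccessController(writers={"alice"})])
region.put("alice", b"row1", b"v1")          # allowed
try:
    region.put("mallory", b"row1", b"evil")  # vetoed by the pre-hook
except AccessDenied as e:
    print("denied:", e)
```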
During this release cycle, the Apache Hadoop community made official releases that support hflush/sync/append. HBase 0.92.0 now builds against Hadoop 1.0.0 and adds support for building and testing against the 0.22 and 0.23 beta versions of Hadoop as well.
Finally, I’ll highlight some experimental but promising new features. One is an off-heap slab cache implementation, a manual memory management mechanism used to avoid object creation overhead, memory fragmentation, and other woes associated with depending on the GC. Also included is an initial implementation of an online table schema change mechanism that allows adding and removing column families without disabling a table.
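In HBase shell terms, the difference is skipping the disable/enable dance; the table and family names below are made up, and since the feature is experimental it must be explicitly enabled (via the online schema update setting) before the online form works:

```shell
# Before 0.92: schema changes required taking the table offline.
hbase> disable 'mytable'
hbase> alter 'mytable', NAME => 'new_family'
hbase> enable 'mytable'

# With experimental online schema change enabled: alter a live table.
hbase> alter 'mytable', NAME => 'new_family'
```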
HBase 0.92.0 is a significant major new version and is truly a community effort, with major contributions from companies including Facebook, eBay, TrendMicro, StumbleUpon, Salesforce, Explorys, and Cloudera. Although this version is not yet available in Cloudera’s Distribution Including Apache Hadoop (CDH), several of its features have already made it into CDH3 HBase updates and will form the basis of a future CDH4 release.