The following is a guest post from Nils Kübler, the creator of the Hannibal project. He is a software engineer at Sentric, a Swiss big data specialist providing consultancy, development and training.
Hannibal aims to help Apache HBase administrators monitor the cluster in terms of region distribution and is basically a decision-making aid for manual splitting. It widens the monitoring capabilities of HBase by providing different views with interactive graphs of the cluster. Hannibal is also a Web-based tool that fits smoothly into your existing Hadoop/HBase ecosystem.
Hannibal is open source (MIT License) and implemented in Scala. In its current version it supports HBase 0.90. Support for versions > 0.90 is planned and will be added soon.
The Joy of Splitting
A “region” is the basic unit of data distribution and balancing in HBase. The proper region size (and quantity) has a direct impact on overall system performance. It is therefore vital for any production cluster to monitor both region growth and region distribution over time.
Manually splitting Apache HBase regions has some advantages over managed splitting and is a widely used practice in the industry. Possible advantages include:
- Much easier debugging and profiling of region log files
- Shifting of the region split to off-peak hours
- Prevention of region hot-spotting, at least if the row-key design allows it
- Prevention of a compaction storm (heavy disk I/O and network traffic), which can occur when data distribution and growth are roughly uniform
Now, let’s take a closer look at the Hannibal UI.
Hannibal’s main page shows a graph of how the regions are distributed over the cluster. It is a bar chart showing how much space is assigned to each RegionServer. Each bar is divided into colored segments, one per table. Hovering over a segment reveals more information, such as the number of regions of that table on that server.
This view gives a first indication of whether the distribution of the tables is ideal or not.
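To make the idea behind this chart concrete, here is a minimal sketch of the aggregation such a view needs: summing region sizes per RegionServer and per table. It is not Hannibal's actual code; the function name, the tuple layout, and the sample values are assumptions made up for illustration.

```python
from collections import defaultdict


def size_per_server(regions):
    """Sum region sizes per RegionServer and table.

    `regions` is an iterable of (server, table, size_mb) tuples.
    This layout is hypothetical, chosen only for this example.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for server, table, size_mb in regions:
        totals[server][table] += size_mb
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {server: dict(tables) for server, tables in totals.items()}


# Made-up sample data: two servers hosting regions of two tables.
sample = [
    ("rs1", "users", 512),
    ("rs1", "events", 2048),
    ("rs2", "users", 480),
    ("rs2", "events", 300),
    ("rs1", "events", 1024),
]
totals = size_per_server(sample)
```

Each outer key corresponds to one bar in the chart, and each inner entry to one colored segment of that bar. In this sample, rs1 carries far more data than rs2, which is the kind of imbalance the view is meant to expose.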
Region Splits per Table
Hannibal’s table view shows a graph of all regions of a table, ordered by size. On an optimal table with evenly distributed regions, every bar should be about the same size. A red line marks the configured hbase.hregion.max.filesize, which, depending on your configuration, may help you decide whether a region should be split.
This graph can show you which regions you should split or merge next.
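The decision this graph supports can be sketched as a simple comparison of each region's size against the configured maximum. The following is an illustrative example only, not part of Hannibal: the function name, the 80% threshold, and all sizes (including the value used for the maximum file size) are assumptions.

```python
def split_candidates(region_sizes, max_filesize, threshold=0.8):
    """Return region names whose size is at least `threshold` of the maximum.

    `region_sizes` maps a region name to its size; `max_filesize` stands in
    for the configured hbase.hregion.max.filesize, in the same unit.
    The 0.8 threshold is an arbitrary choice for this sketch.
    """
    # Largest regions first, mirroring the size-ordered bars in the table view.
    ranked = sorted(region_sizes.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, size in ranked if size >= threshold * max_filesize]


# Made-up region sizes in MB and a hypothetical maximum of 256 MB.
sizes_mb = {"r1": 250, "r2": 90, "r3": 210, "r4": 40}
candidates = split_candidates(sizes_mb, max_filesize=256)
```

Here r1 and r3 are close to the red line and would be the first regions to consider for a manual split, while small regions such as r4 might be candidates for merging instead.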
Hannibal also lets you drill down into each region. To make this possible, it records several metrics per region over time. Right now the recorded metrics are:
- Number of storefiles
- Size of the memstore
- Size of the storefiles
This information can also help you make decisions like whether the region should be split.
The graph reveals details and problems in your region.
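As a rough illustration of how the three recorded metrics could inform a split decision, the sketch below stores samples for one region and estimates how quickly its storefiles grow. The record layout, the field names, and the sample values are hypothetical and do not reflect how Hannibal stores its data.

```python
from dataclasses import dataclass


@dataclass
class RegionSample:
    """One measurement of a region (hypothetical layout for this example)."""
    timestamp: int           # seconds since the first measurement
    storefiles: int          # number of storefiles
    memstore_size_mb: int    # size of the memstore
    storefile_size_mb: int   # total size of the storefiles


def storefile_growth_mb_per_hour(samples):
    """Average storefile growth between the oldest and newest sample."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    if len(ordered) < 2:
        return 0.0
    first, last = ordered[0], ordered[-1]
    elapsed_hours = (last.timestamp - first.timestamp) / 3600
    if elapsed_hours == 0:
        return 0.0
    return (last.storefile_size_mb - first.storefile_size_mb) / elapsed_hours


# Made-up history: three samples taken one hour apart.
history = [
    RegionSample(0, 2, 10, 100),
    RegionSample(3600, 3, 40, 130),
    RegionSample(7200, 3, 5, 180),
]
growth = storefile_growth_mb_per_hour(history)
```

A region whose storefiles grow quickly will reach the configured maximum sooner, so a growth figure like this can help schedule a manual split for off-peak hours before it becomes urgent.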
We encourage HBase developers and administrators to try Hannibal out. Let us know what you think, what you like, what you don’t, or what additional features you would like to see.