Introducing Hannibal: A Tool for Apache HBase Region Monitoring

The following is a guest post from Nils Kübler, the creator of the Hannibal project. He is a software engineer at Sentric, a Swiss big data specialist providing consultancy, development, and training.

Hannibal helps Apache HBase administrators monitor region distribution across the cluster and serves primarily as a decision-making aid for manual splitting. It extends the monitoring capabilities of HBase by providing different views of the cluster with interactive graphs. Hannibal is a Web-based tool that fits smoothly into your existing Hadoop/HBase ecosystem.

Hannibal is open source (MIT License) and implemented in Scala. In its current version it supports HBase 0.90. Support for versions > 0.90 is planned and will be added soon.

The Joy of Splitting

A “region” is the basic unit of data distribution and balancing in HBase. Region size and region count have a direct impact on overall system performance. It is therefore vital for any production cluster to monitor both region growth and region distribution over time.

Manually splitting Apache HBase regions has some advantages over managed splitting and is a widely used practice in the industry. Possible advantages include:

  • Much easier debugging and profiling of region log files
  • Shifting of the region split to off-peak hours
  • Prevention of region hot-spotting, at least if the row-key design allows it
  • Prevention of a compaction storm (heavy disk I/O and network traffic) when data distribution and growth are roughly uniform

Now, let’s take a closer look at the Hannibal UI.

Region Distribution

Hannibal’s main page shows a graph of how the regions are distributed over the cluster. It’s a bar chart showing how much space is assigned to each RegionServer. Each bar is divided into colored segments, one per table. Hovering over a segment reveals more information, such as the number of regions on that server.

 

This view can give first hints whether the distribution of the tables is ideal or not.
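
The aggregation behind such a chart can be sketched in a few lines of Python. This is only an illustration, not Hannibal's actual implementation (which is written in Scala); the server names, table names, and sizes below are made-up sample data.

```python
from collections import defaultdict

# Hypothetical sample data: (region_server, table, region_size_mb).
regions = [
    ("rs1", "users", 512),
    ("rs1", "events", 2048),
    ("rs2", "users", 480),
    ("rs2", "events", 256),
    ("rs3", "events", 1900),
]

def size_per_server(regions):
    """Sum region sizes per RegionServer, broken down by table."""
    totals = defaultdict(lambda: defaultdict(int))
    for server, table, size_mb in regions:
        totals[server][table] += size_mb
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {server: dict(tables) for server, tables in totals.items()}

distribution = size_per_server(regions)
for server in sorted(distribution):
    tables = distribution[server]
    # One "bar" per server: total size plus the per-table segments.
    print(server, sum(tables.values()), tables)
```

In this sample, rs2 holds far less data than rs1 and rs3, which is the kind of imbalance the bar chart makes visible at a glance.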

Region Splits per Table

Hannibal’s table view shows a graph of all regions of a table, ordered by size. On an optimal table with evenly distributed regions, every bar should be about the same size. A red line marks the configured hbase.hregion.max.filesize, which, depending on your configuration, may help you decide whether a region should be split.

This graph can show you which regions you should split or merge next.
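
The decision this graph supports can be sketched as follows. This is a minimal illustration, not Hannibal's own logic: the region names, the sizes, the 256 MB limit, and the 80% threshold are all assumptions chosen for the example, and the largest-first ordering is a choice made here.

```python
def split_candidates(region_sizes_mb, max_filesize_mb, threshold=0.8):
    """Return (region, size) pairs, largest first, whose size is at least
    threshold * max_filesize_mb. The threshold is an illustrative choice."""
    limit = threshold * max_filesize_mb
    ordered = sorted(region_sizes_mb.items(),
                     key=lambda item: item[1], reverse=True)
    return [(name, size) for name, size in ordered if size >= limit]

# Hypothetical region sizes in MB for a single table.
sizes = {"region-a": 120, "region-b": 230, "region-c": 260, "region-d": 90}

# 256 stands in for the configured hbase.hregion.max.filesize, in MB.
candidates = split_candidates(sizes, max_filesize_mb=256)
print(candidates)
```

With these sample values, region-c already exceeds the limit and region-b is close to it, so both would be listed as candidates for the next manual split.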

Region History

Hannibal also lets you drill down into each region. To that end, it records several metrics per region over time. Right now the recorded metrics are:

  • Number of storefiles
  • Size of the memstore
  • Size of the storefiles
  • Compactions

This information can also help you make decisions like whether the region should be split.

The graph reveals details and problems in a region.
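
As a rough illustration of how such a history can inform a split decision, the sketch below models one sample per point in time and estimates how fast a region's storefiles grow. The `RegionSample` type, its field names, and the sample values are hypothetical and do not reflect Hannibal's internal data model.

```python
from dataclasses import dataclass

@dataclass
class RegionSample:
    """One hypothetical measurement of a region at a point in time."""
    timestamp: int           # seconds since epoch
    storefiles: int          # number of storefiles
    memstore_size_mb: int    # size of the memstore
    storefile_size_mb: int   # total size of the storefiles
    compacting: bool         # whether a compaction was observed

def growth_mb_per_hour(samples):
    """Average storefile growth between the first and last sample."""
    if len(samples) < 2:
        return 0.0
    first, last = samples[0], samples[-1]
    elapsed_hours = (last.timestamp - first.timestamp) / 3600
    if elapsed_hours <= 0:
        return 0.0
    return (last.storefile_size_mb - first.storefile_size_mb) / elapsed_hours

# Made-up history: three samples taken one hour apart.
history = [
    RegionSample(0, 3, 10, 100, False),
    RegionSample(3600, 4, 20, 150, False),
    RegionSample(7200, 2, 5, 200, True),
]
rate = growth_mb_per_hour(history)
print(rate)
```

A steadily growing region that approaches the configured maximum file size is a natural candidate for a split during off-peak hours.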

More Information

Please have a look at the readme on GitHub. You can find links there to the source code, documentation and installation information. There is also a video tutorial available.

We encourage HBase developers and administrators to try Hannibal out. Let us know what you think, what you like, what you don’t, or what additional features you would like to see.

2 Responses
  • Otis Gospodnetic / November 26, 2012 / 9:03 PM

    Hannibal looks nice.
    And is similar to SPM for HBase: http://sematext.com/spm/hbase-performance-monitoring/index.html

  • Nils Kübler / November 28, 2012 / 6:00 AM

    @Otis Gospodnetic

    We knew about SPM and our intention was not to compete with it. The focus of Hannibal is to provide views of the cluster that help make decisions about when and where to split the regions – nothing else. SPM provides an overall monitoring solution, but doesn’t seem to provide the graphs we needed for this.
