Tag Archives: HFile

Inside Apache HBase’s New Support for MOBs

Categories: HBase

Learn about the design decisions behind HBase’s new support for MOBs.

Apache HBase is a distributed, scalable, performant, consistent key value database that can store a variety of binary data types. It excels at storing many relatively small values (<10K), and providing low-latency reads and writes.

However, there is a growing demand for storing documents, images, and other moderate objects (MOBs)  in HBase while maintaining low latency for reads and writes.

Read more

Tuning Java Garbage Collection for HBase

Categories: Guest HBase Performance

This guest post from Intel Java performance architect Eric Kaczmarek (originally published here) explores how to tune Java garbage collection (GC) for Apache HBase focusing on 100% YCSB reads.

Apache HBase is an Apache open source project offering NoSQL data storage. Often used together with HDFS, HBase is widely used across the world. Well-known users include Facebook, Twitter, Yahoo, and more. From the developer’s perspective, HBase is a “distributed,

Read more

What are HBase Compactions?

Categories: HBase

The compactions model is changing drastically with CDH 5/HBase 0.96. Here’s what you need to know.

Apache HBase is a distributed data store based upon a log-structured merge tree, so optimal read performance would come from having only one file per store (Column Family). However, that ideal isn’t possible during periods of heavy incoming writes. Instead, HBase will try to combine HFiles to reduce the maximum number of disk seeks needed for a read.

Read more

How-to: Use HBase Bulk Loading, and Why

Categories: HBase How-to

Apache HBase is all about giving you random, real-time, read/write access to your Big Data, but how do you efficiently get that data into HBase in the first place? Intuitively, a new user will try to do that via the client APIs or by using a MapReduce job with TableOutputFormat, but those approaches are problematic, as you will learn below. Instead, the HBase bulk loading feature is much easier to use and can insert the same amount of data more quickly.

Read more

Introduction to Apache HBase Snapshots, Part 2: Deeper Dive

Categories: HBase

In Part 1 of this series about Apache HBase snapshots, you learned how to use the new Snapshots feature and a bit of theory behind the implementation. Now, it’s time to dive into the technical details a bit more deeply.

What is a Table?

An HBase table comprises a set of metadata information and a set of key/value pairs:

  • Table Info: A manifest file that describes the table “settings”,

Read more