As you may know, Apache HBase has a vibrant community and receives many contributions from developers worldwide. The collaborative development effort is so active, in fact, that a new point release comes out about every six weeks (with the current stable branch being 0.94).
At Cloudera, we’re committed to ensuring that CDH, our open source distribution of Apache Hadoop and related projects (including HBase), ships with the results of this steady progress.
This is the second blog post about Apache HBase replication. The previous post, HBase Replication Overview, discussed the use cases, architecture, and different modes supported in HBase replication. This post takes an operational perspective and covers HBase replication configuration and key concepts for using it, such as bootstrapping, schema changes, and fault tolerance.
As mentioned in HBase Replication Overview, the master cluster ships WALEdits to one or more slave clusters.
Apache HBase Replication is a way of copying data from one HBase cluster to a different and possibly distant HBase cluster. It works on the principle that the transactions from the originating cluster are pushed to another cluster. In HBase jargon, the cluster doing the push is called the master, and the one receiving the transactions is called the slave. The push is done asynchronously, and the transactions are shipped in batches of a configurable size (the default is 64MB).
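To make the batching idea concrete, here is a minimal Python sketch of size-bounded batching. It is a simplified model for illustration only, not HBase's actual replication code: the function name `batch_edits`, the `max_batch_bytes` parameter, and the representation of each transaction as a raw byte string are all assumptions introduced here. Only the 64MB default comes from the text above.

```python
from typing import Iterable, Iterator, List

# 64MB, the default batch size mentioned in the text.
DEFAULT_BATCH_BYTES = 64 * 1024 * 1024


def batch_edits(
    edits: Iterable[bytes],
    max_batch_bytes: int = DEFAULT_BATCH_BYTES,
) -> Iterator[List[bytes]]:
    """Group edits into batches whose total size stays within max_batch_bytes.

    Hypothetical helper: an edit larger than the limit is still shipped,
    alone in its own batch, so no edit is ever dropped.
    """
    batch: List[bytes] = []
    batch_bytes = 0
    for edit in edits:
        # Close the current batch if adding this edit would exceed the limit.
        if batch and batch_bytes + len(edit) > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(edit)
        batch_bytes += len(edit)
    # Ship whatever is left once the input is exhausted.
    if batch:
        yield batch
```

In this model, each yielded batch stands for one shipment from the master to a slave. Because the generator never waits on a receiver, it only illustrates the grouping step; the asynchronous delivery described above is outside its scope.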
Apache HBase 0.94.0 has been released! This is the first major release since the January 22nd HBase 0.92 release. The main focuses of the HBase 0.94.0 release were performance enhancements and new features, along with several major bug fixes.
Performance-Related JIRAs
Below are a few of the important performance-related JIRAs:
- Read Caching improvements: HDFS stores data in one block file and its corresponding metadata (checksum) in another block file.