Cloudera Replication Plugin enables cross-platform replication for Apache HBase

The Cloudera Data Platform (CDP) is the latest Big Data offering from Cloudera. It includes Apache HBase and Phoenix as part of the platform. These two components are provided in three form factors:

  1. For on-prem deployments, they are available in a manner similar to CDH & HDP (within the CDP Private Cloud offering).
  2. For customers who want to manage the database on their own in AWS & Azure, they are available as part of the CDP Public Cloud DataHub offering (with the Operational Database template or in Custom DataHub deployments).
  3. They will soon be available as part of the Cloudera Operational Database (COD), a fully managed offering that eliminates the management overhead of operating an HBase deployment.

Cloudera’s Apache HBase customers typically run mission-critical applications that cannot afford any downtime. They need a way to migrate to a new deployment with no production outage or, at worst, a very brief one. With these upgrade considerations in mind, especially with the upcoming end of support for CDH 5 and HDP 2, we have developed the Cloudera OpDB Replication Plugin.

Many companies also deploy CDH 6, HDP 3, and EMR-based HBase clusters but are looking to reduce or eliminate the operational overhead of maintaining them. The Cloudera OpDB Replication Plugin lets these companies migrate to DataHub or COD without incurring any downtime or production outage.

The Replication Plugin supports replication from the following source HBase clusters:

  • CDH 5.14
  • CDH 6.3
  • HDP 2.6.5
  • HDP 3.1.5
  • EMR 5.28

HBase replication

HBase has provided a mature, feature-rich replication capability for nearly a decade. Replication is one of HBase’s most popular capabilities: it provides an automatic disaster-recovery (DR) solution, and it supports data migration, workload partitioning, and a search-based secondary index through integration with Apache Solr. How HBase replication works and how to configure it are explained in detail in the HBase Reference Guide and have been discussed in many Cloudera Blog articles. Today, it supports many topologies including:

  • Fan-in 
  • Fan-out
  • Cyclic
  • Bi-directional

HBase replication can be configured at either the namespace (i.e., database) or table level.  While near-real-time in nature, it can be configured to be eventually consistent or timeline consistent.  
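The namespace-level versus table-level choice can be pictured with a small conceptual model. The sketch below is not HBase code; the function names and the peer settings (`namespaces`, `tables`) are hypothetical and only mirror the idea that a peer either covers a whole namespace or names individual tables. It relies on the HBase convention that a fully qualified table name is written `namespace:table`, with unqualified names living in the `default` namespace.

```python
from typing import Iterable, Tuple

DEFAULT_NAMESPACE = "default"


def split_table_name(full_name: str) -> Tuple[str, str]:
    """Split 'namespace:table' into parts; bare names belong to 'default'."""
    namespace, sep, table = full_name.partition(":")
    if not sep:
        return DEFAULT_NAMESPACE, full_name
    return namespace, table


def is_replicated(
    full_name: str,
    namespaces: Iterable[str] = (),
    tables: Iterable[str] = (),
) -> bool:
    """Return True if a (hypothetical) peer scope covers the given table.

    A table is covered when its namespace is listed in `namespaces`
    or when the table itself is listed in `tables`.
    """
    target = split_table_name(full_name)
    if target[0] in set(namespaces):
        return True
    return target in {split_table_name(t) for t in tables}


if __name__ == "__main__":
    # Namespace-level scope: every table in 'sales' is replicated.
    print(is_replicated("sales:orders", namespaces=["sales"]))
    # Table-level scope: only the named table is replicated.
    print(is_replicated("hr:people", tables=["sales:orders"]))
```

In this model a namespace-level scope picks up tables created later in that namespace, while a table-level scope has to be extended explicitly.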

The Cloudera OpDB Replication Plugin only supports a destination cluster provided by a CDP DataHub Cluster or by a COD database, deployed in either AWS or Azure.

Establishing trust

HBase replication to date has required that all participating clusters have the same security definitions. In other words, either all clusters must have no security enabled (authentication configuration set to simple), or all clusters must have security enabled with Kerberos (authentication configuration set to kerberos).

When Kerberos is used, all clusters’ Kerberos principals must belong to the same realm or, if they are in different realms, those realms must trust each other (commonly known as cross-realm authentication).

Configuring cross-realm trust with Kerberos is problematic in most organizations as corporate security policies typically forbid it. To address this issue, the Cloudera OpDB Replication Plugin extends HBase replication to use an alternative authentication method, enabling replication across security domains. The Replication Plugin allows replication:

  • Across multiple Kerberos domains without requiring cross-realm trust
  • From secure to insecure clusters, and
  • From insecure to secure clusters.
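The compatibility rule described in this section can be written down as a small check. This is a minimal sketch of the rule as stated in the text, not code from HBase or the plugin; the `Cluster` type, its fields, and both function names are hypothetical, and cross-realm trust is assumed to be mutual.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Cluster:
    """Hypothetical description of a cluster's security setup."""
    auth: str                       # "simple" or "kerberos"
    realm: Optional[str] = None     # Kerberos realm, if any
    trusted_realms: FrozenSet[str] = frozenset()


def stock_replication_allowed(src: Cluster, dst: Cluster) -> bool:
    """Rule for replication without the plugin, as described in the text."""
    if src.auth != dst.auth:
        return False                # mixed simple/kerberos is not supported
    if src.auth == "simple":
        return True                 # both clusters are unsecured
    if src.realm == dst.realm:
        return True                 # same Kerberos realm
    # Different realms: assume the trust must hold in both directions.
    return dst.realm in src.trusted_realms and src.realm in dst.trusted_realms


def needs_alternative_auth(src: Cluster, dst: Cluster) -> bool:
    """True for the pairs that fall outside the stock rule."""
    return not stock_replication_allowed(src, dst)


if __name__ == "__main__":
    on_prem = Cluster(auth="kerberos", realm="ONPREM.EXAMPLE.COM")
    cloud = Cluster(auth="kerberos", realm="CLOUD.EXAMPLE.COM")
    legacy = Cluster(auth="simple")
    print(needs_alternative_auth(on_prem, cloud))   # different realms, no trust
    print(needs_alternative_auth(legacy, cloud))    # insecure to secure
```

The pairs for which `needs_alternative_auth` returns True correspond to the three cases in the list above.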

To let CDP clusters trust source clusters that either have no security configured or are secured using Kerberos, the Replication Plugin implements a new authentication mechanism based on a shared secret. The secret is created using a provided tool and stored in both the source and destination clusters.
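To illustrate the general idea of shared-secret authentication, here is a generic challenge-response sketch built on Python's standard `hmac` and `secrets` modules. The article does not describe the plugin's actual protocol or tooling, so everything below (function names, secret size, the use of HMAC-SHA256) is an assumption made for illustration only.

```python
import hashlib
import hmac
import secrets


def generate_shared_secret(num_bytes: int = 32) -> bytes:
    """Create a random secret to be stored on both clusters (illustrative)."""
    return secrets.token_bytes(num_bytes)


def new_challenge() -> bytes:
    """Destination side: issue a fresh random challenge per connection."""
    return secrets.token_bytes(16)


def sign_challenge(secret: bytes, challenge: bytes) -> str:
    """Source side: prove knowledge of the secret without sending it."""
    return hmac.new(secret, challenge, hashlib.sha256).hexdigest()


def verify_response(secret: bytes, challenge: bytes, response: str) -> bool:
    """Destination side: recompute the HMAC and compare in constant time."""
    expected = sign_challenge(secret, challenge)
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    secret = generate_shared_secret()      # copied to both clusters out of band
    challenge = new_challenge()
    response = sign_challenge(secret, challenge)
    print(verify_response(secret, challenge, response))
```

The point of a scheme like this is that neither side needs a Kerberos ticket from the other's realm; possession of the same secret is what establishes trust.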

Conclusion

Replication is a valuable tool for implementing DR and data center (DC) migration solutions for HBase. It has some caveats, as shown here, when clusters have differing security configurations. With the impending end of life of CDH 5 and HDP 2, the ability to migrate data from these legacy platforms to CDP is imperative.

Customers with HDP 3, CDH 6, and EMR 5.28 based HBase deployments can use this plugin to adopt a fully managed HBase solution without downtime and drastically reduce the operational overhead of managing HBase.

Reach out to your Cloudera account team if you are interested in deploying the Cloudera OpDB Replication Plugin in your environment.

Krishna Maheshwari
Director of Product Management
Wellington Chevreuil
Josh Elser
