Operational Database Management

This blog post is part of a series on Cloudera’s Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities. To start from the beginning of the series, see Operational Database in CDP.

This blog post gives you an overview of the OpDB management tools and features in the Cloudera Data Platform. The tools discussed in this article will help you understand the various options available to manage the operations of your OpDB cluster.

Backup and recovery tools

Cloudera provides multiple mechanisms to allow backup and recovery, including:

  • Snapshots
  • Replication
  • Export
  • CopyTable
  • HTable API
  • Offline backup of HDFS data

These can be run manually or scheduled using Replication Manager. Backups can also be moved to other instances of the OpDB or to alternate storage targets such as Amazon S3 or Azure Data Lake Storage (ADLS) Gen2.
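As an illustration of moving a snapshot to alternate storage, here is a minimal sketch that assembles the command line for HBase's ExportSnapshot tool. The helper name export_snapshot_command, the snapshot name, and the bucket path are hypothetical; the sketch only builds the argument list and does not run anything or reflect how Replication Manager schedules backups.

```python
from typing import List, Optional


def export_snapshot_command(
    snapshot: str, copy_to: str, mappers: Optional[int] = None
) -> List[str]:
    """Build the argument list for HBase's ExportSnapshot tool.

    snapshot: name of an existing HBase snapshot.
    copy_to:  destination URL, for example an s3a:// or hdfs:// path.
    mappers:  optional number of map tasks used for the copy.
    """
    if not snapshot or not copy_to:
        raise ValueError("snapshot and copy_to are required")
    cmd = [
        "hbase",
        "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot,
        "-copy-to", copy_to,
    ]
    if mappers is not None:
        cmd += ["-mappers", str(mappers)]
    return cmd


if __name__ == "__main__":
    # Hypothetical snapshot name and destination bucket.
    args = export_snapshot_command(
        "orders_snapshot", "s3a://example-backup-bucket/hbase", mappers=4
    )
    print(" ".join(args))
```

The resulting list could be passed to subprocess.run on a cluster host where the hbase command is available and the destination credentials are already configured.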

Management tools 

You can use these management tools to manage and automate operations:

  • Cloudera Manager can automate normal day-to-day operations such as monitoring performance and identifying issues. You can also use it to configure parameters that tune your OpDB, either through the directly exposed parameters or through the safety-valve options. Examples include garbage collection (GC) pause settings and request handler thread counts. Cloudera Manager also provides metrics and built-in workload graphs to help you analyze the effect of tuning changes.

  • The HBCK2 tool can automate recovery from outages or operational issues. You can use HBCK2 to repair Apache HBase in OpDB. For more information about the HBCK2 tool, see Run HBCK manually.
  • hbtop provides a real-time view of metrics per Region, Namespace, Table, or RegionServer. For more information, see hbtop overview.

Manual tools for data definition language (DDL) and data control language (DCL)

You can use DDL and DCL commands to perform various tasks. Cloudera’s Operational Database provides several ways to run them, including the Apache HBase shell, the Java API, and SQL.

OpDB also supports tools that connect over JDBC or ODBC, which makes it versatile.
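To make the DDL and DCL distinction concrete, the following sketch generates Apache HBase shell statements: a create statement (DDL) and a grant statement (DCL). The helper names and the table, column family, and user names are hypothetical, and the sketch only produces text; it does not connect to a cluster.

```python
from typing import Iterable

# Permission letters accepted by the HBase shell grant command:
# Read, Write, eXecute, Create, Admin.
VALID_PERMISSIONS = set("RWXCA")


def quote(value: str) -> str:
    """Single-quote a value for the HBase shell, rejecting embedded quotes."""
    if not value or "'" in value:
        raise ValueError(f"unsupported value: {value!r}")
    return f"'{value}'"


def create_table(table: str, families: Iterable[str]) -> str:
    """DDL: create a table with one or more column families."""
    families = list(families)
    if not families:
        raise ValueError("at least one column family is required")
    return "create " + ", ".join(quote(v) for v in [table, *families])


def grant(user: str, permissions: str, table: str) -> str:
    """DCL: grant the given permission letters on a table to a user."""
    if not permissions or not set(permissions) <= VALID_PERMISSIONS:
        raise ValueError(f"invalid permissions: {permissions!r}")
    return f"grant {quote(user)}, {quote(permissions)}, {quote(table)}"


if __name__ == "__main__":
    # Hypothetical table, column families, and user.
    print(create_table("orders", ["details", "meta"]))
    print(grant("analyst", "R", "orders"))
```

The generated lines can be pasted into an interactive HBase shell session, or, assuming your environment allows it, piped to the shell in non-interactive mode.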

Storage management

You can make use of two different storage scenarios in CDP: 

  • You can use Amazon S3 as a storage layer where HFiles are written to S3, but WALs are written to HDFS.
  • You can use HDFS as a storage layer where both HFiles and WALs are written to HDFS. 

For more information, see HBase Object Store Semantics.
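The two scenarios above differ mainly in where the HBase root directory and the write-ahead log (WAL) directory point. The sketch below expresses that with the standard HBase properties hbase.rootdir and hbase.wal.dir and renders them as hbase-site.xml property elements. The helper names, bucket, and paths are hypothetical, and whether you need to set these yourself depends on your deployment, because CDP may configure them for you.

```python
from typing import Dict, Optional
from xml.sax.saxutils import escape


def storage_properties(hfile_root: str, wal_dir: Optional[str] = None) -> Dict[str, str]:
    """Return HBase properties for a storage scenario.

    hfile_root: where HFiles live (hbase.rootdir), for example s3a:// or hdfs://.
    wal_dir:    separate WAL location (hbase.wal.dir); omit it to keep WALs
                under the root directory, as in the HDFS-only scenario.
    """
    props = {"hbase.rootdir": hfile_root}
    if wal_dir is not None:
        props["hbase.wal.dir"] = wal_dir
    return props


def to_site_xml(props: Dict[str, str]) -> str:
    """Render properties as <property> elements for hbase-site.xml."""
    lines = []
    for name, value in props.items():
        lines.append("<property>")
        lines.append(f"  <name>{escape(name)}</name>")
        lines.append(f"  <value>{escape(value)}</value>")
        lines.append("</property>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Scenario 1 (hypothetical paths): HFiles on S3, WALs on HDFS.
    print(to_site_xml(storage_properties(
        "s3a://example-bucket/hbase", "hdfs://ns1/hbase-wals")))
    # Scenario 2 (hypothetical path): HFiles and WALs both on HDFS.
    print(to_site_xml(storage_properties("hdfs://ns1/hbase")))
```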

Integrated file system management

Use the File Browser feature in Cloudera Manager to browse your Apache HBase files stored on HDFS, and use any S3 file browser to browse Apache HBase files stored on S3. While Apache HBase is fully integrated with HDFS, you can also use Amazon S3 as a storage layer for Apache HBase in a scenario where HFiles are written to S3, but WALs are written to HDFS. 

Hue includes an app that allows you to interact with Apache HBase. You can use the Hue HBase Browser to access your HBase data from any Hue-supported web browser.

Conclusion

In this blog post, we looked at how you can make use of the OpDB management capabilities in CDP. In the next article, we’ll cover how you can configure high availability in OpDB to help you plan for disaster recovery.

Gokul Kamaraj
Liliana Kadar
Krishna Maheshwari
Director of Product Management
