Operational Database in CDP

Editor’s Note, August 2020: CDP Data Center is now called CDP Private Cloud Base. You can learn more about it here.

Cloudera’s operational database (OpDB) in CDP delivers a real-time, always available, scalable database that serves traditional structured data alongside new unstructured data within a unified, open-source operational and warehousing platform.

The operational database helps you to:

  • Operationalize machine learning and artificial intelligence to revolutionize sectors such as healthcare and public utilities.
  • Serve real-time content at web scale.
  • Empower big data analytics for operational and offline uses.
  • Use it as a resilient store of record.

OpDB in CDP is currently available in two form factors: as a fully secure, semi-managed offering in CDP Public Cloud – Data Hub, and as a fully customizable offering in CDP Data Center, similar to what is already available in CDH and HDP. You can pick a form factor based on your deployment strategy and OpDB needs. The operational database uses an object store, such as Amazon S3, as the storage layer for Apache HBase: HFiles are written to the object store, while write-ahead logs (WALs) are written to HDFS.
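
One way to picture this storage split is as two HBase configuration properties: `hbase.rootdir`, under which HBase keeps table data (HFiles), and `hbase.wal.dir`, which HBase 2.x provides for placing write-ahead logs on a separate filesystem. The Python sketch below only renders those two properties as an `hbase-site.xml` fragment. The bucket and NameNode addresses are hypothetical, and this is an assumption about how the layout could be expressed, not a copy of the configuration CDP generates for a Data Hub cluster.

```python
import xml.etree.ElementTree as ET

def storage_layout_config(object_store_root: str, wal_root: str) -> str:
    """Render an hbase-site.xml fragment that keeps HFiles on an object
    store and write-ahead logs (WALs) on HDFS."""
    props = {
        "hbase.rootdir": object_store_root,  # table data (HFiles) goes here
        "hbase.wal.dir": wal_root,           # write-ahead logs go here
    }
    configuration = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(configuration, encoding="unicode")

# Hypothetical locations; substitute your own bucket and NameNode.
xml_text = storage_layout_config(
    "s3a://example-bucket/hbase",
    "hdfs://example-namenode:8020/hbase-wals",
)
print(xml_text)
```

Keeping WALs on HDFS matters because every write is appended to the WAL before it is acknowledged, and that append-and-sync pattern suits a filesystem like HDFS better than an object store.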

The operational database in CDP has the following components: 

  • Apache Phoenix is a SQL interface that runs on top of Apache HBase.
  • Apache HBase is designed for massive scalability, so you can store unlimited amounts of data in a single platform and handle growing demands for serving data.
  • Apache ZooKeeper provides a distributed configuration service, a synchronization service, and a naming registry.
  • Apache Knox Gateway provides perimeter security so that the enterprise can confidently extend access to new users.
  • Apache HDFS is used to write the Apache HBase WALs.
  • An object store, such as Amazon S3 or Microsoft ADLS Gen2, is used to store the Apache HBase HFiles.
  • Shared Data Experience (SDX) is used for security and governance capabilities. Security and governance policies are set once and applied across all data and workloads.
  • IDBroker is a REST API built as part of Apache Knox’s authentication services. It allows an authenticated and authorized user to exchange a set of credentials or a token for cloud vendor access tokens.
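
Apache Phoenix's "thick" JDBC client finds the HBase cluster through the ZooKeeper quorum, using a URL of the form `jdbc:phoenix:<quorum>:<port>:<root znode>`. The small Python helper below only assembles such a URL. The host names are hypothetical, and port 2181 and the `/hbase` root znode are assumed defaults that may differ on your cluster. Connections routed through Apache Knox typically use the Phoenix Query Server thin-client URL instead, which is not shown here.

```python
from typing import List

def phoenix_jdbc_url(zk_hosts: List[str], port: int = 2181,
                     znode_parent: str = "/hbase") -> str:
    """Assemble a Phoenix thick-client JDBC URL:
    jdbc:phoenix:<comma-separated quorum>:<port>:<root znode>."""
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    quorum = ",".join(zk_hosts)
    return f"jdbc:phoenix:{quorum}:{port}:{znode_parent}"

# Hypothetical ZooKeeper hosts; substitute your cluster's quorum.
url = phoenix_jdbc_url(["zk1.example.com", "zk2.example.com", "zk3.example.com"])
print(url)
# jdbc:phoenix:zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase
```

A Java application would pass a string like this to `java.sql.DriverManager.getConnection` with the Phoenix client JAR on its classpath.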

Operational Database in CDP Series 

This article gave you an introduction to OpDB in CDP and its architecture. You can learn more about each aspect of OpDB, and find out about its new features and capabilities, in the upcoming articles of this series. We will add a link to each post as it is published.

Accessibility

Cloudera’s OpDB ensures that users can access and retrieve stored data. It supports both auto-sharding and pre-defined sharding, three query engines, and several data integration tools. This article provides an overview of these capabilities and of other features that keep data highly accessible.
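
Auto-sharding lets HBase split regions as they grow, while pre-defined sharding fixes region boundaries before data arrives. The Python sketch below illustrates two common pre-defined techniques under simplifying assumptions: evenly spaced hex split points (similar in spirit to the `HexStringSplit` algorithm in HBase's `RegionSplitter`) and a one-byte salt prefix on each row key (the idea behind Phoenix's `SALT_BUCKETS` table option). The function names are hypothetical, and MD5 stands in for Phoenix's own internal salting hash.

```python
import hashlib

def hex_split_points(num_regions: int, width: int = 8) -> list:
    """Return num_regions - 1 evenly spaced split keys over a fixed-width
    hex keyspace, so a new table starts with num_regions regions."""
    if num_regions < 2:
        return []
    step = 16 ** width // num_regions
    return [format(i * step, f"0{width}x") for i in range(1, num_regions)]

def salted_row_key(row_key: bytes, buckets: int) -> bytes:
    """Prefix a row key with one bucket byte so that sequential keys
    (for example, timestamps) spread across regions instead of all
    landing in the last one. MD5 is only a deterministic stand-in here."""
    if not 1 <= buckets <= 256:
        raise ValueError("buckets must fit in a single byte")
    bucket = hashlib.md5(row_key).digest()[0] % buckets
    return bytes([bucket]) + row_key

print(hex_split_points(4))  # ['40000000', '80000000', 'c0000000']

# Sequential keys end up spread over all salt buckets.
keys = [f"event-{i:05d}".encode() for i in range(1000)]
used = {salted_row_key(k, 8)[0] for k in keys}
print(sorted(used))
```

Hex split points only balance load when row keys are themselves spread across the hex range (for example, hashed IDs). Salting trades that requirement for a read-side cost: a range scan must fan out across every bucket.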

Administration

Cloudera’s OpDB provides several administration tools and features to administer your OpDB workload. Administrators can deploy OpDB as a fully secure, semi-managed offering in CDP Public Cloud – Data Hub and as a fully customizable offering in CDP Data Center (similar to what is available in CDH and HDP). This article provides you with a high-level overview of what features and tools are supported to administer OpDB in CDP. 

Management

Cloudera’s OpDB provides management tools that help you effectively manage your OpDB workloads. This article gives you an overview of the OpDB management tools and features in CDP.

Availability

Cloudera’s OpDB maintains a high level of data availability, ensuring that the required data is accessible when and where it is needed, even when a node fails. This article provides an overview of the features that make this possible, such as out-of-the-box high availability, data replication, and error protection.

Integrity

Cloudera’s OpDB provides various data integrity capabilities including entity and domain integrity, ACID transactions, and nonrelational integrity. This article provides an overview of the OpDB data integrity capabilities. 

Application Support

Cloudera’s OpDB supports various popular languages, frameworks, and applications that you can use to access data stored in your OpDB. This article gives you an overview of the supported languages, frameworks, and applications.

NoSQL and Related Capabilities

Because Cloudera’s OpDB stores data in HBase, a NoSQL database, it offers NoSQL capabilities such as key-value access, table-style storage, and flexible data types. It is also tightly integrated with the rest of the Hadoop ecosystem, including HDFS, Spark, and Kafka. This article provides an overview of these capabilities.
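
To make the key-value and flexible-type capabilities concrete, here is a toy, in-memory Python model of HBase's data model: a sparse map from row key to (column family, qualifier) to timestamped versions of raw bytes, with rows scanned in sorted key order. It is for intuition only; the class and method names are hypothetical and are not the HBase client API, and real HBase sets the number of retained versions per column family rather than per table.

```python
import time
from collections import defaultdict

class ToyHBaseTable:
    """Toy model: row key -> (family, qualifier) -> {timestamp: value}.
    Values are raw bytes, so each cell holds whatever the application encodes."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)
        self.max_versions = max_versions
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row: bytes, family: str, qualifier: str, value: bytes, ts=None):
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        ts = time.time_ns() if ts is None else ts
        versions = self.rows[row][(family, qualifier)]
        versions[ts] = value
        # Drop versions beyond max_versions right away; real HBase hides
        # them at read time and removes them during compaction.
        for old in sorted(versions)[:-self.max_versions]:
            del versions[old]

    def get(self, row: bytes, family: str, qualifier: str):
        """Return the newest value of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get((family, qualifier))
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row: bytes, stop_row: bytes):
        """Yield row keys in sorted order within [start_row, stop_row)."""
        for row in sorted(self.rows):
            if start_row <= row < stop_row:
                yield row

table = ToyHBaseTable(["cf"], max_versions=2)
table.put(b"user#001", "cf", "name", b"Ada", ts=1)
table.put(b"user#001", "cf", "name", b"Ada Lovelace", ts=2)
table.put(b"user#002", "cf", "age", (36).to_bytes(4, "big"), ts=1)
print(table.get(b"user#001", "cf", "name"))
print(list(table.scan(b"user#000", b"user#999")))
```

Note that the two rows need not share columns: one has `name`, the other `age`, and the table stores only the cells that exist.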

Scalability

Cloudera’s OpDB is designed for high scalability, supporting both vertical and horizontal scaling with flexible data types and no limit on data size. This article provides an overview of the supported scalability-related features and tools.
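
Horizontal scaling in HBase rests on dividing each table into regions, each owning a contiguous, sorted range of row keys, and spreading those regions across RegionServers. HBase splits a region automatically once it grows past a configured size (governed by properties such as `hbase.hregion.max.filesize`), and the resulting regions can be served by different servers. The toy Python model below, with hypothetical names, shows only the key-range bookkeeping: finding the region that owns a row and adding a boundary when a region splits.

```python
import bisect

class RegionMap:
    """Toy model of a table's regions: region i owns the row keys in
    [start_keys[i], start_keys[i + 1]), and the last region is open-ended."""

    def __init__(self):
        self.start_keys = [b""]  # the first region starts at the empty key

    def region_for(self, row: bytes) -> int:
        # The owner is the last region whose start key is <= row.
        return bisect.bisect_right(self.start_keys, row) - 1

    def split(self, split_key: bytes) -> None:
        # Splitting adds a boundary; rows >= split_key now belong to a new region.
        if split_key in self.start_keys:
            raise ValueError("a region already starts at this key")
        bisect.insort(self.start_keys, split_key)

regions = RegionMap()
regions.split(b"m")   # one region becomes two
regions.split(b"f")   # and then three
for row in (b"apple", b"grape", b"zebra"):
    print(row, "-> region", regions.region_for(row))
```

Because lookups only need the sorted list of start keys, adding regions (and servers to host them) does not change how a client finds a row, which is what lets the table grow horizontally.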

Security

Cloudera’s OpDB provides security solutions at multiple levels, covering encryption, authentication, authorization, and auditing. This article provides an overview of these security-related features and tools.

For more information, see Getting Started with Operational Database.

Gokul Kamaraj
Krishna Maheshwari
Director of Product Management