Operational Database Accessibility

by Liliana Kadar, Gokul Kamaraj, and Krishna Maheshwari

Posted in Technical | April 02, 2020 | 3 min read

This blog post is part of a series on Cloudera’s Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities. Start from the beginning of the series with Operational Database in CDP.

Cloudera’s OpDB provides a rich set of capabilities to store and access data. In this blog post, we’ll look at the accessibility capabilities of OpDB and how you can make use of these capabilities to access your data.

Distribution and sharding

Cloudera’s Operational Database (OpDB) is a scale-out Database Management System (DBMS) that is designed to scale linearly to petabytes of data. As in other scale-out DBMSs, scale-out is implemented through sharding. Two different sharding policies are supported:

Auto-sharding
Pre-defined sharding

Regardless of approach, there are APIs to enable sharding based on hash, on range of values, or on a combination of both.
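
The three options can be illustrated with a minimal sketch. This is not the OpDB API: the function names, the SHA-256 hash, and the one-byte prefix are assumptions chosen only to show how hash, range, and combined sharding map a row key to a shard.

```python
import bisect
import hashlib
from typing import Sequence


def hash_bucket(key: bytes, num_buckets: int) -> int:
    """Hash sharding: map a key to one of num_buckets buckets."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def range_shard(key: bytes, split_points: Sequence[bytes]) -> int:
    """Range sharding: shard i holds keys in [split_points[i-1], split_points[i]).

    split_points must be sorted; the first shard is open at the low end
    and the last shard is open at the high end.
    """
    return bisect.bisect_right(split_points, key)


def salted_key(key: bytes, num_buckets: int) -> bytes:
    """Combined sharding: a one-byte hash prefix spreads writes across
    buckets, while keys inside a bucket keep their natural range order."""
    if not 1 <= num_buckets <= 256:
        raise ValueError("num_buckets must fit in one byte")
    return bytes([hash_bucket(key, num_buckets)]) + key
```

Hash sharding spreads sequential keys evenly but gives up ordered scans across the whole table. Range sharding keeps scans cheap but can concentrate writes on one shard. The prefixed key is one common way to trade between the two.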

Auto-sharding

When auto-sharding is enabled, tables are dynamically distributed across the cluster. When a shard’s size exceeds the configurable limit, the shard is automatically split and moved between servers in the cluster.

A table segment is split in two at its middle key, creating two roughly equal halves that can be served by different servers.
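
As a rough sketch of the split step, assume a shard is just a sorted list of unique row keys and that its size is measured in number of keys. Both are simplifications made for illustration (the real limit is a configurable size), and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_at_middle_key(
    keys: Sequence[bytes],
) -> Tuple[bytes, List[bytes], List[bytes]]:
    """Split a sorted shard of unique keys at its middle key.

    Returns (middle_key, left, right): left holds the keys below
    middle_key, right holds middle_key and everything above it.
    """
    if len(keys) < 2:
        raise ValueError("need at least two keys to split")
    mid = len(keys) // 2
    return keys[mid], list(keys[:mid]), list(keys[mid:])


def split_if_too_large(keys: Sequence[bytes], max_keys: int) -> List[List[bytes]]:
    """Return the shard unchanged, or its two halves if it exceeds max_keys."""
    if len(keys) <= max_keys:
        return [list(keys)]
    _, left, right = split_at_middle_key(keys)
    return [left, right]
```

After a split, the middle key becomes the boundary between the two new shards, so each half can be assigned to a different server.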

Automated sharding is applied regardless of the network that is used with the OpDB (WAN or local). Clusters can be set up to span a WAN, in which case sharding and data movement occur across the WAN with zero data loss.

The system can be configured to be aware of which nodes are in which data centers, which provides additional resilience for shards as copies of the shards can be distributed across multiple data centers.

Pre-defined sharding

Shards can be limited to specific subsets of nodes in a cluster based on policy, typically in a tenant-specific manner. This enables the implementation of geography-based policies. Tables can then be replicated between clusters, with policies ensuring that replication of tables, and of the associated shards, is limited to the desired geographies.

Cloudera’s OpDB provides native support for data sovereignty. If a cluster spans multiple countries, region server groups, together with HDFS rack isolation configuration, can be used to anchor data in specific countries.
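
The idea of limiting shards to a subset of nodes can be sketched as follows. The group names, table names, and the round-robin choice are all hypothetical; the sketch only shows that a table’s shards never leave the servers of the group that the policy assigns to it.

```python
from typing import Dict, List, Sequence, Tuple

Shard = Tuple[str, int]  # (table name, shard number)


def assign_shards(
    shards: Sequence[Shard],
    table_group: Dict[str, str],
    group_servers: Dict[str, List[str]],
) -> Dict[Shard, str]:
    """Place each shard on a server from its table's group, round-robin."""
    placement: Dict[Shard, str] = {}
    next_index: Dict[str, int] = {}
    for table, shard_id in shards:
        group = table_group[table]
        servers = group_servers[group]
        if not servers:
            raise ValueError(f"group {group!r} has no servers")
        i = next_index.get(group, 0)
        placement[(table, shard_id)] = servers[i % len(servers)]
        next_index[group] = i + 1
    return placement


# Hypothetical policy: EU tenant data stays on EU servers.
group_servers = {"eu": ["eu-1", "eu-2"], "us": ["us-1"]}
table_group = {"orders_eu": "eu", "orders_us": "us"}
shards = [("orders_eu", 0), ("orders_eu", 1), ("orders_eu", 2), ("orders_us", 0)]
placement = assign_shards(shards, table_group, group_servers)
```

Because placement only ever draws from the group’s own server list, adding servers in another geography cannot cause a tenant’s shards to move there.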

Queries

Cloudera provides three query engines, each optimized for different types of use cases, both operational and analytical, along with NoSQL interfaces. Together they enable optimized performance across a broad range of operational and data warehouse workloads, including the execution of queries and joins of data across multiple shards.

Cloudera’s OpDB provides a native OLTP SQL engine that supports querying multiple data and object models, including querying and joining across them. Two of our OLAP query engines can be used to map external tables that reside within our OpDB (or in other locations) and can query or join across them for the more complex analytical queries typical of data warehousing.
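
To give a feel for what a query across multiple shards involves, here is a minimal scatter-gather sketch. It is not how the query engines are implemented: it assumes each shard is an in-memory list of (key, value) pairs sorted by key, and the function name is hypothetical.

```python
import heapq
from typing import Any, List, Sequence, Tuple

Row = Tuple[bytes, Any]


def scan_across_shards(
    shards: Sequence[Sequence[Row]], start: bytes, stop: bytes
) -> List[Row]:
    """Scan every shard for keys in [start, stop) and merge the results.

    Each shard is filtered independently (the scatter step), then the
    already-sorted partial results are merged by key (the gather step).
    """
    partials = [
        [(k, v) for k, v in shard if start <= k < stop] for shard in shards
    ]
    return list(heapq.merge(*partials, key=lambda row: row[0]))


# Two shards whose keys interleave, as they would under hash sharding.
shard_a = [(b"a", 1), (b"c", 3)]
shard_b = [(b"b", 2), (b"d", 4)]
rows = scan_across_shards([shard_a, shard_b], b"b", b"d")
```

The caller sees one ordered result set and does not need to know which shard held which row.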

Data integration tools

Cloudera provides multiple tools to enable integration with data warehousing and federated query processing.

For example:

Bulk export to a data warehouse is provided by Flink, Spark, Hive, and MapReduce
Streaming export to a data warehouse is provided by NiFi
In-situ data query within our OpDB is provided by Phoenix, Impala, and Hive
Federated query processing across our OpDB, our data warehouse solution, and third-party data warehouse solutions is provided by Hive

External data support

Cloudera’s OpDB includes many Hadoop tools and integrates with most of the Hadoop ecosystem.

Our OpDB provides NoSQL and SQL interfaces. These interfaces can be used without restriction and are well supported in the Hadoop community.

Mobile OpDB

MiNiFi can be used on portable devices at the edge to provide data connectivity with the OpDB.

The query editor Hue can run on a mobile or portable device.

Standards-based connectivity

Cloudera provides both JDBC and ODBC drivers through our SQL engines, in addition to direct API access to our data stores and tools.

Next Up

In this blog post, we looked at some of the OpDB accessibility capabilities such as data query, data integration, and connectivity. In the next article, we’ll cover how you can make use of the administration capabilities in OpDB.

For more information, please go to: Getting Started with Operational Database.

Liliana Kadar

Gokul Kamaraj

Krishna Maheshwari

Director of Product Management
