Cloudera Operational Database Infrastructure Planning Considerations


In this blog post, we take a look at the infrastructure planning you may have to do when deploying an operational database cluster on a CDP Private Cloud Base deployment. Note that you may have to make some planning assumptions when designing your initial infrastructure, and the design must be flexible enough to scale up or down based on your future needs. 

You can learn more about Cloudera Operational Database in CDP here: Operational Database in CDP

Before we go into the details of how you can plan your infrastructure, you may want to take a look at the minimum hardware requirements necessary to deploy an operational database (Apache HBase from CDP Runtime) in CDP Private Cloud Base here: CDP Private Cloud HBase Hardware Requirements.

In the sections below, we take a look at multiple scenarios that you may encounter when planning infrastructure. The operational database is part of the Cloudera Data Platform (CDP), and you may be using this alongside other CDP services. 

This is an overview of how you can plan for an operational database use case in CDP Private Cloud Base. For more detailed technical material on setting up a CDP Private Cloud Base environment and its hardware requirements, see the reference architecture here: https://docs.cloudera.com/documentation/other/reference-architecture.html 

Cloudera Operational Database is the primary use case 

In this scenario, the operational database serves as the dedicated primary use case in your technology stack. For example, a utility company can use Cloudera’s operational database to store smart meter data for OLTP use cases and later use that data for OLAP use cases. The operational database is well suited to storing large data sets, even billions of rows, and lets you analyze that data in a short period. When the operational database is the primary use case in your stack of services, you will need the following (a rough capacity-sizing sketch follows this list):

  • Dedicated storage: Use hard disks that are dedicated to the operational database. Both the HBase Master and worker nodes must have dedicated storage capacity. 
  • Dedicated servers: Use server hardware that provides good parallelization capabilities; servers that fit into low rack units (1U, 2U) are recommended. 
    • RAM: Around 16-24 GB of RAM is sufficient for optimal performance. In an I/O-heavy workload environment like the one we discuss later, you can use up to 256 GB of RAM.
    • CPU: You don’t need a high core count; a standard 12-core processor with a clock speed of 2.5 GHz should be sufficient. 
  • Gigabit network bandwidth and a dedicated switch: Ensure that you use a dedicated switch to separate your operational database cluster from other applications using your network. 
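As a rough illustration of the capacity planning for this scenario, the sketch below estimates how many dedicated worker nodes you might need based on storage alone. It is a minimal sketch: the function name and all default values (replication factor, disks per node, disk size, utilization target, and the 50 TB example) are hypothetical assumptions for illustration, not Cloudera recommendations, so substitute the numbers from your own environment.

    import math

    def estimate_worker_nodes(raw_data_tb: float,
                              hdfs_replication: int = 3,       # common HDFS default
                              working_overhead: float = 2.0,   # headroom for compactions, snapshots, growth
                              disks_per_node: int = 12,
                              disk_size_tb: float = 4.0,
                              max_disk_utilization: float = 0.7) -> int:
        """Rough worker-node estimate from storage needs alone (illustrative only)."""
        required_tb = raw_data_tb * hdfs_replication * working_overhead
        usable_tb_per_node = disks_per_node * disk_size_tb * max_disk_utilization
        # Keep at least three workers so HDFS can satisfy its replication factor.
        return max(3, math.ceil(required_tb / usable_tb_per_node))

    if __name__ == "__main__":
        # Example: 50 TB of smart meter readings on workers with 12 x 4 TB dedicated disks.
        print(estimate_worker_nodes(raw_data_tb=50))   # -> 9 worker nodes in this example

A storage-only estimate like this is just a starting point; you still need to validate RAM, CPU, and network sizing against the workload types discussed later in this post.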

Cloudera Operational Database is part of a multiple-use case scenario

In this scenario, the operational database is used in your technology stack but does not serve the primary use case. The operational database plays an important part in the use case but has to share available resources with other services. For example, a finance organization might use the operational database and OLTP for predicting creditworthiness and lifetime customer value, and the organization may also have components for OLAP. When the operational database is used along with other services, you will need the following:

  • Shared storage: Storage infrastructure is shared with other services, but where possible use disks that are dedicated to the operational database. Both the HBase Master and worker nodes must have sufficient storage capacity. 
  • Co-located servers: Use server hardware that provides good parallelization capabilities, keeping in mind that the operational database is co-located with other services that share the server. 
  • Gigabit network bandwidth: Use Gigabit network bandwidth, but note that network resources are shared between different services.

Cloudera Operational Database plays a supporting role

In this scenario, the operational database plays a supporting role in your technology stack. For example, your use case involves using Atlas for data lineage auditing and for linking business taxonomies to metadata. Atlas uses HBase for storage, so the operational database plays a supporting role. When the operational database plays only a supporting role, you will need the following:

  • Shared storage: Disks are shared with other services. 
  • Co-located servers: The operational database is co-located with other services that share the server. Use the minimum resources listed in the HBase Requirements for CDP Private Cloud Base.
  • Gigabit network bandwidth: Use Gigabit network bandwidth, but note that network resources are shared between different services.

Apart from picking one of the above scenarios, you must remember some of the following important points:

  • Separate the physical hardware used for the Master and worker nodes. You may be decommissioning the worker nodes more frequently than the Master. 
  • Co-locate the HDFS DataNode with the HBase RegionServer. You get better performance because of data locality. 
  • Use a 1 Gigabit Ethernet switch for each server rack, with each rack switch connected to an HBase cluster switch. 

When it comes to the workloads that you plan to run on your deployment, you can categorize them by workload type and their specific requirements. Take whichever of the scenarios discussed previously applies to you, and combine that information with the workload type to arrive at the size of your physical infrastructure (a rough sizing sketch follows the list below). 

  • A balanced workload: The resources are evenly distributed across CPU-, disk-, and I/O-intensive jobs. You can go with an infrastructure plan based on any of the above use case scenarios. 
  • An I/O-intensive workload: The bias is towards jobs that are I/O bound, for example, jobs that sort over a range of data. For this workload, it helps to have more disks per cluster and to focus on data locality. 
  • A compute-intensive workload: Jobs that appear I/O bound are often compute-intensive as well. You will need more RAM and CPU based on the size of your workload. By more, we mean resources well above the minimum requirements listed in the HBase Requirements for CDP Private Cloud Base.
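To make the workload-driven adjustment concrete, here is a minimal sketch that scales a baseline worker-node profile by workload category. The baseline values and multipliers are hypothetical illustrations only; start from the minimum requirements in the HBase Requirements for CDP Private Cloud Base and size up from there based on measured workload behavior.

    # Illustrative sketch (not an official Cloudera sizing rule): scale a baseline
    # worker-node profile according to the workload category described above.
    # The baseline values and multipliers below are hypothetical assumptions.

    BASELINE = {"ram_gb": 24, "cores": 12, "disks": 6}   # example starting point per worker node

    WORKLOAD_MULTIPLIERS = {
        # (ram, cores, disks) multipliers relative to the baseline
        "balanced":          (1.0, 1.0, 1.0),
        "io_intensive":      (2.0, 1.0, 2.0),   # more spindles and cache for I/O-bound jobs
        "compute_intensive": (4.0, 2.0, 1.0),   # more RAM and CPU for compute-heavy jobs
    }

    def plan_worker_node(workload: str) -> dict:
        """Return a rough per-node hardware profile for the given workload category."""
        ram_x, cpu_x, disk_x = WORKLOAD_MULTIPLIERS[workload]
        return {
            "ram_gb": int(BASELINE["ram_gb"] * ram_x),
            "cores": int(BASELINE["cores"] * cpu_x),
            "disks": int(BASELINE["disks"] * disk_x),
        }

    if __name__ == "__main__":
        for workload in WORKLOAD_MULTIPLIERS:
            print(workload, plan_worker_node(workload))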

To learn more about the different use cases for Cloudera’s Operational Database and about using CDP Private Cloud Base, see Use cases for HBase and the CDP Private Cloud Base Release Guide.

Liliana Kadar
Gokul Kamaraj
Krishna Maheshwari, Director of Product Management
