Create your Private Data Warehousing Environment Using Azure Kubernetes Service

For Cloudera, ensuring data security is critical because we serve large customers in highly regulated industries such as financial services and healthcare, where security is paramount. Other industries, such as retail, telecom, and the public sector, handle large amounts of customer data and operate multi-tenant environments, sometimes with end users outside their own company; for them, securing all of that data can be very time-intensive. At Cloudera, we want to help all customers spend more time analyzing data than protecting it. Cloudera secures your data by providing encryption at rest and in transit, multi-factor authentication, Single Sign On, robust authorization policies, and network security.

Cloudera Data Warehouse (CDW) is a cloud-native data warehouse service that runs Cloudera’s powerful query engines on a containerized architecture to analyze any type of data. It is part of Cloudera Data Platform (CDP), which runs on Azure and AWS as well as in the private cloud. The CDW service helps you:

  • become more agile when providing analytics capabilities to the business – via fast compute provisioning and Shared Data Experience
  • get better insights faster – via running all parts of the data lifecycle in one platform
  • ensure your SLAs are met – via compute isolation, autoscaling, and performance optimizations

This post explains how CDW helps you maximize the security of your cloud data warehousing platform when running in Azure. 

Network Security

CDW has long solved many pieces of this security puzzle, including private load balancers, support for Private Link, and firewalls. As of a recent release, it also supports private Azure Kubernetes Service (AKS) clusters. Private AKS keeps communication between the Kubernetes control plane and the Kubernetes nodes, which run in the user’s Virtual Network (VNET), on a private network. As a result, it is now possible to run a fully private CDW environment in Azure.

For the most security-conscious customers, all network access must take place over private networks. This reduces the attack surface and rules out many of the most common attack vectors, which rely on public access to the customer’s systems. When using AKS, there are two types of network access:

  1. Communication to and from the services running on the nodes within the AKS cluster
  2. Communication between the nodes in the AKS cluster and the Kubernetes control plane API

For network access type #1, Cloudera has already released the ability to use a private load balancer. This ensures that your users who are interacting with the services running within the AKS cluster – such as HUE, or Impala and Hive via JDBC/ODBC – can only do so when using a private network. The image below shows the relevant network communication when using a private (or internal) load balancer and only private IP addresses.


For network access type #2, CDW originally only supported communication over public endpoints, which meant that your CDW environment was not completely walled off within a private network. However, now that CDW supports Private AKS, all communication with the Kubernetes control plane remains on a private network. 
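Both types of access come down to the same requirement: every endpoint involved should have a private address. As a minimal sketch of how you might spot-check that, the Python below tests endpoint addresses against the three RFC 1918 private ranges. The endpoint names and addresses are hypothetical, and the check assumes your VNET uses RFC 1918 address space (Azure VNETs can use other ranges, which this sketch would flag).

```python
import ipaddress

# The three RFC 1918 private IPv4 ranges.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(address)
    # Containment is always False for an IPv6 address against an IPv4 network.
    return any(ip in network for network in RFC1918_NETWORKS)


def non_private_endpoints(endpoints: dict) -> list:
    """Return the names of endpoints whose address is not RFC 1918 private."""
    return sorted(name for name, addr in endpoints.items() if not is_rfc1918(addr))


if __name__ == "__main__":
    # Hypothetical endpoint addresses; replace them with the ones from your environment.
    endpoints = {
        "aks-api-server": "10.20.0.4",  # private AKS control plane endpoint
        "hue": "10.20.1.15",            # internal load balancer frontend
        "impala-jdbc": "10.20.1.16",    # internal load balancer frontend
        "hive-jdbc": "52.160.10.4",     # a public address, so it gets flagged
    }
    print(non_private_endpoints(endpoints))  # ['hive-jdbc']
```

An empty result means every listed endpoint, for both user-facing services and the Kubernetes API server, is reachable only on private addresses.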

Together, these capabilities make it possible to create a fully private CDW environment in Azure, so customers can focus on running their analytics rather than on securing the data. The following sections describe other aspects of how this is implemented, as well as the steps to set it up for yourself.

Additional Aspects of a Private CDW Environment on Azure

CDW uses various Azure services to provide the infrastructure it requires. In addition to AKS and the load balancers mentioned above, these include VNET, Azure Data Lake Storage, Azure Database for PostgreSQL, and more. We take care to ensure that each of these is also used securely, as explained below.

Network Traffic with the CDP Control Plane

CDP provides a component called Cluster Connectivity Manager version 2 (CCMv2), which enables the CDP Control Plane to communicate with the Kubernetes control plane and other resources in your network, such as virtual machines, using an inverting proxy. This ensures that all of this traffic goes through a secured HTTPS tunnel. In addition, you can use the Azure Private Link service to ensure that the CDP Control Plane can only be accessed through private endpoints.
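The key property of an inverting proxy is that the agent inside your network opens the connection outward, so no inbound ports need to be exposed. The following Python sketch models that pattern in memory. The `Relay` and `Agent` classes are hypothetical illustrations of the idea, not CCMv2's actual implementation, and real traffic would flow over an HTTPS tunnel rather than Python method calls.

```python
from collections import deque
from typing import Callable, Optional, Tuple


class Relay:
    """Hypothetical relay that both sides connect to; it never dials into the private network."""

    def __init__(self) -> None:
        self.pending = deque()  # queued (request_id, request) pairs
        self.responses = {}     # request_id -> response body
        self._next_id = 0

    def submit(self, request: str) -> int:
        """Called by the control plane side to queue a request."""
        request_id = self._next_id
        self._next_id += 1
        self.pending.append((request_id, request))
        return request_id

    def fetch(self) -> Optional[Tuple[int, str]]:
        """Called by the agent, over its own outbound connection, to pick up work."""
        return self.pending.popleft() if self.pending else None

    def respond(self, request_id: int, body: str) -> None:
        """Called by the agent to return the result."""
        self.responses[request_id] = body


class Agent:
    """Runs inside the private network and only ever initiates outbound calls."""

    def __init__(self, relay: Relay, local_service: Callable[[str], str]) -> None:
        self.relay = relay
        self.local_service = local_service

    def poll_once(self) -> bool:
        """Handle at most one queued request; return False if there was none."""
        item = self.relay.fetch()
        if item is None:
            return False
        request_id, request = item
        # Forward the request to a service that is reachable only privately.
        self.relay.respond(request_id, self.local_service(request))
        return True


if __name__ == "__main__":
    relay = Relay()
    agent = Agent(relay, local_service=lambda req: f"handled {req}")
    rid = relay.submit("GET /api/v1/nodes")
    agent.poll_once()
    print(relay.responses[rid])  # handled GET /api/v1/nodes
```

Because every call into the relay originates from one side or the other, the private network only needs to allow outbound connections to the relay.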

Firewall Exceptions for Network Egress

For egress traffic leaving the AKS cluster in your environment, a transparent proxy controls which traffic can pass. Rules are added for the required CDP control plane services, for the AKS service, and for storage account endpoints, so that this outbound traffic is permitted and all other outbound traffic is blocked.
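The egress rules follow a default-deny model: a destination is allowed only if it matches an explicit rule. Here is a minimal Python sketch of that evaluation, assuming FQDN-based rules where `*.suffix` matches any subdomain. The storage and AKS entries follow Azure's documented endpoint patterns (AKS requires more than the two shown), while the CDP entry is a hypothetical placeholder, not a real Cloudera hostname.

```python
# Hypothetical allowlist. The storage and AKS entries follow Azure's documented
# endpoint patterns (AKS needs more than these two); the CDP entry is a
# placeholder, not a real Cloudera hostname.
ALLOWED_FQDNS = [
    "*.blob.core.windows.net",        # storage account Blob endpoints
    "*.dfs.core.windows.net",         # storage account Data Lake Storage endpoints
    "mcr.microsoft.com",              # AKS: Microsoft Container Registry
    "management.azure.com",           # AKS: Azure Resource Manager
    "cdp-control-plane.example.com",  # placeholder for the CDP control plane services
]


def rule_matches(rule: str, host: str) -> bool:
    """Exact FQDN match, or '*.suffix' matching any subdomain of suffix."""
    host = host.lower().rstrip(".")
    rule = rule.lower()
    if rule.startswith("*."):
        return host.endswith(rule[1:])  # rule[1:] keeps the leading dot
    return host == rule


def egress_allowed(host: str, rules=ALLOWED_FQDNS) -> bool:
    """Default deny: outbound traffic passes only if some rule matches."""
    return any(rule_matches(rule, host) for rule in rules)


if __name__ == "__main__":
    for host in ["mystorage.dfs.core.windows.net", "mcr.microsoft.com", "example.org"]:
        print(host, "allowed" if egress_allowed(host) else "blocked")
```

Keeping the leading dot in the wildcard comparison means `*.blob.core.windows.net` matches storage accounts under that domain but not a lookalike such as `evilblob.core.windows.net`.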

Private Endpoint Access for Required Azure Services

By default, Azure Data Lake Storage, PostgreSQL Database, and Virtual Machines are accessible over public endpoints. A private CDW environment, however, requires private endpoints. With private endpoints in place, communication among these resources and with the CDW services running in the AKS cluster takes place over private networks. Private endpoints are provided by the Azure Private Link service.
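Private endpoints rely on DNS. Once a private endpoint exists, the resource's public hostname resolves through a CNAME to a name in a `privatelink` zone, and an Azure Private DNS zone maps that name to the endpoint's private IP. The Python sketch below shows only that naming convention. The zone names are the ones Azure documents for Blob, Data Lake Storage Gen2 (`dfs`), and Azure Database for PostgreSQL private endpoints; check Azure's private endpoint DNS documentation for your service and deployment option. The resource names in the usage lines are hypothetical.

```python
# Azure Private DNS zones used by private endpoints for the services named above.
PRIVATE_LINK_ZONES = {
    "blob.core.windows.net": "privatelink.blob.core.windows.net",
    "dfs.core.windows.net": "privatelink.dfs.core.windows.net",
    "postgres.database.azure.com": "privatelink.postgres.database.azure.com",
}


def private_link_name(public_fqdn: str) -> str:
    """Return the privatelink name that a public FQDN is CNAMEd to once a private endpoint exists."""
    host = public_fqdn.lower().rstrip(".")
    for suffix, zone in PRIVATE_LINK_ZONES.items():
        if host.endswith("." + suffix):
            resource = host[: -(len(suffix) + 1)]  # strip ".<suffix>"
            return f"{resource}.{zone}"
    raise ValueError(f"no private DNS zone known for {public_fqdn!r}")


if __name__ == "__main__":
    # Hypothetical resource names.
    print(private_link_name("cdpstorage.dfs.core.windows.net"))
    print(private_link_name("cdp-db.postgres.database.azure.com"))
```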

Network Resolution

Custom DNS is configured on the VNET to resolve Azure Private DNS zones. Because private endpoint DNS records live in those zones, the VNET’s DNS servers must be able to resolve Azure DNS records. In addition, user-defined routing (UDR) is configured on the VNET: a route table that forwards all traffic to an egress firewall is associated with the subnet.
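A single default route is enough to steer traffic through the firewall because Azure selects the route with the longest matching prefix. The Python sketch below models that selection for one subnet: traffic within the VNET still follows the system route, and everything else matches the `0.0.0.0/0` UDR. The address plan and firewall IP are hypothetical. Real subnets also carry additional system routes, which this sketch omits. `VnetLocal` and `VirtualAppliance` are Azure next-hop type names.

```python
import ipaddress

# Hypothetical address plan: VNET 10.20.0.0/16, egress firewall at 10.20.255.4.
ROUTES = [
    # (address prefix, next hop type, next hop IP address)
    ("10.20.0.0/16", "VnetLocal", None),               # system route for the VNET itself
    ("0.0.0.0/0", "VirtualAppliance", "10.20.255.4"),  # UDR: all other traffic -> firewall
]


def next_hop(destination: str, routes=ROUTES):
    """Select the matching route with the longest prefix, as Azure route selection does."""
    dest = ipaddress.ip_address(destination)
    matching = [
        (ipaddress.ip_network(prefix), hop_type, hop_ip)
        for prefix, hop_type, hop_ip in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matching:
        return ("None", None)  # no route: traffic is dropped
    _, hop_type, hop_ip = max(matching, key=lambda route: route[0].prefixlen)
    return (hop_type, hop_ip)


if __name__ == "__main__":
    print(next_hop("10.20.3.7"))    # ('VnetLocal', None)
    print(next_hop("52.160.10.4"))  # ('VirtualAppliance', '10.20.255.4')
```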

The image below shows a representative architecture diagram for how a private CDW environment on Azure looks.

Setup

CDW support for Private AKS, and for the other aspects required for a private CDW environment, is currently offered as a Technical Preview and requires an entitlement. To try it out, please contact your Cloudera representative.

In the meantime, the setup steps are summarized below at a high level, so you can get a sense of how easy it is to get this up and running. The full steps are included in our public documentation.

Setting up the Environment

  1. Create a resource group for CDP from the Microsoft Azure portal.
  2. Create a private storage account, with network access rules that block all internet traffic.
  3. Create a VNET and a subnet.
  4. Configure the CDP Control Plane Private Link service.
  5. Configure custom DNS on the VNET to resolve Azure Private DNS zones.
  6. Disable the subnet network policies for private endpoints and for the Azure Private Link service.
  7. Configure firewall exceptions on the egress firewall for CDP, AKS, and storage account endpoints.
  8. Configure user-defined routing (UDR) on the VNET.
  9. Create a CDP Azure environment in the VNET that you created, choosing private environment options for the PostgreSQL database, virtual machines, and CCMv2. Do not create public IPs for the Azure VMs. Do enable the Create Private Endpoints option for the PostgreSQL Azure database.
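Before creating the CDP environment, you may want to double-check a few of the settings from steps 2, 5, 6, and 8. The Python below is a hypothetical pre-flight sketch, not part of CDP. It assumes you already have the storage account, VNET, and subnet in their Azure Resource Manager JSON form (with a top-level `properties` object), loaded as dicts. The property names follow that ARM representation.

```python
def preflight(storage: dict, vnet: dict, subnet: dict) -> list:
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []

    # Step 2: storage network rules should deny traffic by default.
    acls = storage.get("properties", {}).get("networkAcls", {})
    if acls.get("defaultAction") != "Deny":
        problems.append("storage account: networkAcls.defaultAction should be 'Deny'")

    # Step 5: custom DNS servers configured on the VNET.
    dns_servers = vnet.get("properties", {}).get("dhcpOptions", {}).get("dnsServers", [])
    if not dns_servers:
        problems.append("VNET: no custom DNS servers configured")

    # Step 6: network policies disabled on the subnet.
    subnet_props = subnet.get("properties", {})
    for key in ("privateEndpointNetworkPolicies", "privateLinkServiceNetworkPolicies"):
        if subnet_props.get(key) != "Disabled":
            problems.append(f"subnet: {key} should be 'Disabled'")

    # Step 8: a route table (UDR) associated with the subnet.
    if not subnet_props.get("routeTable"):
        problems.append("subnet: no route table (UDR) associated")

    return problems


if __name__ == "__main__":
    # Hypothetical, partially configured resources.
    storage = {"properties": {"networkAcls": {"defaultAction": "Allow"}}}
    vnet = {"properties": {"dhcpOptions": {"dnsServers": ["10.20.0.10"]}}}
    subnet = {"properties": {"privateEndpointNetworkPolicies": "Disabled"}}
    for problem in preflight(storage, vnet, subnet):
        print(problem)
```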

Activating CDW with Private AKS

  1. In the CDW console, click the Activation icon for the CDP environment in which you want to activate CDW.
  2. Enter the configuration values required for the environment, as described in the CDW documentation.
  3. Make sure to select the “Enable AKS Internal Load Balancer” and “Enable Azure Priv AKS” options, and enter “0.0.0.0/0” in the Whitelist IP CIDR(s) field.
  4. Click “Activate”.

Next Steps

With support for Private AKS, along with a host of other network security enhancements, CDW can now run in fully private mode within Azure. This brings the benefits of CDW to the most security-conscious customers. Please try CDW and let us know how it works for you.
