Cloudera Altus on Microsoft Azure

Categories: Altus Cloud

Cloudera Altus (launched in May 2017) is a platform-as-a-service (PaaS) offering that enables users to analyze and process data at scale in public cloud infrastructures. Altus was designed from the outset to support multiple clouds from the perspective of both back-end architecture and front-end workflows. With the announcement of Microsoft Azure support, Altus will be able to support data engineering workloads in Microsoft Azure, with the same Altus interfaces for API and CLI, and a streamlined, multi-cloud UI console. The Azure data engineering workflows are diagrammed below:

[Figure: Azure architecture]

In this blog post, we introduce Cloudera Altus on Microsoft Azure and show you how to run data engineering workloads using Azure Data Lake Store (ADLS) for your data. Altus on Azure is not yet available but will be open for beta soon.

Connecting Cloudera Altus to Microsoft Azure

In Altus, we define an “Environment” as an encapsulation of the cloud provider resources, such as Virtual Networks and Network Security Groups, that are needed to deploy Cloudera clusters. Typically, users will have a particular set of resources provided by their IT team that they should use.

[Figure: Azure deployment]

Altus provides a configuration wizard that allows you to easily specify these resources:

[Figure: Configuration wizard]

Because Altus is a PaaS offering, you must first authorize Altus to provision and manage Cloudera clusters in your Azure subscription. Altus uses Azure’s multi-tenant application model to manage resources in a user’s Azure subscription, so an administrator of the subscription must consent to grant Altus access to it. The admin consent process is built into the Altus Azure Environment creation wizard. When you create the first Altus Environment for an Azure subscription, you will be redirected to the Azure login page. There you must log in as an admin of the subscription and give consent for Altus to access it.

[Figure: Azure login]

Once the admin consent step is complete, you will be automatically redirected back to the Altus console to continue Environment creation. Admin consent is required only once per Azure subscription. Finally, make sure to use Azure regions with ADLS availability for your clusters.

[Figure: Azure configuration wizard]

To manage Altus Environments, navigate to the Environments page, where you can find the Environments you created for different cloud providers. You can filter by cloud provider to see only Azure Environments.

[Figure: Altus environments]

Accessing Data in Azure Data Lake Store (ADLS)

Cloudera Altus uses the Azure Active Directory (AAD) Managed Service Identity (MSI) feature to gain access to ADLS. The Cloudera and Microsoft engineering teams worked together to integrate MSI capability with Apache Hadoop. ADLS supports fine-grained, POSIX-style Access Control Lists (ACLs). By configuring ADLS ACLs, users can grant Altus access to an entire ADLS account (its root directory) or only to specific subdirectories.
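To illustrate how POSIX-style ACLs scope access to a subdirectory, here is a minimal Python sketch. It is a simplified model, not ADLS code: it assumes one permission string per principal and ignores owner, group, other, and mask entries. The principal name `altus-msi` and all paths are hypothetical. The rule it models is the usual POSIX one: reading a file requires execute (`x`) on every ancestor directory and read (`r`) on the file itself.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """A file or directory with a permission string (e.g. 'r-x') per principal."""
    acl: Dict[str, str] = field(default_factory=dict)


def has_perm(node: Node, principal: str, perm: str) -> bool:
    # A principal with no ACL entry on this node gets no permissions.
    return perm in node.acl.get(principal, "---")


def can_read(tree: Dict[str, Node], principal: str, path: str) -> bool:
    """True if `principal` may read the file at `path` under the simplified rule."""
    parts = [p for p in path.strip("/").split("/") if p]
    # Ancestor directories of the target: "/", "/a", "/a/b", ...
    ancestors = ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    for directory in ancestors:
        node = tree.get(directory)
        if node is None or not has_perm(node, principal, "x"):
            return False
    target = tree.get("/" + "/".join(parts))
    return target is not None and has_perm(target, principal, "r")


# Hypothetical layout: the identity may traverse "/" and "/sales" but not "/hr".
tree = {
    "/": Node({"altus-msi": "--x"}),
    "/sales": Node({"altus-msi": "r-x"}),
    "/sales/2017.csv": Node({"altus-msi": "r--"}),
    "/hr": Node({}),
    "/hr/salaries.csv": Node({"altus-msi": "r--"}),
}

print(can_read(tree, "altus-msi", "/sales/2017.csv"))
print(can_read(tree, "altus-msi", "/hr/salaries.csv"))
```

In this model the second check fails even though the file itself grants read, because the identity lacks execute on `/hr`. That is why granting access to a subdirectory also means granting traverse permission on its parents.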

[Figure: ADLS ACL]

With MSI, Altus does not require user secret keys (Azure AD service principal secrets) to read or write data to ADLS. In fact, no keys are stored anywhere on the Cloudera cluster.
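To make the "no stored keys" point concrete, the sketch below builds (but does not send) a token request of the kind a process on an Azure VM can make to the local instance metadata endpoint for managed identities. This post does not describe how Altus or the Hadoop integration obtains tokens, so treat this as an assumption-laden illustration: the endpoint, `api-version` value, and ADLS resource URI are taken from Azure's general managed identity documentation as I understand it, and the function name is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed values from Azure's managed identity documentation, not from this post.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"
ADLS_RESOURCE = "https://datalake.azure.net/"


def build_msi_token_request(resource: str = ADLS_RESOURCE) -> Request:
    """Build a token request for the VM's managed identity.

    No client secret appears anywhere: the platform authenticates the VM itself,
    so the caller supplies only the resource it wants a token for.
    """
    query = urlencode({"api-version": "2018-02-01", "resource": resource})
    # The metadata endpoint expects a 'Metadata: true' header on every request.
    return Request(f"{IMDS_TOKEN_URL}?{query}", headers={"Metadata": "true"})


req = build_msi_token_request()
print(req.full_url)
```

Sending the request only works from inside an Azure VM with a managed identity enabled, which is why the sketch stops at constructing it.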

Creating a cluster with Altus Data Engineering

Creating a data engineering cluster on Azure via Altus is straightforward. Users only need to specify a few parameters such as service type (Spark, Hive, MR2), instance type and cluster size. Altus takes care of creating and configuring the cluster behind the scenes.

[Figure: Azure create cluster detail]

To manage clusters, navigate to the Clusters page, where you can find all the clusters in Altus created across different cloud providers. You can filter by cloud provider to see the status of only Azure clusters.

[Figure: Altus clusters]

Submitting Jobs with Altus Data Engineering

The job submission experience of Altus Data Engineering is consistent across the cloud providers Altus supports. For Azure, you can specify ADLS paths (adl://) for both job artifacts (e.g. jar files and scripts) and input/output data directories.

[Figure: Azure Spark job submit]
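As a small illustration of what an `adl://` path carries, the following Python sketch splits one into its account and path components and rejects other schemes. It is not part of Altus. The helper name `parse_adl_uri`, the account `myaccount`, and the file paths are hypothetical, and the `<account>.azuredatalakestore.net` host form is the usual ADLS convention rather than something stated in this post.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class AdlPath(NamedTuple):
    account: str  # ADLS account name (first label of the host)
    path: str     # absolute path inside the account


def parse_adl_uri(uri: str) -> AdlPath:
    """Split an adl:// URI into account and path, raising ValueError otherwise."""
    parts = urlsplit(uri)
    if parts.scheme != "adl":
        raise ValueError(f"expected an adl:// URI, got: {uri!r}")
    if not parts.netloc:
        raise ValueError(f"missing ADLS account host in: {uri!r}")
    return AdlPath(account=parts.netloc.split(".")[0], path=parts.path or "/")


jar = parse_adl_uri("adl://myaccount.azuredatalakestore.net/jobs/wordcount.jar")
print(jar.account, jar.path)
```

A check like this can catch a mistyped scheme or a missing account before a job is submitted, rather than after a cluster has been provisioned.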

CLI and Connecting Clusters

As mentioned in our previous blog post, Altus also provides a CLI tool that exposes the UI functionality. To use the Altus CLI, make sure you first complete the consent workflow for the subscription through the Altus UI.

[Figure: Azure cluster details]

What’s Next?

We will soon open the beta of Altus for data engineering workloads on Azure. You can reach out to your Cloudera sales representative or visit this page to indicate interest in participating in the Altus on Azure beta.

Links

Cloudera VISION blog: Introducing Cloudera Altus on Microsoft Azure

Microsoft Azure on Cloudera Altus Press Release

One response on “Cloudera Altus on Microsoft Azure”

  1. amar

    Cloudera Altus is a Platform as a Service that makes it easy to process large data in the cloud, and it is also cost-effective. Azure is a set of services for building, deploying, and managing applications through a global network of data centres. So it makes sense to process data with Altus on Azure.
