Cloudera Altus (launched in May 2017) is a platform-as-a-service (PaaS) offering that enables users to analyze and process data at scale on public cloud infrastructure. Altus was designed from the outset to support multiple clouds, in both its back-end architecture and its front-end workflows. With the addition of Microsoft Azure support, Altus will be able to run data engineering workloads on Azure through the same Altus API and CLI interfaces and a streamlined, multi-cloud UI console. The Azure data engineering workflows are diagrammed below:
In this blog post, we introduce Cloudera Altus on Microsoft Azure and show how to run data engineering workloads that use Azure Data Lake Store (ADLS) for data storage. Altus on Azure is not yet available, but it will open for beta soon.
Connecting Cloudera Altus to Microsoft Azure
In Altus, we define an “Environment” as an encapsulation of the cloud provider resources, such as Virtual Networks and Network Security Groups, that are needed to deploy Cloudera clusters. Typically, users will be given a specific set of these resources to use by their IT team.
Because Altus is a PaaS offering, you must first authorize Altus to provision and manage Cloudera clusters in your Azure subscription. Altus uses Azure’s multi-tenant application model to manage resources in a user’s Azure subscription, so an administrator of the subscription must consent to granting Altus access to it. This admin consent process is built into the Altus Azure Environment creation wizard. When you create the first Altus Environment for an Azure subscription, you will be redirected to the Azure login page. There, you must sign in as an administrator of the subscription and give consent for Altus to access it.
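For readers unfamiliar with the multi-tenant consent model, the snippet below sketches how any multi-tenant Azure AD application generally asks a directory administrator for consent: it sends the admin to the Azure AD `adminconsent` endpoint with its application (client) ID, a redirect URI, and a `state` value. This is a generic illustration, not Altus's implementation; the Altus wizard performs this redirect for you, and the exact endpoint and parameters it uses may differ. The tenant, client ID, and redirect URI below are placeholders.

```python
from urllib.parse import urlencode

# Generic Azure AD admin consent endpoint for multi-tenant applications.
ADMIN_CONSENT_TEMPLATE = "https://login.microsoftonline.com/{tenant}/adminconsent"


def build_admin_consent_url(tenant: str, client_id: str, redirect_uri: str, state: str) -> str:
    """Return the URL an administrator visits to consent to a multi-tenant app.

    tenant: Azure AD tenant ID or domain that owns the subscription.
    client_id: application (client) ID of the multi-tenant app (placeholder in examples).
    redirect_uri: where Azure AD sends the admin after consent (placeholder in examples).
    state: opaque value the app uses to match the response to its request.
    """
    base = ADMIN_CONSENT_TEMPLATE.format(tenant=tenant)
    # urlencode percent-encodes the redirect URI so it survives as a query value.
    query = urlencode({"client_id": client_id, "redirect_uri": redirect_uri, "state": state})
    return f"{base}?{query}"


if __name__ == "__main__":
    print(build_admin_consent_url(
        "contoso.onmicrosoft.com",               # placeholder tenant
        "00000000-0000-0000-0000-000000000000",  # placeholder client ID
        "https://example.com/callback",          # placeholder redirect URI
        "abc123",
    ))
```

Once the administrator approves, Azure AD sends the browser back to the redirect URI, which is how the wizard can return you to the Altus console automatically.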
Once the admin consent step is complete, you will be automatically redirected back to the Altus console to continue creating the Environment. Admin consent is required only once per Azure subscription. Finally, make sure your clusters use Azure regions where ADLS is available.
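To make the Environment concept and the region requirement concrete, here is a minimal sketch of an Environment as a record of the Azure resources it groups, plus a check that its region is one where ADLS is available. The `AzureEnvironment` class, its field names, and `check_adls_region` are hypothetical illustrations, not the Altus API, and no list of ADLS regions is hard-coded: the caller must supply one from current Azure documentation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AzureEnvironment:
    """Hypothetical record of the Azure resources an Altus Environment groups together.

    Field names are illustrative only; they are not the Altus API.
    """
    name: str
    subscription_id: str
    region: str
    virtual_network: str
    network_security_group: str


def check_adls_region(env: AzureEnvironment, adls_regions: Iterable[str]) -> None:
    """Raise ValueError if env.region is not in the caller-supplied ADLS regions.

    adls_regions must come from current Azure documentation; the comparison
    is case-insensitive because region names are often written inconsistently.
    """
    allowed = {r.lower() for r in adls_regions}
    if env.region.lower() not in allowed:
        raise ValueError(f"Region {env.region!r} is not in the supplied set of ADLS regions")
```

Running a check like this before creating clusters catches a region mismatch early, rather than when a job first tries to read from ADLS.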
To manage Altus Environments, navigate to the Environments page, which lists the Environments you have created for each cloud provider. You can filter by cloud provider to show only Azure Environments.
Cloudera Altus uses the Azure Active Directory (AAD) Managed Service Identity (MSI) feature to gain access to ADLS. The Cloudera and Microsoft engineering teams worked together to integrate MSI capability with Apache Hadoop. ADLS supports fine-grained, POSIX-style Access Control Lists (ACLs). By configuring ADLS ACLs, users can grant Altus access to an entire ADLS account (its root directory) or only to specific subdirectories.
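Because ADLS ACLs follow POSIX semantics, granting access to a subdirectory involves more than one entry. The sketch below, a hypothetical helper `acl_plan` that is not part of Altus or any Azure SDK, computes which ACL entries each directory needs so a given identity can use a target path: execute (`--x`) on every ancestor so it can traverse down to the target, the requested permissions on the target itself, and a matching `default:` entry so files and directories created later inherit access. It assumes the identity is referenced by its AAD object ID (a placeholder below); existing ACLs and mask entries on your directories may still affect effective permissions.

```python
from posixpath import normpath
from typing import Dict, List


def acl_plan(path: str, object_id: str, perms: str = "rwx") -> Dict[str, List[str]]:
    """Map each directory to the ACL entries object_id needs to use `path`.

    Hypothetical helper for illustration; it only builds entry strings in the
    POSIX-style form "[default:]user:<id>:<perms>" and applies nothing.
    """
    # perms must be three characters, each either its flag letter or "-".
    if len(perms) != 3 or any(c not in (flag, "-") for c, flag in zip(perms, "rwx")):
        raise ValueError(f"invalid permission string: {perms!r}")

    target = normpath("/" + path.strip("/"))
    parts = [p for p in target.split("/") if p]
    plan: Dict[str, List[str]] = {}

    # Root and every intermediate directory need execute for traversal.
    for depth in range(len(parts)):
        ancestor = "/" + "/".join(parts[:depth])
        plan[ancestor] = [f"user:{object_id}:--x"]

    # The target gets the access entry plus a default entry for new children.
    plan[target] = [f"user:{object_id}:{perms}", f"default:user:{object_id}:{perms}"]
    return plan


if __name__ == "__main__":
    for directory, entries in acl_plan("/data/sales", "<msi-object-id>").items():
        print(directory, entries)
```

Granting access at the account level corresponds to `acl_plan("/", ...)`, which needs no ancestor entries; scoping to a subdirectory keeps the rest of the account inaccessible apart from traversal.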
Creating a cluster with Altus Data Engineering
Creating a data engineering cluster on Azure via Altus is straightforward. Users only need to specify a few parameters such as service type (Spark, Hive, MR2), instance type and cluster size. Altus takes care of creating and configuring the cluster behind the scenes.
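The handful of inputs described above can be modeled as a small, validated request object. The following is a minimal sketch under stated assumptions: `ClusterRequest`, its field names, and the rule that a cluster needs at least one worker are hypothetical and do not reflect the actual Altus API or its request format. The instance type is passed through unchecked because valid Azure VM sizes vary by region.

```python
from dataclasses import asdict, dataclass

# Service types named in this post; spelled upper-case for case-insensitive matching.
SERVICE_TYPES = {"SPARK", "HIVE", "MR2"}


@dataclass(frozen=True)
class ClusterRequest:
    """Hypothetical model of the few inputs a data engineering cluster needs.

    Not the Altus API: field names and validation rules are illustrative.
    """
    name: str
    service_type: str   # one of Spark, Hive, MR2 (any letter case)
    instance_type: str  # an Azure VM size name, passed through unchecked
    worker_count: int   # cluster size

    def __post_init__(self) -> None:
        if self.service_type.upper() not in SERVICE_TYPES:
            raise ValueError(f"unsupported service type: {self.service_type!r}")
        # Assumption for illustration: a cluster needs at least one worker.
        if self.worker_count < 1:
            raise ValueError("a cluster needs at least one worker")

    def normalized(self) -> dict:
        """Return the request as a plain dict with the service type upper-cased."""
        data = asdict(self)
        data["service_type"] = self.service_type.upper()
        return data
```

Validating these few fields up front mirrors the design of the workflow itself: the user supplies only what differs between clusters, and everything else is left to the platform.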
To manage clusters, navigate to the Clusters page, which lists all the clusters you have created in Altus across cloud providers. You can filter by cloud provider to see the status of only your Azure clusters.
Submitting Jobs with Altus Data Engineering
The job submission experience in Altus Data Engineering is consistent across the cloud providers Altus supports. For Azure, you can specify ADLS paths (adl://) both for job artifacts (for example, JAR files and scripts) and for input and output data directories.
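ADLS paths take the form `adl://<account>.azuredatalakestore.net/<path>`. A quick client-side check that a job's artifacts and data directories really point at ADLS can save a failed submission; below is a minimal sketch of such a check. The `parse_adl_uri` helper and `AdlsPath` type are hypothetical and are not part of Altus.

```python
from typing import NamedTuple
from urllib.parse import urlparse

ADLS_HOST_SUFFIX = ".azuredatalakestore.net"


class AdlsPath(NamedTuple):
    account: str  # ADLS account name, lower-cased
    path: str     # absolute path within the account


def parse_adl_uri(uri: str) -> AdlsPath:
    """Split an adl:// URI into account name and path; raise ValueError if malformed."""
    parsed = urlparse(uri)
    if parsed.scheme != "adl":
        raise ValueError(f"expected an adl:// URI, got {uri!r}")
    host = parsed.netloc.lower()
    # The account name is whatever precedes the ADLS host suffix.
    account = host[: -len(ADLS_HOST_SUFFIX)] if host.endswith(ADLS_HOST_SUFFIX) else ""
    if not account:
        raise ValueError(f"not an ADLS host: {parsed.netloc!r}")
    return AdlsPath(account=account, path=parsed.path or "/")


if __name__ == "__main__":
    # Placeholder account and paths for a job's artifact, input, and output.
    for uri in (
        "adl://myaccount.azuredatalakestore.net/jobs/wordcount.jar",
        "adl://myaccount.azuredatalakestore.net/input/",
        "adl://myaccount.azuredatalakestore.net/output/",
    ):
        print(parse_adl_uri(uri))
```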
CLI and Connecting Clusters
As mentioned in our previous blog post, Altus also provides a CLI tool that exposes the same functionality as the UI. Before using the Altus CLI with an Azure subscription, make sure you have first completed the admin consent workflow for that subscription through the Altus UI.
We will soon open a beta of Altus for data engineering workloads on Azure. To indicate interest in participating in the Altus on Azure beta, reach out to your Cloudera sales representative or visit this page.