After the launch of Cloudera DataFlow for the Public Cloud (CDF-PC) on AWS a few months ago, we are thrilled to announce that CDF-PC is now generally available on Microsoft Azure, allowing NiFi users on Azure to run their data flows in a cloud-native runtime.
With CDF-PC, NiFi users can import their existing data flows into a central catalog, from which the flows can be deployed to a Kubernetes-based runtime through a simple flow deployment wizard or with a single CLI command. CDF-PC provides a central monitoring dashboard for flow deployments and offers custom KPI tracking and alerting, allowing customers to stay on top of what matters to them.
The need for a cloud-native Apache NiFi service on Microsoft Azure
Without a cloud-native service to run NiFi flows on Microsoft Azure, organizations resorted to building and operating NiFi clusters on either virtual machines or their own container-based infrastructure. While Azure services like Virtual Machines, Managed Disks, Virtual Networks and Azure Kubernetes Service (AKS) make infrastructure provisioning and management easier, organizations were still responsible for configuring, securing and operating NiFi. This ultimately forced NiFi teams to spend a lot of time managing the cluster infrastructure, which kept them from building new data flows and onboarding new use cases.
As we saw a growing number of organizations wanting to run NiFi data flows on Azure but struggling with the operational challenges, it became clear that there was a need for a cloud service that takes care of infrastructure management and NiFi configuration, so that NiFi users can focus on what matters most to them: building new data flows and ensuring that these data flows meet the business SLAs.
Solving Common Data Integration Use Cases with CDF-PC on Azure
CDF-PC helps Azure customers implement key data integration use cases that require data movement, filtering and transformation at scale. Apache NiFi’s rich processor library provides out-of-the-box processors for Azure services such as ADLS Gen2, Event Hubs, Blob Storage and Cosmos DB. Additional Azure services can be integrated through their APIs using customizable NiFi processors like InvokeHTTP.
A common use case on Azure is SIEM (security information and event management) optimization for analyzing application log data. Cloud applications can be configured to send their logs to a central Azure Event Hub, from where CDF-PC flow deployments pick up the log data and curate the events for the SIEM system. At the same time, the events can be stored in ADLS Gen2 storage for custom analysis outside of the SIEM application. Using NiFi for this use case helps reduce the cost of the SIEM system and establishes a standard tool that can take any application log files and prepare them for the SIEM system.
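To make the curation step more concrete, here is a minimal sketch of the routing decision such a flow might apply. It is not how CDF-PC or NiFi implements it; in practice this logic would live in NiFi processors. The event schema, the `severity` field, the `SEVERITY_RANK` table and the `route_events` function are all assumptions made up for this illustration.

```python
# Hypothetical severity ranking; real application log schemas will differ.
SEVERITY_RANK = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3, "CRITICAL": 4}


def route_events(events, siem_min_severity="WARN"):
    """Split events into (siem_batch, archive_batch).

    Every event is kept for the archive (for example ADLS Gen2), while only
    events at or above the threshold are forwarded to the SIEM system.
    """
    threshold = SEVERITY_RANK[siem_min_severity]
    archive_batch = list(events)
    siem_batch = [
        event
        for event in events
        # Events with a missing or unknown severity are treated as INFO.
        if SEVERITY_RANK.get(event.get("severity"), SEVERITY_RANK["INFO"]) >= threshold
    ]
    return siem_batch, archive_batch


if __name__ == "__main__":
    sample = [
        {"severity": "DEBUG", "message": "cache warmed"},
        {"severity": "ERROR", "message": "login failed"},
        {"message": "no severity set"},
    ]
    siem, archive = route_events(sample)
    print(f"{len(siem)} event(s) to SIEM, {len(archive)} event(s) archived")
```

Forwarding only the events above a threshold while archiving everything is one way the SIEM volume, and with it the cost, can be reduced without losing the raw data.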
Processing Streaming Data
Modern applications often provide streaming interfaces to send transaction data in real time to external systems for analysis. Apache Kafka deployments are commonly used to buffer these messages for downstream consumption. Customers can use Streams Messaging clusters in CDP Public Cloud to create enterprise-grade Kafka deployments on Microsoft Azure. Since not every downstream application is able to read directly from Kafka topics, CDF-PC flow deployments are often used to read and curate the events for analysis by downstream systems. A common integration point for Azure services is ADLS Gen2, for which NiFi provides out-of-the-box connectivity options. In this use case, NiFi deployments on CDF-PC are the bridge between streaming data and services that rely on data being available in ADLS Gen2.
Data Ingest for Microsoft Sentinel
Microsoft Sentinel is an Azure-native SIEM solution that organizations use for attack detection, threat visibility, proactive hunting, and threat response. While Microsoft Sentinel provides point integrations for many source systems, not every vendor or product is supported and can be connected directly. CDF-PC flow deployments can bridge the gap between unsupported devices and applications on one side and Microsoft Sentinel on the other by turning the raw device log files into a format that Microsoft Sentinel understands and ingesting the result through its HTTP API.
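As a rough illustration of the "turn raw log files into a format Sentinel understands" step, the sketch below converts plain-text log lines into a JSON array of records. The raw line format, the field names and the `to_json_record` and `to_json_payload` helpers are assumptions for this example only. The actual call to the Sentinel HTTP API, including authentication, is deliberately left out, and in a CDF-PC deployment this work would be done by NiFi processors rather than by a script.

```python
import json
import re

# Hypothetical raw format: "<timestamp> <host> <app>: <message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<host>\S+)\s+(?P<app>[^:\s]+):\s*(?P<message>.*)$"
)


def to_json_record(line):
    """Parse one raw log line into a dict, or return None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.groupdict()


def to_json_payload(lines):
    """Build a JSON array from all parseable lines, skipping malformed ones."""
    records = (to_json_record(line) for line in lines)
    return json.dumps([record for record in records if record is not None])


if __name__ == "__main__":
    raw = [
        "2022-03-01T12:00:00Z fw01 firewall: connection denied from 10.0.0.5",
        "this line does-not-match",
    ]
    print(to_json_payload(raw))
```

Malformed lines are dropped silently here to keep the sketch short; a real flow would more likely route them to a separate failure path for inspection.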
Getting a head start with ReadyFlows
To help organizations that are less experienced with NiFi, CDF-PC comes with an integrated ReadyFlow Gallery, which makes flow deployments for popular use cases easy. Once users have identified their ReadyFlow of choice, all they need to do is start the Deployment Wizard and provide connection parameters for the source and destination systems, and the first flow deployment will be up and running within minutes. Today, CDF-PC supports Azure-optimized ReadyFlows to move data from Kafka to ADLS and between two different ADLS locations. In the future, we will provide more Azure-optimized ReadyFlows to cover the use cases mentioned above.
Leveraging key Microsoft Azure technologies to provide elastic, auto-scaling data flows
CDF-PC is powered by Microsoft Azure services to provide a scalable infrastructure for NiFi data flows. CDF-PC manages the lifecycle of these infrastructure services, freeing up NiFi administrators from infrastructure maintenance tasks such as performing upgrades or applying hotfixes for security issues.
As Figure 6 shows, CDF-PC creates and manages an AKS cluster in a virtual network. The cluster consists of two node pools: one for running Cloudera infrastructure services and one for running CDF-PC and the actual NiFi flow deployments. Each NiFi flow deployment is created in its own Kubernetes namespace for resource isolation purposes. The NiFi flow deployments can scale up and down based on CPU utilization, while AKS auto-scales the node pools based on the resource utilization of scheduled pods across the cluster. CDF-PC also relies on ADLS Gen2 for storing application and flow deployment log files and on an Azure Postgres database for storing application data.
When CDF-PC is first enabled, users can configure the minimum and maximum number of nodes in the CDF-PC node pool. The node pool then scales up and down within these boundaries as required.
CDF-PC supports different networking setups. Users can configure which of the available subnets in a virtual network should be used for the AKS cluster, whether CDF-PC should be accessible through a public endpoint, and whether access to CDF-PC should be restricted to a list of CIDR ranges.
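The exact scaling logic CDF-PC uses is not described here, but the general idea of CPU-based scaling within configured boundaries can be sketched with the formula documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count multiplied by the ratio of observed to target utilization, rounded up and then clamped to the minimum and maximum. The function name and parameters below are illustrative assumptions, not CDF-PC settings.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas, max_replicas):
    """Return a replica count using the HPA-style ratio, clamped to the bounds.

    desired = ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    """
    if target_cpu_pct <= 0:
        raise ValueError("target_cpu_pct must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    # Never go below the configured minimum or above the configured maximum.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Three replicas running at 90% CPU against a 60% target.
    print(desired_replicas(3, 90, 60, min_replicas=1, max_replicas=10))
```

The same clamping idea applies to the node pool boundaries: however high the load, the count never leaves the configured minimum and maximum.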
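For readers less familiar with CIDR-based restrictions, the following sketch shows what checking a client address against a list of allowed CIDR ranges amounts to. It uses Python's standard `ipaddress` module. The `is_allowed` function and the example ranges are hypothetical; in CDF-PC the restriction is enforced by the service itself, not by user code.

```python
import ipaddress


def is_allowed(client_ip, allowed_cidrs):
    """Return True if client_ip falls inside at least one of the CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(
        # strict=False tolerates ranges written with host bits set, e.g. 10.1.2.3/8.
        address in ipaddress.ip_network(cidr, strict=False)
        for cidr in allowed_cidrs
    )


if __name__ == "__main__":
    allowed = ["10.0.0.0/8", "203.0.113.0/24"]  # example ranges only
    for ip in ("10.1.2.3", "192.168.1.1"):
        print(ip, "allowed" if is_allowed(ip, allowed) else "blocked")
```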
CDF-PC’s architecture and the options that can be configured during service enablement make it flexible enough to work in any Azure setup, while simple wizards abstract the complexity of the underlying infrastructure.
Summary & Getting Started
With the General Availability of Cloudera DataFlow for the Public Cloud on Azure, we’re entering a new era of running Apache NiFi data flows in multi-cloud environments. For the first time ever, Apache NiFi users can manage and monitor data flows running on Microsoft Azure or AWS from a single management console. CDF-PC takes care of infrastructure management, abstracts the differences between cloud providers and allows NiFi users to truly focus on developing and running their data flows.