NiFi as a Function in DataFlow Service

Fully Serverless Solution for Running NiFi Flows

Introduction

With the general availability of Cloudera DataFlow for the Public Cloud (CDF-PC), our customers can now self-serve deployments of Apache NiFi data flows on Kubernetes clusters in a cost-effective way, with auto scaling, resource isolation, and monitoring with KPI-based alerting.

You can find more information in this release announcement blog post and in this technical deep dive blog post. Any customer who wants to run NiFi flows efficiently at scale should now consider adopting CDF-PC.

However, for certain use cases, we want to go one step further. Today, when customers want to process files as they land in a bucket, or expose microservices that are called only intermittently, they need to run NiFi flows as a long-running application, which is not cost effective.

This is why we want to give our customers a completely serverless option for running NiFi flows, and why we are introducing NiFi as a Function in DataFlow Service, available as a private Tech Preview as of today. It provides an efficient, cost-optimized, scalable way to run NiFi flows in a completely serverless fashion. This is particularly powerful whenever the use case is event driven and there is no need for NiFi instances to always be up and running.

For people not familiar with NiFi, NiFi as a Function in DataFlow Service enables the first no-code UI allowing developers to take control of the full lifecycle of functions. In a matter of minutes, you can develop and deploy functions for all cloud providers.

Functions as a Service

Functions as a Service (FaaS) is a category of cloud computing services that all major cloud providers offer (AWS Lambda, Azure Functions, Google Cloud Functions, etc.). It allows customers to run micro applications that are triggered by specific events, without the complexity of building and maintaining the infrastructure needed to launch and operate those applications.

It also effectively provides a serverless architecture and is very widely used when building microservices applications. By serverless we mean that resources are provisioned only when, and for as long as, data is being processed by the application. This way you don't need resources that are always up and running to serve your application. This is the most cost-effective way of running applications that only need to process data following specific events.

With NiFi as a Function, DataFlow Service will enable developers to perform function lifecycle management using the NiFi no-code designer and the DF Service Catalog and then run that flow backed by cloud providers’ managed FaaS.

Event driven use cases

When configuring a function in a FaaS solution, a trigger must be specified. The cloud provider manages this part: it watches for specific events and fires the configured function when they occur. Cloud providers offer many triggers for every FaaS solution out there. Some very common ones fire the function whenever a file lands in a bucket, whenever a message is received in a topic of a message queue solution, or whenever an HTTP request is made to a specific endpoint, among many others.
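To make the trigger idea concrete, here is a minimal Python sketch of what a hand-coded function receives when a file lands in an S3 bucket. It assumes the standard S3 event notification layout (a `Records` list carrying `s3.bucket.name` and `s3.object.key`). The bucket name, object key, and the helper name `extract_s3_objects` are hypothetical and used only for illustration. This is not how NiFi as a Function is implemented; it only shows the kind of event plumbing that a trigger hands to a function.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for every S3 record in the event."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record.get("s3")
        if s3_info is None:
            continue  # skip records that did not come from S3
        bucket = s3_info["bucket"]["name"]
        # Object keys are URL-encoded in the notification (a space becomes '+').
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Entry point written in the style of an AWS Lambda Python handler."""
    objects = extract_s3_objects(event)
    # A real function would fetch and process each object here.
    return {"processed": [f"s3://{bucket}/{key}" for bucket, key in objects]}


# Hypothetical sample event, trimmed to the fields used above.
sample_event = {
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "landing-bucket"},
                "object": {"key": "incoming/daily+report.csv"},
            },
        }
    ]
}

if __name__ == "__main__":
    print(handler(sample_event, None))
```

The cloud provider invokes the handler once per event, so no process needs to stay up between file arrivals.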

How does this translate to NiFi? For use cases where NiFi needs to process data following a specific event, NiFi as a Function (NaaF) provides an efficient, cost-optimized, and scalable way to run NiFi flows to process the data without the need for a long-running flow.

Good examples of real-life use cases for NiFi as a Function include processing files as soon as they're received in a bucket, real-time ingestion of logs received from a very large number of sources, exposing HTTP endpoints to offer microservices, and processing data received from sensors in the IoT space. NiFi as a Function running in the cloud providers' FaaS makes it easy to chain multiple functions together and offers virtually unlimited scaling with no ops.

With NiFi as a Function, DataFlow Service will accelerate the development of your function using the NiFi no-code UI, offer an ever-growing set of processors and integrations to process your data, and enable a robust software development lifecycle (SDLC) around it. By using NiFi as a Function, you no longer need to code your functions yourself. Just design your flow in NiFi and you will be up and running in a few minutes while leveraging the 400+ processors already available!

You want to compress some data as soon as it lands into S3? Create your dataflow and turn it into an AWS Lambda function with an S3 trigger in a few minutes:

You want to create an HTTPS gateway with virtually unlimited scalability for pushing data into Kafka? That's easy, just develop your flow and use the API Gateway trigger:

Conclusion

With the addition of NiFi as a Function in DataFlow Service, Cloudera enables Apache NiFi as the first no-code UI for building and running functions very efficiently for a very wide range of use cases. Watch our Live Demo Jam: when and how to use NiFi Stateless to run data flow.

Pierre Villard
Director, Product Management - Data in Motion