Apache NiFi – the data movement enabler in a hybrid cloud environment

Cloudera provides its customers with a consistent set of solutions that run on-premises and in the cloud, so that customers succeed in their data journey for all of their use cases, regardless of where those use cases are deployed. As part of this hybrid cloud strategy, Cloudera DataFlow provides Apache NiFi in both the Cloudera Data Platform Private Cloud Base (on-premises) and Public Cloud (AWS, Azure, and Google Cloud) products. This comprehensive and flexible solution gives our customers a secure, efficient, and easy way to move data back and forth between their environments, wherever those environments are located.

While developers can deploy many applications in the cloud, not all data workloads belong there. The cloud providers are well aware of this challenge: they all offer a growing number of solutions for running workloads on-premises with their technologies. The challenges can include less comprehensive security, unplanned cost increases when data processing accelerates, or simply an inability to meet more aggressive SLA deadlines. So the public cloud is not always a good fit for every business need.

The importance of a hybrid cloud strategy

As a former Google employee, I witnessed the birth of Anthos, the product Google Cloud launched to support a hybrid cloud strategy. The announcement was huge: being able to run Google’s technology on another cloud provider’s infrastructure, as well as on-premises, using Google Kubernetes Engine. I loved the idea. Being able to consistently run the same piece of code anywhere, in the cloud or in your data center, should be the expectation for everyone.

Deploying workloads that span multiple environments is what Cloudera provides with the Cloudera Data Platform. But what about the data? How can you easily move data between your environments in a consistent manner for all of your use cases? How can a business avoid vendor lock-in when using the cloud providers’ managed services? And what about the added complexity when you’re using two or more cloud providers?

With Cloudera Public Cloud and Cloudera Private Cloud, you get a consistent platform with a management, security, and governance layer that spans your on-premises infrastructure and any cloud provider. You can run the same workloads anywhere without changes, and you can truly benefit from the cloud's promise of elastic computing resources. We have also learned that our customers are usually paying for two or more cloud providers. It makes sense: customers want the best technology from each cloud provider, and they also want leverage in pricing negotiations. That’s why Cloudera delivers the same Public Cloud platform on any cloud provider.

Get your data where you need it, when you need it

We all know that a successful business depends on having the right data. For any business, for any use case, it starts with the data. Where is my data? What is my data? Why do I need it? How do I get the data, and how do I gain insights from it? Moving data around to analyze and process it might sound like the easiest part of a use case, but it’s not! Getting the data in and getting it right is a very challenging problem. That’s why Cloudera provides Cloudera Flow Management powered by Apache NiFi. This data flow solution is the best and easiest technology for getting your data wherever you need to process it. And when you adopt a true hybrid cloud architecture, you need to easily move data back and forth between your environments while ensuring the proper level of security, resilience, auditability, and governance.

Apache NiFi provides an easy and secure way to efficiently exchange data bi-directionally between NiFi clusters, while Apache Ranger ensures consistent policies across your environments and Apache Atlas provides data lineage and governance. Both Ranger and Atlas are part of SDX, the Cloudera Shared Data Experience, within the Cloudera Data Platform. NiFi provides high availability and proper load balancing, and it can also compress data in transit to help keep your ingress and egress costs under control. NiFi can also be used to collect data from a very wide range of sources, especially with Cloudera Edge Management and the MiNiFi agents that you can deploy at the edge, at scale. In addition, NiFi provides a wide range of processors to interact with the native managed services of the cloud providers. This way, you can use NiFi to move data around and still leverage your favorite technologies from your cloud providers once the data is available there.
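
To make the point about compression and transfer costs concrete, here is a minimal Python sketch. It is not NiFi code and does not reflect how NiFi implements compression; it only uses the standard gzip module to compare how many bytes would cross the wire with and without compression. The sample records, the compare_transfer helper, and the price per GB are all assumptions made up for illustration, so check your own provider's pricing and your own data.

```python
import gzip

# Hypothetical egress price in USD per GB, used for illustration only.
# Real prices vary by cloud provider, region, and volume tier.
ASSUMED_EGRESS_USD_PER_GB = 0.09


def egress_cost_usd(num_bytes, usd_per_gb=ASSUMED_EGRESS_USD_PER_GB):
    """Estimate the cost of transferring num_bytes at a flat per-GB price."""
    return num_bytes / 1e9 * usd_per_gb


def compare_transfer(payload):
    """Compare the size and estimated cost of sending a payload raw vs. gzip-compressed."""
    compressed = gzip.compress(payload)
    return {
        "raw_bytes": len(payload),
        "compressed_bytes": len(compressed),
        "raw_cost_usd": egress_cost_usd(len(payload)),
        "compressed_cost_usd": egress_cost_usd(len(compressed)),
    }


# Made-up, repetitive sensor records standing in for factory or edge data.
sample = b"\n".join(
    b'{"sensor_id": %d, "status": "OK", "temperature": 21.5}' % (i % 50)
    for i in range(10_000)
)

report = compare_transfer(sample)
for key, value in report.items():
    print(key, value)
```

Repetitive, text-based records like these compress well, which is why compression matters most for high-volume telemetry. Data that is already compressed, such as images or Parquet files, will see far less benefit.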

One of our automotive customers uses the Apache NiFi Site-to-Site protocol to move data between their factories and the cloud, where they leverage the elasticity of Cloudera Data Warehouse and Cloudera Machine Learning. With this approach, you use the right compute resources only when you need them, and you can move the results back to your on-premises deployments if you want to. This capability is a perfect example of cost optimization, where you only use what you need.

Another use case is when you have many streaming data sources and don’t want to send everything to your public cloud provider(s). Because you’re operating worldwide, you don’t want your sources sending data over the internet. You want a single gateway for all of your data transfers, and you want to control your ingress and egress costs. Once again, you can use Cloudera Flow Management as the gateway to move data back and forth between your environments. It gives you a single, reliable way of moving data around, it keeps things simple, and it provides consistent security, data lineage, and data governance.

Cloudera Flow Management, powered by Apache NiFi, is the best technology to address data movement challenges for batch and streaming use cases in a true hybrid cloud environment. NiFi should be your gateway to move any bit of data. Get in touch with us to get up and running in no time!

Read more: Moving data from CDP Private Cloud Base to Public Cloud with NiFi site-to-site

Pierre Villard
Director, Product Management - Data in Motion