Accelerating Deployments of Streaming Pipelines – Announcing Data in Motion on Kubernetes

Organizations today are challenged to become both more data-driven and more nimble so they can adapt quickly to changing conditions. These challenges are the driving forces behind much of their digital transformation or “modernization” efforts. Digital transformation is the process of integrating digital technology into all areas of a business to create and capture value in new ways, effectively “datifying” all processes while remaining agile enough to make continuous incremental improvements.

To support these transformation efforts, Cloudera is thrilled to announce that all Data-in-Motion capabilities will be available in 2H 2024 as independently deployable Kubernetes operators. Now, organizations that strive to be more data-driven can take advantage of modern containerized services to accelerate innovation and scale efficiently.

Regardless of industry or use case, there are two key themes that always arise when executing on digital transformation strategies.  

  1. Data needs to be shared in real time so it can be embedded deeper into everyday operational processes across the organization, all working from the same ground truth.
  2. Applications, data, and supporting infrastructure must be broken down from the monolithic, tightly coupled architectures of the past into smaller building blocks that are independently modifiable, reconfigurable, and scalable.  

So, the strategic capabilities needed to execute on innovation projects in support of digital transformation goals, and to scale them across the enterprise as they evolve, are:

  • The ability to capture, process, and distribute any and all data in real time to any user, application, or system.
  • The ability to rapidly deploy, provision, manage, and scale efficiently across hybrid and multi-cloud environments.

This new release will deliver both of those strategic capabilities. 

These offerings are also significant industry “firsts,” helping bring the power of open-source technology to the enterprise.   

Cloudera Flow Management – Kubernetes Operator is the only commercially supported Kubernetes deployment for Apache NiFi, while Cloudera Streaming – Kubernetes Operator is the only offering for both Apache Kafka and Apache Flink in a Kubernetes form factor that can be run anywhere. Together, these services provide a unique and powerful combination of streaming data movement, real-time processing, and streaming analytics, with low-code development experiences that make delivery of real-time pipelines efficient at massive scale.

Staying true to our commitment to deliver complete capabilities for streaming data across hybrid environments, this release represents a significant milestone for Cloudera and our enterprise customers. Customers who wish to build streaming pipelines can quickly deploy and manage scalable clusters that support their real-time data needs in any environment, in the cloud or on premises. They can do this independently of their Cloudera Data Platform instances, giving them greater speed and flexibility than ever before.

Customers choose Cloudera’s Data-in-Motion tools for numerous reasons, including:

  • Our breadth of integrated capability
  • Open-source innovation with enterprise-grade delivery
  • Low-code development experiences
  • Efficient scalability

The value of our holistic approach to Data-in-Motion is a simpler, more efficient pipeline architecture that delivers actionable data across the enterprise for all real-time use cases and evolves with changing data or business requirements.

At the same time, many organizations choose Kubernetes as an enterprise standard deployment form factor, and for good reason. There is tremendous value just in having a standard: it is much easier to make an administrative team efficient and to manage resources while maintaining security when there is a single enterprise standard for them to work with. As far as container orchestration tools go, the industry has settled on Kubernetes as the de facto standard, meaning the market is full of IT professionals who are already proficient with Kubernetes. Kubernetes has earned that position due to a number of factors:

  • Commitment to open source
  • Automation tools
  • Scalability 
  • Cloud-agnostic flexibility

Organizations that orchestrate containers with Kubernetes take advantage of efficiencies in deployment and management.

Together, a holistic set of streaming services containerized as Kubernetes operators delivers the key strategic capabilities all organizations need to execute on their digital transformation goals and continually evolve over the long term. Cloudera Flow Management – Kubernetes Operator and Cloudera Streaming – Kubernetes Operator bring a new level of hybrid portability to Data-in-Motion, which is critical for any organization that operates in multiple clouds and data centers and has requirements around where data must live, such as data sovereignty laws in the EU. Deploying via these new operators as independent services also improves innovation speed: development clusters can be set up in just minutes by any IT admin. From there, admins can use their container orchestration tooling of choice to monitor, manage, and scale deployments.

There are economic reasons to deploy on-premises too. Many organizations are facing cost and latency challenges related to streaming data in the cloud. From a cost perspective, the sheer volume of streaming data, ingress and egress fees, and variable compute requirements all add up to high prices for streaming workloads in particular. On the performance side, network latency is an unavoidable reality of cloud computing, and in the case of streaming workloads, it can result in missed performance SLAs. This new offering is the only supported Kubernetes operator for both Kafka and Flink, allowing our customers to reduce costs and latency by bringing the stream processing right to the data as opposed to pushing everything to the cloud!

An example of Data-in-Motion at scale is a Cloudera customer delivering cybersecurity pipelines that continuously evolve. This customer collects, processes, and filters log data from hundreds of thousands of distributed devices, streaming that data for high-speed ingestion into a cyber data lake for analysis much more efficiently than a SIEM-only approach. After first delivering this data in real time to threat analysts and dramatically reducing mean time to detection, the team was easily able to reconfigure existing pipelines to deliver similar data to systems reliability teams who needed it. As their needs evolved, they added stateful real-time processing to their pipelines to monitor for patterns that represent threats, and their analysts can update those patterns with simple SQL queries against streaming data.
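To make the idea of a stateless filter followed by stateful pattern monitoring concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: it does not use NiFi, Kafka, Flink, or any Cloudera API, and the event shape, the function names `filter_events` and `detect_bursts`, and the threshold values are all hypothetical. In a real pipeline this logic would more likely be expressed in the platform's own tooling, such as SQL against streaming data.

```python
from collections import defaultdict, deque

# Hypothetical event shape: (timestamp_seconds, device_id, event_type).

def filter_events(events, keep_types):
    """Stateless step: keep only the event types of interest."""
    return (e for e in events if e[2] in keep_types)

def detect_bursts(events, threshold, window_seconds):
    """Stateful step: yield (timestamp, device_id) whenever a device has
    produced `threshold` or more events within the last `window_seconds`."""
    recent = defaultdict(deque)  # device_id -> timestamps still in the window
    for ts, device_id, _ in events:
        window = recent[device_id]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            yield ts, device_id

# Made-up sample data, assumed to arrive in timestamp order.
sample = [
    (0, "dev-a", "login_failed"),
    (5, "dev-a", "heartbeat"),
    (10, "dev-a", "login_failed"),
    (12, "dev-b", "login_failed"),
    (20, "dev-a", "login_failed"),
    (200, "dev-b", "login_failed"),
]

alerts = list(
    detect_bursts(
        filter_events(sample, {"login_failed"}),
        threshold=3,
        window_seconds=60,
    )
)
print(alerts)
```

The per-device deque is the "state" here: changing the threshold or window changes what counts as a threat pattern without touching the collection or filtering steps, which is the kind of reconfiguration the example above describes.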

Other use cases that require the ability to orchestrate real-time data and evolve quickly are fraud prevention, supply chain optimization, cybersecurity, personalized offers, and generative AI, to name a few.

This is an important milestone for the open-source community and an example of Cloudera’s continued commitment to delivering the industry’s most comprehensive set of streaming capabilities across hybrid environments to truly capture, process, and distribute any data anywhere.

For more information or to see a demo, we invite you to join our webinar on June 11th.

Chris Joynt
Senior Product Marketing Manager