Self Service is Simply Efficient - Cloudera DataFlow Designer GA announcement

by Chris Joynt

Posted in Technical | March 14, 2023 | 4 min read

We are thrilled to announce that the new DataFlow Designer is now generally available to all CDP Public Cloud customers. Data leaders will be able to simplify and accelerate the development and deployment of data pipelines, saving time and money by enabling true self-service.

It is no secret that data leaders are under immense pressure. They are being asked not just to deliver theoretical data strategies, but to roll up their sleeves and solve the very real problems of disparate, heterogeneous, and rapidly expanding data sources that make it a challenge to meet increasing business demand for data, all while managing costs and ensuring security and data governance. It’s not just the standard “do more with less”; it’s doing a lot more with less amid growing complexity, which makes delivery a painful set of trade-offs.

With relentless focus on transforming business processes to be more responsive to timely, relevant data, we see that most organizations are now distributing data from more sources to more destinations than ever before. In this environment complexity can quickly get out of hand, leaving IT teams with a backlog of requests while impatient LOB users create sub-optimal workarounds and rogue pipelines that add risk. Our customers describe scenarios, sometimes referred to as “spaghetti pipelines” or the “Spaghetti Ball of Pain,” where data-hungry LOBs go outside of IT and hack together their own pipelines, accessing the same source data and distributing it to different places, often in different ways, with little to no regard for data governance standards or security protocols. While the first or second non-sanctioned pipeline might seem like no big deal, risk compounds quickly and oftentimes isn’t truly felt until something goes wrong.

Security breach? Good luck getting visibility into the extent of your exposure where rogue pipelines abound. Data quality issue? Good luck auditing data lineage and definitions where policies were never enforced. Massive cloud consumption bill you can’t account for? Good luck controlling all the clusters deployed in haphazard ways. One customer told us bluntly, “If you think you’re not doing data ops, you’re doing data ops that you just don’t know about.”

The holy grail for data leaders is the elusive self-service paradigm, a balance between end user flexibility and centralized control. When it comes to data pipelines, self-service looks like centralized platform admins with visibility and enough control to manage performance and risk, while enabling developers to onboard new data pipelines when needed. A self-service data pipeline platform therefore needs to provide the following:

Ability to build data flows when needed without having to involve an admin team
Ability for new users to learn the tool quickly so they are productive
Ability for developers to deploy their work to production or hand it over to the operations team in a standardized way
Ability to monitor and troubleshoot production deployments

Self-service in data pipelines reduces costs, helps small administration teams scale to meet demand, accelerates development, and reduces the incentive for costly workarounds. Business users benefit from self-service data pipelines as well: they are better able both to develop their own innovative data-driven solutions and to trust the data they are using.

So how are data leaders to strike this balance and enable the self-service holy grail? Enter Cloudera DataFlow Designer.

Back in December we released a tech preview of Cloudera DataFlow Designer. The new DataFlow Designer is more than just a new UI; it is a paradigm shift in the process of data flow development. It brings together the capabilities to build new data flows, publish them to a central catalog, and productionalize them as either a DataFlow Deployment or a DataFlow Function, so flow developers can now manage the entire life cycle of flow development without relying on platform admins.

Developers use the drag-and-drop DataFlow Designer UI to self-serve across the full life cycle, dramatically accelerating the process of onboarding new data. Resources are used more efficiently because infrastructure is provisioned automatically at the point in the cycle where it is needed, rather than left running continuously. Each phase is now more efficient:

Development: Users can quickly build new flows or start with ReadyFlow templates without dependency on admins.
Testing: With test sessions built into a single integrated user experience, users get immediate feedback during development. This shortens cycle times, which can otherwise stretch out frustratingly when flow definitions are not properly configured for deployment.
Publishing: Users have access to a central catalog where they can more easily manage versioning of flows.
Deployment: Users can work from deployment templates and quickly configure parameters, KPIs to monitor, etc.

Cloudera is delivering the most efficient, most trusted, and most complete set of capabilities on the planet today to capture, process, and distribute high velocity data to drive utilization across the enterprise. Business is demanding more data-driven processes. Developers are demanding more agility. The GA of DataFlow Designer helps our customers deliver on both. Furthermore, customers can realize infrastructure cost savings from a much lighter footprint across the data pipeline life cycle, while giving admin teams visibility and control. Self-service delivers the rapid development and deployment of data flows while combating the hidden costs and risks of rogue pipelines.

For more information or to see a demo, go to the DataFlow Product page.