The Cloudera Data Platform (CDP) represents a paradigm shift in modern data architecture by addressing a broad range of existing and emerging analytical needs. It builds on technologies from CDH (Cloudera's Distribution including Apache Hadoop) and HDP (Hortonworks Data Platform) and delivers a holistic, integrated data platform from Edge to AI, helping clients accelerate complex data pipelines and democratize data assets.
In this introductory article, I present an overarching framework that captures the benefits of CDP for technology and business stakeholders. I have developed this framework to help organizations not only establish the business case for investing in CDP, but also prioritize analytical investments based on specific business objectives (e.g., reduce technology costs, accelerate organic growth initiatives).
The valuation framework consists of four dimensions: 1) business value acceleration, 2) technology cost reduction and / or avoidance, 3) infrastructure cost optimization and 4) operational efficiency. In the following sections I present the approach for quantifying each of these dimensions.
Business value acceleration
This category describes the unique ability of CDP to accelerate deployment of use cases (and, as a result, the associated business value) by:
- Providing a comprehensive set of analytical frameworks for use cases across the data lifecycle (data streaming, data engineering, data warehousing, operational database and machine learning) while seamlessly integrating data content via the Shared Data Experience (SDX), a layer that separates data context from compute and storage
- Supporting multiple data formats and types to enable enrichment of data assets for different use cases
- Delivering a robust security and governance framework through SDX to support a growing number of users leveraging the data platform
Accelerating business value is always specific to the industry and client context. In the case of a major healthcare provider that is implementing CDP, I was able to demonstrate its value by showing how it accelerates time-to-market for inorganic growth initiatives, for example:
- For future divestitures and asset carve outs, CDP Public Cloud accelerates separation of data assets and analytical workloads in an elastic and scalable cloud environment. That benefit comes from Replication Manager, a key capability of CDP, which enables users to migrate existing on-premises use cases to the public cloud with the same security and governance configurations.
- For future acquisitions, the two different CDP form factors (CDP Private Cloud and CDP Public Cloud) will serve as the single landing zone for all big data workloads of the acquired entity, accelerating IT integration activities and ensuring technology standardization and rationalization between our client and the acquired entity. That benefit comes from the breadth of CDP’s analytical capabilities that translates into a unique ability to migrate different big data workloads, either from previous versions of CDH / HDP or from other cloud data warehouses and legacy on-premises data warehouses that the acquired entity might be using.
Technology cost reduction / avoidance
CDP helps clients reduce (or avoid entirely) costs for ancillary technology tools that are used in conjunction with competing analytical solutions. Those ancillary tools provide “must-have” capabilities for enterprise-grade deployments, such as fine-grained access control, workload observability, data abstraction and data discovery. CDP Public Cloud eliminates the need for using those tools by providing the following capabilities:
- Cloudera Control Plane replaces infrastructure monitoring tools used to monitor clusters deployed on-premises and on different clouds from a single pane of glass
- Apache Ranger (part of the Shared Data Experience – SDX) replaces data security tools by natively providing fine-grained data access policies, including column- and row-level filtering alongside data masking
- Cloudera Data Catalog (part of SDX) replaces data governance tools to facilitate centralized data governance (data cataloging, data searching / lineage, tracking of data issues etc.)
- Workload Manager (part of SDX) replaces big data application performance monitoring tools used to analyze the performance and troubleshoot specific jobs or workloads (e.g., query failures, cost overruns)
- Finally, SDX separates data context from compute / storage and abstracts data assets from specific analytical frameworks. As a result, it acts as a replacement for data orchestration / abstraction tools that try to bring some level of semantic consistency across heterogeneous data silos introduced by point solutions that come with their own proprietary formats and architectural idiosyncrasies
In summary, CDP reduces the need for third-party tools that introduce substantial costs and result in a complicated technology stack with many dependencies.
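To make the fine-grained access control mentioned above more concrete, here is a minimal conceptual sketch of what row-level filtering combined with column masking does to a result set. It is plain Python written for illustration only: it does not use Apache Ranger's API or policy format, and the function names (`mask`, `apply_policy`) and the sample records are hypothetical.

```python
def mask(value, visible=4):
    """Hide all but the last `visible` characters of a value."""
    text = str(value)
    if len(text) <= visible:
        return "*" * len(text)
    return "*" * (len(text) - visible) + text[-visible:]


def apply_policy(rows, row_filter, masked_columns):
    """Keep only the rows a user may see, then mask the sensitive columns."""
    result = []
    for row in rows:
        if not row_filter(row):
            continue  # row-level filtering: drop rows outside the user's scope
        result.append({
            col: mask(val) if col in masked_columns else val
            for col, val in row.items()
        })
    return result


# Hypothetical sample data, not taken from any real system.
patients = [
    {"region": "east", "name": "A. Jones", "ssn": "123-45-6789"},
    {"region": "west", "name": "B. Smith", "ssn": "987-65-4321"},
]

# An analyst scoped to the "east" region sees one row with a masked SSN.
east_view = apply_policy(patients, lambda r: r["region"] == "east", {"ssn"})
print(east_view)
```

In a real deployment these rules would be defined centrally as policies rather than in application code, which is the point of replacing separate data security tools.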
Infrastructure cost optimization
Infrastructure costs are the largest component of the total cost of ownership (“TCO”) for analytical use cases, whether deployed on-premises or in the public cloud, because these workloads are computationally intensive. As a result, it is important for an enterprise data platform to help users minimize infrastructure costs by providing optionality in both hosting type (public cloud, on-premises or hybrid) and hosting vendor (e.g., AWS, Google Cloud or Azure), so that each use case can run wherever it is most cost-effective. CDP delivers that capability by supporting different form factors (private, public and hybrid cloud) and all major public cloud providers. In addition, it provides hosting optionality in a dynamic fashion, i.e., it enables seamless transition between form factors or cloud vendors with minimal effort via the Shared Data Experience (SDX), which acts as a state and hosting abstraction layer. In particular, SDX enables clients to:
- Optimize on-premises costs by using Replication Manager to burst on-premises workloads to the public cloud based on usage patterns and infrastructure economics. In this way, clients can avoid on-premises capacity expansion by leveraging the elasticity of the public cloud to meet peak capacity needs
- Optimize compute and storage cloud spend through a multi-cloud deployment model that lets clients minimize TCO by selecting the cloud vendor with the lowest compute and storage costs for a specific use case, environment or region
In addition to minimizing infrastructure costs, CDP enables organizations to avoid vendor lock-in. That benefit extends the value proposition of the Cloudera Data Platform beyond short-term cost reduction goals to strategic vendor diversification objectives.
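The placement logic described in this section can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a CDP feature or API: the use case names, the hosting options and every cost figure below are hypothetical, and a real analysis would also account for data transfer and migration costs.

```python
# Hypothetical monthly cost estimates (USD) per use case and hosting option.
# None of these figures are real vendor prices.
monthly_cost = {
    "nightly_etl": {"on_prem": 9000, "aws": 7500, "azure": 7800, "gcp": 8100},
    "bi_dashboards": {"on_prem": 4000, "aws": 5200, "azure": 4900, "gcp": 5100},
    "ml_training": {"on_prem": 15000, "aws": 11000, "azure": 11500, "gcp": 10200},
}


def cheapest_placement(costs):
    """Map each use case to the (hosting option, cost) pair with the lowest cost."""
    return {
        use_case: min(options.items(), key=lambda item: item[1])
        for use_case, options in costs.items()
    }


def burst_capacity(peak_demand, on_prem_capacity):
    """Capacity to burst to the public cloud at peak; zero if on-premises suffices."""
    return max(0, peak_demand - on_prem_capacity)


placement = cheapest_placement(monthly_cost)
optimized_total = sum(cost for _, cost in placement.values())
on_prem_total = sum(options["on_prem"] for options in monthly_cost.values())

for use_case, (option, cost) in placement.items():
    print(f"{use_case}: {option} at {cost} per month")
print(f"Savings versus all on-premises: {on_prem_total - optimized_total}")
```

The `burst_capacity` helper captures the first bullet above: only the demand that exceeds existing on-premises capacity needs to run in the public cloud, which is what allows clients to defer capacity expansion.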
Operational efficiency
The last value category captures the utility that the Cloudera Data Platform delivers to technology and business stakeholders in terms of operational efficiencies for activities across all stages of the “data lifecycle”. Those activities can be organized into the following categories:
- End-user operations: CDP accelerates Data Operations (“DataOps”) and Machine Learning Operations (“MLOps”) by providing an integrated technology platform that allows data scientists, data engineers and BI analysts to quickly synthesize and interact with data, implement end-to-end data pipelines, etc. without integration delays or having to deal with fragmented data silos that result in operational inefficiencies.
- Security and data governance operations: CDP delivers sophisticated security and governance capabilities to information security and data governance teams. Those capabilities simplify Security Operations (“SecOps”) tasks such as managing user authentication and authorization. In addition, CDP provides a robust data management capability through the Shared Data Experience (SDX) that allows for centralized management and observability of data assets (e.g., data lineage and discovery).
- Platform management: Platform administration teams benefit from the native integration among all analytical frameworks and security / governance capabilities by not having to deal with disparate technologies in terms of integration effort (e.g., setting up proprietary integration mechanisms such as APIs), dependency management, configuration overheads etc.
In summary, the Cloudera Data Platform enables all direct and indirect users of the analytical environment to minimize effort spent on non-value-added tasks and focus on what matters the most: extracting value from data.
Each of the four criteria that I presented has a different significance (or “weight”) based on industry and client-specific context. For example, a technology organization that is rapidly evolving its data offerings and / or expanding into new markets should assign higher importance to business value acceleration, whereas an organization that has a cost rationalization objective should focus on cost reduction or avoidance. When developing the Cloud Data Strategy for our clients, I try to formulate a detailed understanding of their business priorities and objectives and tailor this model accordingly by quantifying the right value dimensions and assigning the appropriate weight to each of them based on relative importance.
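As a rough illustration of how the weighting described above could work, the following Python sketch combines a score per dimension with client-specific weights. The four dimension names come from this article; the 0 to 10 scoring scale, the scores and both weight profiles are hypothetical and would be replaced by figures from an actual client assessment.

```python
# The four dimensions of the valuation framework presented in this article.
DIMENSIONS = (
    "business_value_acceleration",
    "technology_cost_reduction",
    "infrastructure_cost_optimization",
    "operational_efficiency",
)


def weighted_score(scores, weights):
    """Combine per-dimension scores into one number using normalized weights."""
    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[d] * weights[d] / total_weight for d in DIMENSIONS)


# Hypothetical scores (0 to 10) for one candidate investment.
scores = {
    "business_value_acceleration": 8,
    "technology_cost_reduction": 4,
    "infrastructure_cost_optimization": 6,
    "operational_efficiency": 5,
}

# Hypothetical weight profiles for two different business priorities.
growth_focused = {
    "business_value_acceleration": 4,
    "technology_cost_reduction": 2,
    "infrastructure_cost_optimization": 2,
    "operational_efficiency": 2,
}
cost_focused = {
    "business_value_acceleration": 1,
    "technology_cost_reduction": 3,
    "infrastructure_cost_optimization": 4,
    "operational_efficiency": 2,
}

print("Growth-focused client:", round(weighted_score(scores, growth_focused), 2))
print("Cost-focused client:", round(weighted_score(scores, cost_focused), 2))
```

The same investment scores differently under the two profiles, which is how the model helps prioritize analytical investments against a given client's objectives.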
More information about Cloudera Data Platform can be found at https://www.cloudera.com/products/cloudera-data-platform.html