The Cloudera Data Platform (CDP) represents a paradigm shift in modern data architecture, designed to address both existing and future analytical needs. It builds on a foundation of technologies from CDH (Cloudera Data Hub) and HDP (Hortonworks Data Platform) and delivers a holistic, integrated data platform from Edge to AI, helping clients accelerate complex data pipelines and democratize data assets.
In this introductory article, I present an overarching framework that captures the benefits of CDP for technology and business stakeholders. I have developed this framework to help organizations establish the business case for investing in CDP and also to provide a mechanism to prioritize analytical investments based on specific business objectives (e.g., reduce technology costs, accelerate organic growth initiatives).
The valuation framework consists of four dimensions: 1) business value acceleration, 2) technology cost reduction and / or avoidance, 3) infrastructure cost optimization and 4) operational efficiency.
In the following sections I present the approach for quantifying each of these dimensions.
Business value acceleration
This category describes the differentiating capabilities of CDP that accelerate deployment of use cases (and realization of the associated business value). These capabilities:
- Provide a diverse set of analytical frameworks for different use cases across the data lifecycle (data streaming, data engineering, data warehousing, operational database and machine learning)
- Deliver a native integration mechanism between analytical frameworks via the Shared Data Experience (SDX) to streamline deployment of complex pipelines
- Enable enrichment of use cases with a variety of data formats and types (both structured and unstructured) and from multiple sources
- Deliver a robust security and governance mechanism through SDX that helps scale the platform across a growing number of users and roles within the organization
Putting a quantifiable measure against business value acceleration always requires an approach that is specific to the industry and the client's context. For example, in the case of a major healthcare provider that is implementing CDP, I was able to demonstrate the business value that it delivers by articulating its ability to accelerate time-to-market for inorganic growth initiatives, e.g.:
- For future divestitures and asset carve-outs, CDP Public Cloud accelerates separation of data assets and analytical workloads in an elastic and scalable cloud environment. That benefit comes from Replication Manager, a key capability of CDP, which accelerates migration of existing, on-premises use cases to the public cloud by extending the same security and governance configurations
- For future acquisitions, CDP will function as the single landing zone for all big data workloads of the acquired entity, irrespective of the platform on which they initially reside (e.g., previous versions of CDH / HDP, other cloud warehouses, or legacy on-premises platforms), given the breadth of capabilities that CDP offers. That capability will help reduce technology debt and accelerate IT integration activities, which are a key factor in realizing business value from M&A strategies.
Technology cost reduction / avoidance
CDP delivers the following capabilities to help clients reduce (or entirely avoid) costs for ancillary technology tools that are used in conjunction with competing analytical solutions:
- Cloudera Control Plane replaces infrastructure monitoring tools by offering a single pane of glass to monitor clusters deployed on different on-premises and cloud environments
- Apache Ranger (part of the Shared Data Experience – SDX) replaces data security tools by delivering a fine-grained data access policy mechanism
- Cloudera Data Catalog (part of SDX) replaces data governance tools by facilitating centralized data governance (data cataloging, data searching / lineage, tracking of data issues etc.)
- Workload Manager (part of SDX) replaces big data application performance management tools by offering a native mechanism to analyze the performance and troubleshoot specific jobs or workloads (e.g., query failures, execution delays)
- SDX acts as a data abstraction layer that separates data assets and context from underlying processing frameworks and storage resources. As a result, it eliminates the need to use third-party data orchestration / abstraction tools that try to bring some level of semantic consistency across heterogeneous data silos introduced by point solutions
Infrastructure cost optimization
Infrastructure is the largest cost component in the total cost of ownership (“TCO”) equation for analytical use cases deployed either on-premises or in the public cloud. CDP helps clients optimize their total infrastructure spend by providing optionality in terms of both hosting type (public cloud, on-premises or hybrid cloud) and hosting vendor (e.g., AWS, Google or Azure). That optionality is provided by the Shared Data Experience (SDX), which enables seamless transition between hosting types or cloud vendors with minimal migration effort. As a result, CDP helps clients to:
- Optimize on-premises costs by enabling burst-to-cloud for on-premises workloads based on consumption patterns and infrastructure economics. In such a way, clients can reduce or even avoid data center capacity expansion by leveraging the elasticity of the public cloud to meet peak capacity needs or free up on-premises capacity
- Optimize cloud spend for compute and storage resources by enabling a multi-cloud deployment model that helps minimize cloud spend based on relative unit cost economics among cloud providers
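The burst-to-cloud and multi-cloud arguments above come down to simple arithmetic, which can be sketched as follows. This is only an illustration under stated assumptions: the function names, the demand profile, the on-premises capacity and the per-unit prices are all hypothetical placeholders, not CDP features or actual cloud pricing.

```python
def burst_cost(demand, on_prem_capacity, unit_cost):
    """Cost of serving demand that exceeds on-premises capacity in the cloud.

    demand: iterable of capacity units needed per period (hypothetical units)
    on_prem_capacity: units available on-premises in every period
    unit_cost: assumed cloud price per unit per period
    """
    overflow = sum(max(0, d - on_prem_capacity) for d in demand)
    return overflow * unit_cost


def cheapest_provider(demand, on_prem_capacity, unit_costs):
    """Return (provider, cost) with the lowest burst cost among the candidates."""
    costs = {
        name: burst_cost(demand, on_prem_capacity, price)
        for name, price in unit_costs.items()
    }
    best = min(costs, key=costs.get)
    return best, costs[best]


# Placeholder inputs for illustration only; none of these are real figures.
demand_profile = [80, 95, 120, 150, 110, 90]
capacity = 100
assumed_unit_costs = {"provider_a": 0.50, "provider_b": 0.40, "provider_c": 0.45}

provider, cost = cheapest_provider(demand_profile, capacity, assumed_unit_costs)
print(provider, cost)
```

In practice the comparison would also need to account for data transfer, storage and commitment discounts, which this sketch deliberately leaves out.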
In addition to minimizing infrastructure costs, CDP enables organizations to avoid vendor lock-in with cloud providers. That benefit extends the value proposition of the Cloudera Data Platform beyond short-term cost reduction goals to strategic vendor diversification objectives.
Operational efficiency
This category captures the utility that the Cloudera Data Platform delivers to technology and business stakeholders in terms of operational efficiencies for activities across all stages in the data lifecycle. Those activities can be organized into the following categories:
- End-user operations: CDP accelerates Data Operations (“DataOps”) and Machine Learning Operations (“MLOps”) by providing an integrated technology platform that allows data scientists, data engineers and BI analysts to collaboratively analyze and interact with data, implement end-to-end data pipelines, etc. without integration delays or having to deal with fragmented data silos.
- Security and data governance operations: CDP delivers sophisticated security and governance capabilities to information security and data governance teams. Those capabilities streamline Security Operations (“SecOps”) such as managing user authentication and authorization. In addition, CDP provides a robust data management capability through the Shared Data Experience (SDX) that allows for centralized governance of data assets.
- Platform management: Platform administration teams benefit from the native integration among all analytical frameworks and security / governance capabilities by not having to deal with disparate technologies in terms of integration effort (e.g., setting up proprietary integration mechanisms), dependency management, configuration overheads etc.
In summary, the Cloudera Data Platform enables all direct and indirect users of the analytical environment to minimize effort spent on non-value-adding tasks and focus on what matters the most: extracting value from data.
Each of the four dimensions that I presented carries a different significance (or “weight”) depending on industry and client-specific context. For example, a technology organization that is rapidly evolving its data offerings and / or expanding into new markets should assign higher importance to business value acceleration, whereas an organization that has a cost rationalization objective should focus on cost reduction or avoidance. When developing the Cloud Data Strategy for our clients, I try to form a detailed understanding of their business priorities and objectives and tailor this model accordingly by quantifying the right value dimensions and assigning the appropriate weight to each of them based on relative importance.
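One simple way to express such weighting is a normalized weighted sum across the four dimensions. The sketch below is an assumption made for illustration, not a formula prescribed by the framework: the 0-10 scoring scale, the dimension keys, and every number in it are hypothetical.

```python
DIMENSIONS = (
    "business_value_acceleration",
    "technology_cost_reduction",
    "infrastructure_cost_optimization",
    "operational_efficiency",
)


def weighted_value(estimates, weights):
    """Combine per-dimension estimates into one score using relative weights.

    Weights are normalized by their sum, so they do not need to add up to 1.
    """
    missing = [d for d in DIMENSIONS if d not in estimates or d not in weights]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(estimates[d] * weights[d] for d in DIMENSIONS) / total_weight


# Hypothetical scores on an assumed 0-10 scale, for illustration only.
estimates = dict(zip(DIMENSIONS, (8, 5, 6, 7)))

# Two hypothetical weight profiles reflecting different business priorities.
growth_weights = dict(zip(DIMENSIONS, (0.4, 0.2, 0.2, 0.2)))
cost_weights = dict(zip(DIMENSIONS, (0.1, 0.4, 0.4, 0.1)))

growth_score = weighted_value(estimates, growth_weights)
cost_score = weighted_value(estimates, cost_weights)
print(growth_score, cost_score)
```

The same set of estimates yields a different overall score under each profile, which is the point of tailoring the weights to the client's priorities.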
More information about Cloudera Data Platform can be found at https://www.cloudera.com/products/cloudera-data-platform.html