In this first article of a two-part series on data product strategies, I present some of the emerging themes in data product development and how they inform the prerequisites and foundational capabilities of an enterprise data platform that can serve as the backbone for successful data product strategies. With those capabilities identified, the second article explores how the Cloudera Data Platform delivers them and how it has enabled organizations such as IQVIA to innovate in Healthcare with the Human Data Science Cloud.
Business and Technology Forces Shaping Data Product Development
From my discussions with Cloudera clients, data product development has been at the top of the growth agenda in many industries such as Financial Services, Healthcare and Telecommunications. Among the many industry-specific and technology themes contributing to that growth agenda, some common business and technology forces are influencing data product development:
- An increasing focus on data collaboration partnerships between enterprises to enable data sharing and value exchange across an industry value chain. A typical example is how large Retailers enable CPG companies to gain real-time visibility into consumer buying behaviour (e.g., through PoS and transaction data) in order to optimize supply chain operations.
- A growing demand for self-service analytics from internal data consumers and knowledge workers, external partners and clients. As digital transformation and modernization initiatives increase the availability and quality of data, organizations expose data assets with self-service capabilities to improve productivity and accelerate decision making.
- The irreversible shift towards digital-native / digital-first consumption, working and learning paradigms has introduced requirements for new digital product capabilities such as AI-enabled interactions and collaborative analytics and has also accelerated adoption of data-intensive products in adjacent segments such as cybersecurity and network analytics.
- The alleviation of infrastructure and computational constraints associated with rigid on-premises data platforms. Data products can now use different deployment models (e.g., hybrid, public or multi-cloud) and advanced analytical frameworks (e.g., Deep Java Library, Apache Spark 3.x, and NVIDIA GPU computing), which removes some of the scalability challenges of legacy platforms and avoids large capital outlays for data center infrastructure.
- The proliferation of real-time processing. Deploying event-driven architectures (e.g., Lambda or Kappa architectures) and implementing reliable streaming capabilities at scale with technologies such as Apache NiFi and Apache Kafka has made it possible to harness and commercialize an ever-increasing volume of real-time data such as time-series or clickstream data.
- The accelerated adoption and evolution of infrastructure abstraction paradigms, both in the private / public cloud domain, with technologies such as Kubernetes, and at the edge, with open standards like WebAssembly.
The confluence of these business and technology factors has placed special emphasis on the organization’s data landscape and how it fits within a robust data product platform strategy that, based on Amrit Tiwana’s work on Platform Ecosystems, meets four key success criteria: Simplicity, Resiliency, Maintainability and Evolvability. These criteria call for a holistic rethink of the capabilities that a next-generation data platform needs in order to deliver successful data product strategies.
Five Priorities for Efficient Data Product Development
Among the key priorities of Cloudera clients that have successfully deployed and commercialized data product strategies, I have identified the following key requirements for efficient, differentiated, and scalable data platform ecosystems.
1- Apply Four Pillars of Security
Security has always been a paramount concern for data ecosystems, and will continue to play a pivotal role in successful data products. In fact, data product development introduces an additional requirement that wasn’t as relevant in the past as it is today: scalability in permissioning and authorization, given the number and variety of roles, both internal and external, that access a data product. From a security capability standpoint, organizations need to comprehensively address four requirements at scale:
- Authentication: Validate and control different types of roles and user personas (internal employees, clients or partners) using a comprehensive authentication mechanism across all manifestations of a data asset.
- Authorization: Define, in a fine-grained manner, what users of internal / external organizations can access and do with the data, so that access complies with requirements such as the data obfuscation rules that industry- and country-specific standards impose on certain types of data assets (e.g., PII).
- Auditability: Data security and compliance constituents need to understand how data changes, where it originates from and how data consumers interact with it. As a result, data forensics capabilities such as data lineage, ad-hoc queries and standardized reports on databases that store data changes and data schema evolution history are a key requirement of modern data platforms.
- Data Protection: Throughout the movement of data from source systems / end points to storage location and from there to downstream applications, data needs to be properly encrypted. To that end, encryption at rest and in motion needs to be enforced across the data landscape in a comprehensive and systematic manner.
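To make the authorization and auditability requirements above more concrete, here is a minimal sketch in Python of role-based column masking combined with an access log. Everything in it is an illustrative assumption: the role names, the column names, and the helpers `mask`, `read_row` and `AuditLog` are hypothetical and do not represent any Cloudera API. A production platform would enforce such policies centrally rather than in application code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib

# Hypothetical policy: columns each role may read in clear text.
# Role and column names are illustrative assumptions only.
CLEAR_COLUMNS = {
    "internal_analyst": {"patient_id", "region", "diagnosis"},
    "external_partner": {"region", "diagnosis"},
}


@dataclass
class AuditLog:
    """In-memory stand-in for a durable audit store."""
    entries: list = field(default_factory=list)

    def record(self, user, role, columns):
        self.entries.append({
            "user": user,
            "role": role,
            "columns": sorted(columns),
            "at": datetime.now(timezone.utc).isoformat(),
        })


def mask(value):
    """Replace a value with a short one-way token.

    Unsalted hashing is used here for brevity; a real system would
    more likely use a keyed hash or a tokenization service.
    """
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def read_row(row, user, role, audit):
    """Return the row with every column the role may not see masked."""
    allowed = CLEAR_COLUMNS.get(role, set())  # unknown roles see nothing in clear
    audit.record(user, role, row.keys())
    return {col: (val if col in allowed else mask(val)) for col, val in row.items()}


if __name__ == "__main__":
    audit = AuditLog()
    row = {"patient_id": "P-001", "region": "EU", "diagnosis": "J45"}
    print(read_row(row, "alice", "external_partner", audit))
    print(len(audit.entries), "access event(s) recorded")
```

The point of the sketch is the shape of the problem rather than the mechanism: the policy is data (a role-to-columns mapping) that can grow with the number of external constituents, and every read leaves an audit entry regardless of whether any value was masked.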
2- Take A Holistic Platform Approach
In their seminal work on Data Product Development, MIT academics Meyer and Zack argued that a well-designed and executed platform approach “enables a company to create new versions of its products rapidly and efficiently to respond to or anticipate changing market needs”. If we extend that principle to the data product domain, we find that only an Enterprise Data Platform approach that delivers frictionless access to any type of data, without introducing data or infrastructure barriers (e.g., data silos within a data product or heterogeneous implementations of a product family across different regions), can truly meet the vision of a “Data Product Ecosystem”. In that ecosystem, the Enterprise Data Platform is the technology foundation for a Consistent, Infrastructure-agnostic and Flexible Platform:
- Consistency simplifies and accelerates DevOps activities related to data product development by delivering a single development platform where different roles and development disciplines (e.g., MLOps, DataOps and Streaming DevOps) come together to build data products, along with a unified control plane and observability capabilities for management and controls.
- Infrastructure agnosticity reduces the time and cost of scaling products into contexts where the original infrastructure choice no longer meets requirements (e.g., a product that was originally deployed in the cloud is now being introduced in a region whose regulatory requirements mandate use of the data center). A true data platform should enable adapting the product to new environments without material refactoring effort.
- Flexibility enables cost optimization and the use of the best deployment choice throughout the application lifecycle. A modern data platform should enable the organization that commercializes a product / product family to dynamically leverage different deployment models (e.g., via burst-to-cloud), without client interruption, to optimize for cost. Flexibility also makes it possible to meet the requirements of a niche market segment that only a specific deployment model addresses.
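The infrastructure-agnostic and flexible properties above can be read as a placement decision: given a region and its regulatory constraints, pick the cheapest deployment target that is allowed. The following Python sketch illustrates that reasoning only. The `Target` type, the `choose_target` function, the target names and the cost figures are all hypothetical assumptions, not a description of how any platform actually schedules workloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A hypothetical deployment target for a data product."""
    name: str
    kind: str             # "data_center" or "public_cloud"
    region: str
    cost_per_hour: float  # illustrative figure, arbitrary currency


def choose_target(targets, region, data_center_only):
    """Return the cheapest target in `region` that satisfies the residency rule."""
    eligible = [
        t for t in targets
        if t.region == region
        and (not data_center_only or t.kind == "data_center")
    ]
    if not eligible:
        raise ValueError(f"no eligible deployment target in {region}")
    return min(eligible, key=lambda t: t.cost_per_hour)


if __name__ == "__main__":
    targets = [
        Target("eu-dc-1", "data_center", "EU", 4.0),
        Target("eu-cloud-1", "public_cloud", "EU", 2.5),
        Target("us-cloud-1", "public_cloud", "US", 2.0),
    ]
    # Without a residency mandate, the cheaper cloud target is chosen.
    print(choose_target(targets, "EU", data_center_only=False).name)
    # With a mandate to use the data center, the same product moves there.
    print(choose_target(targets, "EU", data_center_only=True).name)
```

The product itself does not change between the two calls; only the constraint does, which is the property the platform is expected to provide without material refactoring.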
3- Build Modular, Customizable Experiences to Support Product Families
Expanding on the previous point around platform architecture that empowers successful product families, organizations that have taken the “long view” in formulating a data and analytics monetization approach have realized that building a data platform to deliver a single product and then using extraneous components for the next derivative is simply not a scalable approach. Stitching together different components and analytical capabilities adds cost and complexity through data movement, orchestration and duplicative storage, and it requires additional observability and management tools to keep the stack under control and performing well. That ultimately delays time to market and undermines profit margins. As a result, organizations need to evaluate their long-term product portfolio strategy and how the data platform needs to be defined to realize that product vision, enabling modularity and extensibility.
4- Compose Data Experiences Organized around Value Propositions, Not Intermediate Data Outputs
A common pitfall in the development of data platforms is that they are built around the boundaries of point solutions and are constrained by technological limitations (e.g., a technology choice such as Spark Streaming is overly focused on throughput at the expense of latency) or data formats (e.g., a solution that is focused on structured data and only partially addresses unstructured data). In working with client executives to establish the business case for different service offerings that address varied market needs, I’ve concluded that there is great variation in the expected service characteristics. For example, one target persona has a short-term need for real-time visibility into a particular analytical environment, whereas another is looking for a persistent, dedicated data lake to store and manage data.
As a result, data platforms need to deliver multiple product attributes and features rather than focusing on a particular analytical output or intermediate analytical stage (e.g., data warehousing). Those data product attributes include both functional and non-functional characteristics that translate to targeted, derivative value propositions that meet the needs of niche market segments.
5- Empower the Next Generation of Data Consumers with Self-Serve and Data Discovery Capabilities
Organizations that have successfully implemented innovative data products that radically transform industries have evolved the analytics professional from a generic technology / data science expert into an industry-aware data scientist. Given their domain and technical experience, people in that role are able to find solutions in settings where data is complex and lacks uniformity, and to bring understanding to contexts without universally accepted terms or common data models. An example of such organizational evolution is IQVIA, which has built an industry-leading Human Data Science Cloud leveraging the Cloudera Data Platform (CDP). As part of that organizational transformation, the data scientist role has morphed into that of the human data scientist. Unlike the generalist data scientist, who applies a toolkit of regression analysis, hypothesis testing or other statistical analysis to the data at hand, the human data scientist leverages intuition and creativity, and so avoids using old tools to answer new questions.
To accomplish such a transformation, organizations need to empower product development teams with the right self-serve capabilities, such as Edge-2-AI data visualization and discovery capabilities for all data sources pertinent to the knowledge worker’s duties. Those capabilities will not only remove pre-existing constraints in accessing and understanding data, but will also broaden the “art of the possible” with regard to what the industry-aware data scientist can do with the available data, thus pushing the boundaries of data product innovation.
This part of the Building Successful Data Strategies series explored the requirements for an Enterprise Data Cloud that delivers Simple, Resilient, Maintainable and Evolvable product strategies.
In the next part of the series, we will look into the specific capabilities of the Cloudera Data Platform that have enabled successful data product strategies. I would be more than happy to engage in a discussion with organizations that are interested in learning more about emerging trends in data product development and how Cloudera helps with commercializing innovative data products.