Happy Birthday, CDP Public Cloud

On September 24, 2019, Cloudera launched CDP Public Cloud (CDP-PC) as the first step in delivering the industry’s first Enterprise Data Cloud.

That Was Then

In the beginning, CDP ran only on AWS with a set of services that supported a handful of use cases and workload types:

  • CDP Data Warehouse: a Kubernetes-based service that allows business analysts to deploy data warehouses with secure, self-service access to enterprise data.
  • CDP Machine Learning: a Kubernetes-based service that allows data scientists to deploy collaborative workspaces with secure, self-service access to enterprise data.
  • CDP Data Hub: a VM/instance-based service that allows IT and developers to build custom business applications for a diverse set of use cases with secure, self-service access to enterprise data.

At the heart of CDP is SDX, a unified context layer for governance and security that makes it easy to create a secure data lake and run workloads that address all stages of your data lifecycle (collect, enrich, report, serve and predict).

This Is Now

With CDP-PC just over a year old, we thought now would be a good time to reflect on how far we have come. Over the past year, we have not only added Azure as a supported cloud platform but also improved the original services and grown the CDP-PC family significantly:

Improved Services

  • Data Warehouse – in addition to a number of performance optimizations, Data Warehouse has added new features for better scalability, monitoring and reliability, enabling self-service access that is both secure and performant.
  • Machine Learning – has grown from a collaborative workbench into an end-to-end production ML platform that enables data scientists to deploy a model or an application to production in minutes, with production-level monitoring, governance and performance tracking.
  • Data Hub – has expanded to support all stages of the data lifecycle:
    • Collect – Flow Management (Apache NiFi), Streams Management (Apache Kafka) and Streaming Analytics (Apache Flink)
    • Enrich – Data Engineering (Apache Spark and Apache Hive)
    • Report – Data Engineering (Apache Hive 3), Data Mart (Apache Impala) and Real-Time Data Mart (Apache Impala with Apache Kudu)
    • Serve – Operational Database (Apache HBase) and Data Exploration (Apache Solr)
    • Predict – Data Engineering (Apache Spark)

New Services

  • CDP Data Engineering (1) – a service purpose-built for data engineers focused on deploying and orchestrating data transformation using Spark at scale. Behind the scenes, CDP Data Engineering (CDE) leverages Kubernetes to provide isolation and autoscaling, and it offers a comprehensive toolset to streamline ETL processes, including orchestration automation, pipeline monitoring and visual troubleshooting.
  • CDP Operational Database (2) – an autonomous, multimodal, autoscaling database environment supporting both NoSQL and SQL. Under the covers, Operational Database leverages Apache HBase and allows end users to create databases without having to worry about infrastructure requirements.
  • Data Visualization (3) – an insight and visualization tool, pre-integrated with Data Warehouse and Machine Learning, that simplifies sharing analytics and information among data teams.
  • Replication Manager – makes it easy to copy or migrate unstructured (HDFS) or structured (Hive) data from on-premises clusters to CDP environments running in the public cloud.
  • Workload Manager – provides in-depth insights into workloads that can be used to troubleshoot failed jobs and optimize slow workloads.
  • Data Catalog – enables data stewards to organize and curate data assets globally, understand where relevant data is located, and audit how it is created, modified, secured and protected.

Each of the above is integrated with SDX, ensuring a consistent mechanism for authentication, authorization, governance and management of data, regardless of where you access your data from and how you consume it. 

Behind these new features are many resolved issues, tweaks and improvements, contributed by hundreds of people, that improve the performance, scalability, reliability, usability and security of CDP Public Cloud.

And We Are Not Done

And that was just the first 12 months. Our roadmap includes a number of exciting new features and enhancements to build on our vision of helping you:

  • Do Cloud Better: Deliver cloud-native analytics to the business in a secure, cost-efficient, and scalable manner.
  • Enable Cloud Everywhere: Accelerate adoption of cloud-native data services for public clouds.
  • Optimize the Data Lifecycle: Collect, enrich, report, serve, and model enterprise data for any business use case in any cloud.

Learn More, Keep in Touch

We invite you to learn more about CDP Public Cloud for yourself by watching a product demo or by taking the platform for a test drive (it’s free to get started).

Keep up with what’s new in CDP-PC by following our monthly release summaries.

(1) Currently available on AWS only

(2) Technical Preview on AWS and Azure

(3) Technical Preview on AWS and Azure

Deepak Narain
Director, Product Management