In the age of the AI revolution, where chatbots, generative AI, and large language models (LLMs) are taking the business world by storm, enterprises are fast realizing the need for strong data control and privacy to protect their confidential and commercially sensitive data, while still providing access to this data for context-specific AI insights. Many organizations are looking to the inherent privacy that on-premises solutions provide, to leverage the power of LLMs within the walls of their own data center. When it comes to on-premises data platforms, Cloudera continues to be the vendor of choice.
Our latest release (CDP Private Cloud Base 7.1.9) is the foundation of Cloudera’s open data lakehouse platform, on premises. It combines comprehensive analytics with powerful data management, enabling organizations to provide trusted enterprise data at scale for fast, actionable insights and trusted AI. Its true strength lies in managing your enterprise data and workloads with the inherent privacy and security of the protective (and sometimes completely air-gapped) walls of your own data center, as well as cost-efficient operation for the selected workloads. The secret sauce of Cloudera’s open data lakehouse is the fastest-growing table format, Apache Iceberg, which delivers flexibility and agility so data practitioners can use the tools or engines of their choice to run multifunction analytics on the same data. It also guarantees trusted, reliable data for fast decision making and trusted AI.
What’s in this release?
We’re extremely proud of the 110+ features and innovations delivered in this release, designed to revolutionize your on-prem data experience. Paul Codding, executive vice president of product management at Cloudera, summarizes the value of this release in the video above. You can learn more about the full feature list in the release summary. In this release, we deliver new features and innovation across four major categories:
- The release delivers a fully featured open data lakehouse, powered by Apache Iceberg in the private cloud. This represents the realization of our “Iceberg everywhere” vision. Now you have the flexibility to deploy your open data lakehouse wherever your data resides—be it on any public cloud, private cloud, or on-premises infrastructure, all within a true hybrid experience. This integration of Apache Iceberg brings robust data warehouse capabilities to your data lake, including support for ACID transactions—enabling concurrent data access by multiple teams, all utilizing a variety of computing options. The result? The elimination of data silos, simplified ETL pipelines, and a substantial reduction in storage costs, all thanks to a single data copy that caters to multiple use cases. Cloudera’s open data lakehouse adds an array of powerful new features, such as the ability to make schema changes on the fly, historical data management and rollbacks, and a proven track record of high-performance analytics on large-scale data. By adopting Iceberg, an engine-agnostic table format, you’ll experience a significant reduction in data management complexity and a remarkable boost to your analyst and data scientist productivity. It’s time to make your data work for you and pave the way for rapid initiation of new data science and analytics projects.
- According to IDC*, today over half of the world’s enterprise production data is on premises. This highlights that organizations still rely heavily on traditional storage methods despite the rise of cloud computing. To modernize on-prem storage for hybrid storage paradigms, we continue to enhance high-performance, high-density, modern object storage on prem, powered by Apache Ozone, for vastly greater scalability at lower cost to service the voracious data consumption needs of modern data workloads. This release supports improved high availability, snapshots, user quotas, and wider integrations.
- Upgrading to the next version of your data platform is one of life’s greatest joys…said no one ever. This is why this release is our next long-term support (LTS) release, and will free you of the need to perform any major upgrades for years to come. Learn more about our LTS release mantra here. As an LTS release, it is designed with stability in mind and is cumulatively built with the innovations of all previous releases, meaning you can safely continue your existing workloads, as well as park them here for the long haul.
- Whether you’re upgrading from a recent version or migrating from an older platform, getting to this release is easier than any previous release. We’ve dedicated our efforts to provide you with a suite of automation tools and services for a simpler upgrade experience. Our unwavering commitment to easier upgrades and high availability shines even brighter once you’re on this version with the introduction of our Zero Downtime Upgrade (ZDU) methodology for future releases. We’ll cover more on ZDU in an upcoming blog.
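To make the schema-change and rollback capabilities from the lakehouse bullet above a little more concrete, here is a minimal toy sketch in plain Python. It is not Apache Iceberg's API and not how CDP implements anything: `ToyTable` and all of its methods are hypothetical names invented for illustration. It only models the general idea of a snapshot-based table, where every change commits a new immutable snapshot, so older versions stay readable and a rollback is just a pointer move. Concurrency and ACID guarantees are deliberately left out.

```python
class ToyTable:
    """Hypothetical in-memory model of a snapshot-based table (not Iceberg's API)."""

    def __init__(self, columns):
        # The history is a list of immutable (columns, rows) snapshots.
        self._snapshots = [(tuple(columns), ())]
        self._current = 0  # index of the snapshot that reads see by default

    def _commit(self, columns, rows):
        # Every change appends a new snapshot; earlier ones are never modified.
        self._snapshots.append((tuple(columns), tuple(rows)))
        self._current = len(self._snapshots) - 1
        return self._current

    def append(self, new_rows):
        columns, rows = self._snapshots[self._current]
        return self._commit(columns, rows + tuple(dict(r) for r in new_rows))

    def add_column(self, name):
        # Metadata-only schema change: existing rows are not rewritten.
        columns, rows = self._snapshots[self._current]
        return self._commit(columns + (name,), rows)

    def read(self, snapshot_id=None):
        # Read the current snapshot, or an older one by id ("time travel").
        idx = self._current if snapshot_id is None else snapshot_id
        columns, rows = self._snapshots[idx]
        # Rows written before a column existed surface that column as None.
        return [{c: r.get(c) for c in columns} for r in rows]

    def rollback_to(self, snapshot_id):
        if not 0 <= snapshot_id < len(self._snapshots):
            raise ValueError(f"unknown snapshot id: {snapshot_id}")
        self._current = snapshot_id


table = ToyTable(["id", "amount"])
s1 = table.append([{"id": 1, "amount": 10.0}])
s2 = table.add_column("currency")  # schema change on the fly
s3 = table.append([{"id": 2, "amount": 5.0, "currency": "EUR"}])

latest = table.read()        # both rows, three columns
as_of_s1 = table.read(s1)    # historical read: one row, two columns

table.rollback_to(s1)        # undo the schema change and the second append
after_rollback = table.read()
```

In this sketch, `as_of_s1` and `after_rollback` hold the same single row, while `latest` shows the older row with `currency` set to `None`. A real table format tracks data files and metadata on storage rather than Python objects, but the "new snapshot per commit" idea is the part this toy is meant to show.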
We’re always humbled to see the cutting-edge use cases and innovative business solutions that our customers continue to build on CDP. With this release, you can accelerate the development of your data workloads to solve your hairiest challenges.
If you’re considering building innovative AI applications, but are concerned with how SaaS LLMs use your commercially sensitive data to fine-tune for enterprise context, consider using open source LLMs such as Llama 2, Falcon, or Platypus 2 to keep your data securely on prem and retain ownership of your model. Or if you’re concerned about running your LLMs and inference workloads in the public cloud due to high costs, you can take comfort that CDP enables you to fully leverage the inherent privacy and security of your data center to integrate these open-source models with your on-premises data ecosystem at predictable costs. Here are some powerful generative AI use cases that our customers are running on premises on CDP today:
- Document summarizers: Use your rich enterprise data to build context-specific AI applications that can summarize documents automatically, speeding up manual workflows.
- Customer sentiment analysis: Analyze customer feedback to gain insights into their opinions and preferences automatically.
- Predictive maintenance for complex machinery: Use AI to predict when machinery is likely to fail, so that you can perform maintenance proactively and avoid costly downtime.
- Code completion optimizers: Use AI to optimize code completion, making it faster and more accurate.
- Fraud detection and prevention: Leverage the power of the open data lakehouse to monitor transactions in real time and not just detect but prevent fraud.
With a growing set of customer use cases spanning the entire data lifecycle, the possibilities are truly endless. We’re excited to see the innovative new use cases that our customers—you—will build on Cloudera for private cloud and the value these will unlock for your organization.
What’s next?
If you would like to learn more about the release and what it contains, have a look at the release summary. If you are raring to go and start your upgrade right now, you’ll find all the details for just that here.
Lastly, here are some additional resources you may find useful:
- Learn more about Cloudera’s Open Data Lakehouse
- Learn more about Enterprise AI with Cloudera
- Hear more about this release at Cloudera Now
*Source: IDC Cloud Data Management Survey, 2021 and IDC Global DataSphere 2023