A Closer Look at The Next Phase of Cloudera’s Hybrid Data Lakehouse

Artificial Intelligence (AI) is primed to reshape the way just about every business operates. Cloudera research projected that more than one third (36%) of organizations in the U.S. are in the early stages of exploring the potential for AI implementation. But even with its rise, AI is still a struggle for some enterprises. AI, like any analytics, is only as good as the data on which it is based, and that’s where the rub is. Many organizations struggle to access and collect the often disparate and siloed data, spread across environments, that is required to power AI, and so are unable to achieve the business insight and value they had hoped for. Faced with unique challenges around distributed data infrastructures, governance, and an evolving security landscape, enterprises need the right support to fully tap into AI quickly.

To power our customers’ data, AI, and analytics needs, we are unveiling the next phase of our open data lakehouse, featuring several enhancements built to quickly scale enterprise AI and deliver unprecedented business value. Cloudera is now the only provider to offer an open data lakehouse with Apache Iceberg for cloud and on-premises. This marks a significant milestone for the platform: according to IDC, today about half of the world’s enterprise production data under management is on-prem. The latest release of the Cloudera platform delivers a one-of-a-kind set of capabilities to bring the same open data lakehouse functionality from the cloud into those data centers. The platform is ready to address the complexities of managing highly sensitive, yet critical, company data while still extracting the most value from its use. 

Let’s dive deeper into three of the most impactful features included in this update. 

Apache Iceberg

The addition of Apache Iceberg support for the Cloudera platform unlocks opportunities for enterprises to apply mission-critical data to AI and address some of the most error-prone processes, enabling them to generate new use cases, improve overall performance, and reduce costs. Iceberg delivers the open table format so that enterprises can put AI to work on their data all in an on-premises setting. This approach brings new compute engines into the fold, adding Spark, Flink, Impala, and NiFi, enabling concurrent access and processing of datasets within Iceberg.

With built-in features like time travel, schema evolution, and streamlined data discovery, Iceberg empowers data teams to enhance data lake management while upholding data integrity. Capabilities such as in-place schema evolution and ACID transactions on the data lakehouse are critical for organizations as they push to achieve regulatory compliance and adhere to policies like the General Data Protection Regulation (GDPR). The platform’s data security and governance layer, Shared Data Experience (SDX), is a fundamental part of the open data lakehouse, in the data center just as it is in the cloud.
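To make time travel and in-place schema evolution more concrete, here is a minimal conceptual sketch in plain Python. It is not the Iceberg API, and `ToyTable`, `Snapshot`, and every method name are hypothetical. It only illustrates the general idea of a snapshot-based table: each change commits a new immutable snapshot, adding a column is a metadata change that leaves stored rows untouched, and any earlier snapshot can still be read.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """One committed state of the table: a schema plus the rows visible to it."""
    snapshot_id: int
    schema: tuple
    rows: tuple


@dataclass
class ToyTable:
    """Hypothetical stand-in for a snapshot-based table (not the Iceberg API)."""
    snapshots: list = field(default_factory=list)

    def _commit(self, schema, rows):
        # Every change appends a new snapshot; earlier snapshots are never modified.
        snap = Snapshot(len(self.snapshots) + 1, tuple(schema), tuple(rows))
        self.snapshots.append(snap)
        return snap.snapshot_id

    def create(self, schema):
        return self._commit(schema, ())

    def append(self, row):
        cur = self.snapshots[-1]
        return self._commit(cur.schema, cur.rows + (dict(row),))

    def add_column(self, name):
        cur = self.snapshots[-1]
        # Schema evolution as a metadata-only change: stored rows are not rewritten.
        return self._commit(cur.schema + (name,), cur.rows)

    def read(self, snapshot_id=None):
        # Time travel: read the latest snapshot, or any earlier one by its id.
        snap = self.snapshots[-1] if snapshot_id is None else self.snapshots[snapshot_id - 1]
        # Project each row onto the snapshot's schema; missing columns read as None.
        return [{col: row.get(col) for col in snap.schema} for row in snap.rows]


table = ToyTable()
table.create(["id", "email"])
first = table.append({"id": 1, "email": "a@example.com"})
table.add_column("country")
table.append({"id": 2, "email": "b@example.com", "country": "NL"})

latest = table.read()            # both rows, with the new "country" column
as_of_first = table.read(first)  # the table as it looked after the first append
```

In this toy model, `as_of_first` still shows a single row with only `id` and `email`, while `latest` shows both rows with `country` filled in where it was provided. Being able to reproduce an earlier state of the data in this way is the kind of property that helps with audits.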

Apache Ozone

As AI and other advanced analytics continue to grow in scale, performant and scalable data storage will need to expand right along with them. Specifically for the data center, Apache Ozone delivers greater scalability, at a lower cost, helping organizations drive greater business value. With the Cloudera platform’s latest update, new features give customers the tools they need to incorporate greater security and strengthen enterprise readiness. The latest generation of our platform includes Ozone features like improved replication, improved quotas for volumes and buckets to facilitate cloud-native architectures, and snapshots, which are also now able to support data storage at the bucket and volume levels.

Zero Downtime Upgrades

Beyond improvements to Iceberg and Ozone, the platform now boasts Zero Downtime Upgrades (ZDU). ZDU gives organizations a more convenient way to upgrade. Rolling upgrades are now supported for HDFS, Hive, HBase, Kudu, Kafka, Ranger, YARN, and Ranger KMS. ZDU keeps workflow disruptions to a minimum and ultimately reduces or even eliminates lengthy and costly downtime.
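The general idea behind a rolling upgrade can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not how Cloudera Manager implements ZDU: the `rolling_upgrade` function, the node names, and the version labels are all hypothetical. It shows why restarting one replica at a time keeps the rest of a service available during the upgrade.

```python
def rolling_upgrade(replicas, new_version):
    """Upgrade replicas one at a time and return the fewest that were ever serving.

    `replicas` maps a (hypothetical) node name to its current version label.
    """
    min_serving = len(replicas)
    for name in list(replicas):
        # Exactly one replica is out of service while it restarts.
        serving = len(replicas) - 1
        min_serving = min(min_serving, serving)
        replicas[name] = new_version
    return min_serving


cluster = {"node-1": "old", "node-2": "old", "node-3": "old"}
min_serving = rolling_upgrade(cluster, "new")
```

In this sketch, two of the three replicas stay in service at every step, whereas stopping all of them at once would leave none.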

With ZDU, customers get a powerful boost to productivity from capabilities like one-stage upgrades and auto upgrades of large clusters. And for the platform components that are still expected to experience downtime, this update ensures they are optimized through Cloudera Manager and able to restart quickly. This marks a key improvement over previous iterations, where some of the services, like Queue Manager, were often the first pieces to go down and some of the last ones to restart. Those services are now able to get back up and running in a matter of minutes, right at the start of the ZDU.

AI is quickly cementing itself as a key part of generating maximum business value out of enterprise data. Getting to that value, though, means running data and analytics in the environment to which they are best suited. That’s what makes a hybrid approach so crucial, and it’s also what makes Cloudera so unique. The Cloudera platform offers portable, cloud-native analytics that can be deployed across infrastructures, all while maintaining consistent data governance and security. It is available for the cloud and now also for the data center.

Learn more about the next generation of Cloudera Data Platform for Private Cloud. 

Wim Stoop
Director Product Marketing