Databricks Follows Cloudera by Adopting Iceberg, While Snowflake Mulls Open Source Approach

A constant flow of breaking news from the data lakehouse space is making notable tech headlines this week.

On Tuesday, Databricks announced that it will acquire Tabular, a data management company founded by the creators of Apache Iceberg: Ryan Blue, Daniel Weeks, and Jason Reid. The sum was not confirmed, but some reports put it between $1B and $2B (with Databricks allegedly outbidding Snowflake). The move aims to unify the two most popular open-source lakehouse formats, Apache Iceberg and Linux Foundation Delta Lake, to improve data compatibility across formats.

The prior day, Snowflake – still dealing with the aftermath of last week’s data breach – announced Polaris Catalog, a vendor-neutral, open catalog for Apache Iceberg. The company also announced at its annual user conference that Polaris Catalog will be open sourced in the next 90 days.

So, how do you make sense of all these announcements, and what do they mean for you?

Iceberg is the Champion in the Table Format War

Databricks putting this much value in Iceberg is proof that Delta Lake has lost the table format war, and Iceberg is the clear winner. Iceberg will become, and remain, the de facto standard for large-scale data and analytics deployments for the long run.

Cloudera was a first mover in adopting Iceberg as central and native to our data, analytics, and AI platform – reinforcing our credibility as the best vendor to work with when you want managed Iceberg data estates, at scale, across all clouds and on-premises. 

How Open is Your Open Source?

Despite its claims to be the open data lakehouse company, Databricks is NOT well known for being true to open source. Unlike Tabular, Databricks has built its commercial products as proprietary implementations of open source technology in a bid to retain customer lock-in, and it remains to be seen whether this move changes that approach.

Cloudera is a neutral party that manages Iceberg without vendor lock-in and at scale – in all clouds and on-premises. Cloudera also counts as customers many of the other large organizations that directly contribute to the project.  That’s truly open source.

Tabular Does Not Own Iceberg

Tabular was founded by the originators of the Iceberg project. The company has about 20% of the Iceberg contributors and committers on staff. The rest work at companies like AWS, Google, Dremio, Starburst, Adobe, Apple, Netflix, and more, which make up the bulk of the contributions. Iceberg has a healthy community, unlike Delta Lake, and many big tech companies are invested in keeping it open source and vendor independent.

This is a risky and costly acquisition by Databricks, particularly if the other 80% of the committers decide that the new affiliation of Tabular's committers weakens the mission to remain open source for all.

Welcome to the Party

Cloudera has been ahead of this game for years. Our 2022 open lakehouse position blog post was essentially the blueprint for the Databricks acquisition announcement.

Iceberg has been, and continues to be, central to Cloudera’s open data lakehouse architecture across hybrid clouds – not just something to be used on the side. Databricks failed to gain adoption for Delta Lake from communities and third-party vendors, and now must make this BIG and costly bet. At the same time, the timing of Snowflake’s Polaris Catalog shows that the company has been forced into this space as the market and customers have moved to Iceberg as the central table format for their data, two years after Cloudera did.

Both are not only late to the party, but will miss the fun, and the opportunity, as they play catch-up to those of us who have been here from the start.

Venkat Rajaji
SVP, Product Management