Databricks to Buy Data-Management Startup Tabular

jonnat · 2024-06-04T19:22:15 1717528935

So, will Databricks deprioritize Delta Lake in favor or Iceberg, or will they try to derail Iceberg development now they they got the team that originally built it at Netflix?

Edit: from the Tabular CEO announcement

  Databricks reached out to me and proposed a collaboration that could bring Iceberg and Delta closer together [...] I’m excited to have the opportunity to work with Databricks and the broader Delta community to build better table formats together.

So it seems they are going for the latter.

indoordin0saur · 2024-06-04T20:02:06 1717531326

Doesn't it sound like they'll try and move the two formats closer together so that there isn't such a format war? IDK how it would benefit Databricks to ruin either format if they're now such huge stakeholders in them both.

Either way, I just want to know which format to pick. I've been chief data engineer at my current company for about a year and would like to be able to move off of plain parquet files in my lake but I'm not sure what table format to choose.

waterlx · 2024-06-05T09:14:14 1717578854

Hi, in case you did not find the answer yet. In my hamble opinion: - choose Iceberg: If you have several computing/query engines other than Spark, like Presto, Flink. Iceberg has a great extraction and design for a engine-independent table format. But its learning cost is relative high - choose Delta: If you only have Spark and would like to be deeply binded with Databricks - choose Hudi: If you would like to use data lake out-of-the-box and it is quite easy to use. - If your data is updated frequently, like streaming, check https://paimon.apache.org/ if you would like to be deeply binded with Flink

indoordin0saur · 2024-06-05T13:35:39 1717594539

Thank you! Sounds like iceberg is the best then. I'm very allergic to lock-in. Currently we're very Spark heavy and our query engine is AWS Redshift Serverless. The recent AWS Glue Catalog support for Iceberg seems to make this promising.

ruipds · 2024-06-07T15:46:43 1717775203

I heard from a AWS worker that they consider Iceberg to be the future. A lot of their services will be glued together with it.

TheCleric · 2024-06-12T20:49:11 1718225351

At the Databricks Summit keynote this morning they pitched it as a way of trying to standardize and bridge across the two more easily, with neither going away.

datadrivenangel · 2024-06-04T19:38:26 1717529906

So Delta will eventually become better but not before Databricks gets a year or three of advantage over iceberg and open source delta.

artwr · 2024-06-04T20:55:12 1717534512

Sounds like a really good move by Databricks, in particular because a lot of the main platforms had implementations of catalogs to the Iceberg Spec, and several vendors, Snowflake included was starting to support Iceberg as an external Table format.

I have similar questions about the future of Delta Lake, but not really about the future of Iceberg, that's what the Apache Foundation is for after all. There are enough large enterprise players relying on this (Apple, Netflix, ...) to keep the project going for a while.

djoldman · 2024-06-04T16:33:35 1717518815

Tabular statement:

https://tabular.io/blog/tabular-is-joining-databricks/

Centigonal · 2024-06-04T19:12:07 1717528327

Here is Databricks's as well:

https://www.databricks.com/blog/databricks-tabular

Pils · 2024-06-04T17:55:53 1717523753

Seems bad for Snowflake? Iceberg is a big part of Snowflake's data lake offering, and I assumed it was a Snowflake-originated OSS project until this announcement (all Snowflake products have snow related names).

chimerasaurus · 2024-06-04T18:06:44 1717524404

Disclaimer - I am James on this[1] blog.

Yesterday we announced Polaris specifically so (1) customers don't get locked into a catalog; (2) people know Snowflake works with AWS, Azure, Confluent, etc.

1: https://www.snowflake.com/blog/introducing-polaris-catalog/

seahckr · 2024-06-04T19:35:01 1717529701

This [1] says Snowflake was also bidding to buy Tabular.

[1] https://www.cnbc.com/2024/06/04/databricks-is-buying-data-op...

zwaps · 2024-06-04T19:05:40 1717527940

Awfully many coming soons in that article

chimerasaurus · 2024-06-04T19:11:30 1717528290

No doubt. Ask my team how thrilled I am whenever we say "coming soon."

Narrator: It made him die inside.

jetru · 2024-06-04T18:23:34 1717525414

Iceberg is not Snowflake originated, it was built by folks at Netflix - the same folks who built Tabular.

But yes, this is definitely bad for Snowflake, Databricks can position itself as a very strong competitor with this move and moving more towards Iceberg.

indoordin0saur · 2024-06-04T19:59:22 1717531162

So this makes it sound like a positive thing for Iceberg as a format. Others seem to be suggesting that DataBricks will be working to undermine the format in favor of their older Delta Lake format but that seems overly cynical.

jetru · 2024-06-04T20:26:14 1717532774

Yea, in theory, the way its stated, it would be a plus for the industry and Iceberg consolidation.

How it will actually play out, who knows?

indoordin0saur · 2024-06-04T20:50:35 1717534235

My guess is Data Bricks saw the popularity of Iceberg and realized that they were starting to look a little irrelevant still trying to promote their competing Delta Lake format. Have an Iceberg lakehouse? Well then Databricks just didn't seem very relevant to you. With their purchase of Tabular they're given some legitimacy when they start marketing their products as being iceberg compatible. This doesn't signal to me that Iceberg is going to be harmed in the near or medium term.

jamesblonde · 2024-06-04T20:04:46 1717531486

I was at MySQL when Oracle bought lowly Innobase (3 people), but it was the open-source tech that MySQL built on. MySQL subsequently tried to build their own transaction engine (Falcon), but failed. This deal feels like Oracle = Databricks, Snowflake = MySQL

The big difference - innodb got like 3m, Tabular 1bn!

ec109685 · 2024-06-05T07:13:07 1717571587

9 months ago! https://tabular.io/blog/the-case-for-independent-storage/

enether · 2024-06-05T18:19:35 1717611575

money speaks louder than words

it was an amazing sales pitch though