So, will Databricks deprioritize Delta Lake in favor or Iceberg, or will they tr...

indoordin0saur · 2024-06-04T20:02:06 1717531326

Doesn't it sound like they'll try and move the two formats closer together so that there isn't such a format war? IDK how it would benefit Databricks to ruin either format if they're now such huge stakeholders in them both.

Either way, I just want to know which format to pick. I've been chief data engineer at my current company for about a year and would like to be able to move off of plain parquet files in my lake but I'm not sure what table format to choose.

waterlx · 2024-06-05T09:14:14 1717578854

Hi, in case you did not find the answer yet. In my hamble opinion: - choose Iceberg: If you have several computing/query engines other than Spark, like Presto, Flink. Iceberg has a great extraction and design for a engine-independent table format. But its learning cost is relative high - choose Delta: If you only have Spark and would like to be deeply binded with Databricks - choose Hudi: If you would like to use data lake out-of-the-box and it is quite easy to use. - If your data is updated frequently, like streaming, check https://paimon.apache.org/ if you would like to be deeply binded with Flink

indoordin0saur · 2024-06-05T13:35:39 1717594539

Thank you! Sounds like iceberg is the best then. I'm very allergic to lock-in. Currently we're very Spark heavy and our query engine is AWS Redshift Serverless. The recent AWS Glue Catalog support for Iceberg seems to make this promising.

ruipds · 2024-06-07T15:46:43 1717775203

I heard from a AWS worker that they consider Iceberg to be the future. A lot of their services will be glued together with it.

TheCleric · 2024-06-12T20:49:11 1718225351

At the Databricks Summit keynote this morning they pitched it as a way of trying to standardize and bridge across the two more easily, with neither going away.

datadrivenangel · 2024-06-04T19:38:26 1717529906

So Delta will eventually become better but not before Databricks gets a year or three of advantage over iceberg and open source delta.