Hacker News

We work with many larger businesses (Fortune 500) where the T per pipeline is, say, 60 steps across 1,200 columns at 10 TB scale, and it uses multiple things not expressible in SQL: lookups against object stores, lookups against web services, RocksDB, and partitioning matters. At that scale, cost becomes critical; some are even moving to their own Spark on Kubernetes. ML is done on the data after ETL into the data lake.
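To make concrete why such a T step falls outside SQL, here is a minimal Python sketch (all names hypothetical, not from any comment above) of a per-row enrichment that resolves a key through an external key-value lookup, the kind of side lookup against an object store, web service, or RocksDB described above. A plain dict stands in for the external store:

```python
# Sketch of a non-SQL transform step: enrich each row with a value
# fetched from an external key-value store. In a real pipeline this
# lookup would hit RocksDB, an object store, or a web service; here
# a plain dict stands in for it. All names are hypothetical.

def enrich_rows(rows, lookup):
    """Yield a copy of each row with a 'region' field resolved via the lookup."""
    for row in rows:
        enriched = dict(row)
        enriched["region"] = lookup.get(row["customer_id"], "unknown")
        yield enriched

# Stand-in for an external store keyed by customer_id.
region_store = {"c1": "EMEA", "c2": "APAC"}

rows = [{"customer_id": "c1", "amount": 10},
        {"customer_id": "c3", "amount": 5}]

result = list(enrich_rows(rows, region_store))
```

In a SQL-only tool this would require materializing the whole external store as a table first; at 10 TB scale, pipelines instead do these lookups inline, which is one reason core ETL stays in Spark-style code.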

None of them can use dbt for core ETL, but dbt might be good later for views and some dimensional modeling. It has done a good job there.

Think of it as the modern small-scale data stack.




Have you explored Cuelang for T?


I got inspired and started this over the weekend to demonstrate what is possible.

https://github.com/hofstadter-io/cuetils



