Hacker News

We work with many larger businesses (Fortune 500) where the T per pipeline is, say, 60 steps across 1,200 columns at 10 TB scale, and it uses multiple things not expressible in SQL: lookups against object stores, lookups against web services, RocksDB, and partitioning matters. At that scale, cost becomes critical; some are even moving to their own Spark on Kubernetes. ML is done on the data after ETL into the data lake.
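To make concrete why such a T step falls outside SQL, here is a minimal Python sketch (all names hypothetical, not from any comment above) of a per-row enrichment that resolves a key through an external key-value lookup, the kind of side lookup against an object store, web service, or RocksDB described above. A plain dict stands in for the external store:

```python
# Sketch of a non-SQL transform step: enrich each row with a value
# fetched from an external key-value store. In a real pipeline this
# lookup would hit RocksDB, an object store, or a web service; here
# a plain dict stands in for it. All names are hypothetical.

def enrich_rows(rows, lookup):
    """Yield a copy of each row with a 'region' field resolved via the lookup."""
    for row in rows:
        enriched = dict(row)
        enriched["region"] = lookup.get(row["customer_id"], "unknown")
        yield enriched

# Stand-in for an external store keyed by customer_id.
region_store = {"c1": "EMEA", "c2": "APAC"}

rows = [{"customer_id": "c1", "amount": 10},
        {"customer_id": "c3", "amount": 5}]

result = list(enrich_rows(rows, region_store))
```

In a SQL-only tool this would require materializing the whole external store as a table first; at 10 TB scale, pipelines instead do these lookups inline, which is one reason core ETL stays in Spark-style code.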

None of them can use dbt for core ETL, but dbt might be good later for views and some dimensional modeling. It has done a good job there.

Think of it as the modern small-scale data stack.




Have you explored Cuelang for T?


I got inspired and started this over the weekend to demonstrate what is possible.

https://github.com/hofstadter-io/cuetils



