Hacker News

Is there really a 1-1 mapping between SQL and T?

What about use cases where the data lives in an object store? How does dbt deal with that?




We work with many larger businesses (Fortune 500) where the T per pipeline is, say, 60 steps over 1,200 columns at 10 TB scale, and it uses several things that aren't SQL: lookups against object stores, lookups against web services, RocksDB, and partitioning, which matters at that size. At scale, cost becomes critical; some are even moving to their own Spark on Kubernetes. ML is done on the data after ETL into the data lake.

None of them can use dbt for core ETL, but dbt might be good later for views and some dimensional modeling. They have done a good job there.

Think of it as the modern small-scale data stack.


Have you explored Cuelang for T?


I got inspired and started this over the weekend to demonstrate what is possible.

https://github.com/hofstadter-io/cuetils


If you can write a SQL query or a set of SQL queries to do your transformation, then you can use dbt. dbt doesn't do the transformation itself; rather, it helps you manage all the dependencies between your SQL models. Whether you can use SQL depends on your data and on your database/warehouse functionality. For example, JSON parsing support is pretty good now in many databases and warehouses. If your objects can be represented as JSON, you could write SQL via dbt to parse the objects into columns and tables.
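A minimal sketch of that last point, using Python's built-in SQLite (its `json_extract` standing in for the equivalent JSON functions in warehouses like Snowflake, BigQuery, or Postgres); the `raw_events` table and its fields are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table raw_events (payload text)")  # hypothetical raw table
conn.execute(
    "insert into raw_events values (?)",
    ('{"id": 1, "user": {"name": "ada"}, "amount": 9.5}',),
)

# The kind of SELECT a dbt model would contain: JSON object -> typed columns.
rows = conn.execute("""
    select
        json_extract(payload, '$.id')        as event_id,
        json_extract(payload, '$.user.name') as user_name,
        json_extract(payload, '$.amount')    as amount
    from raw_events
""").fetchall()
print(rows)  # [(1, 'ada', 9.5)]
```

In a real dbt project this SELECT would live in a model file, with the raw table referenced via `{{ source(...) }}` or `{{ ref(...) }}` rather than named directly.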


My understanding of dbt is that it builds a DAG based on the interdependencies between models. The interdependencies are parsed from 'ref' functions in the SQL files. The thing with dbt is that you transform the data within a single data warehouse.

So you would normally load all the data into the warehouse first; then the dependencies between SQL models are easier to map.
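A sketch of that DAG-building idea (the model names and SQL are hypothetical, and a simple regex stands in for dbt's actual Jinja parsing): extract `ref()` calls from each model, build a dependency graph, and topologically sort it to get a valid build order.

```python
import re
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical dbt-style models: each is a SQL string using {{ ref('...') }}.
models = {
    "stg_orders":   "select * from {{ ref('raw_orders') }}",
    "stg_payments": "select * from {{ ref('raw_payments') }}",
    "fct_revenue": """
        select o.id, p.amount
        from {{ ref('stg_orders') }} o
        join {{ ref('stg_payments') }} p on p.order_id = o.id
    """,
    "raw_orders":   "select 1",   # already-loaded sources, no refs
    "raw_payments": "select 1",
}

REF = re.compile(r"\{\{\s*ref\('([^']+)'\)\s*\}\}")

# Dependency graph: model -> set of models it refs (its predecessors).
graph = {name: set(REF.findall(sql)) for name, sql in models.items()}

# Topological order: every model comes after the models it depends on.
run_order = list(TopologicalSorter(graph).static_order())
print(run_order)
```

Here the raw tables sort first and `fct_revenue` last, which is exactly why loading everything into the warehouse up front makes the dependencies easy to map.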





