Are you perhaps talking about something like https://materialize.com/ ? (btw, dbt now has some Materialize compatibility)

Maybe Pravega and Beam working together? https://pravega.io/docs/v0.6.0/key-features/

Another option is something like Snowflake with tasks and streams. https://docs.snowflake.com/en/user-guide/tasks-intro.html
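
The gist of that pattern, as a rough sketch (table, task and warehouse names here are all made up): a stream tracks changes on a source table, and a task polls it on a schedule, firing only when there is new data.

    -- hypothetical names throughout
    CREATE OR REPLACE STREAM orders_stream ON TABLE raw.orders;

    CREATE OR REPLACE TASK refresh_orders_clean
      WAREHOUSE = transform_wh
      SCHEDULE  = '1 minute'
      WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
    AS
      INSERT INTO analytics.orders_clean
      SELECT order_id, customer_id, amount, updated_at
      FROM orders_stream
      WHERE METADATA$ACTION = 'INSERT';

    -- tasks are created suspended; start it explicitly
    ALTER TASK refresh_orders_clean RESUME;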

Or Snowflake with change streams, dbt, and a scheduler, in combination with lambda views. https://discourse.getdbt.com/t/how-to-create-near-real-time-...

>2. Run dbt in micro-batches

>Just don’t do it. Because dbt is primarily designed for batch-based data processing, you should not schedule your dbt jobs to run continuously. This can open the door to unforeseeable bugs.

Why not, though? You can implement incremental models and run them continuously. Sure, it's more work, but what bugs does this actually cause?
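
For reference, a minimal dbt incremental model looks something like this (model, source and column names are hypothetical):

    -- models/orders_incremental.sql
    {{ config(materialized='incremental', unique_key='order_id') }}

    SELECT order_id, customer_id, amount, updated_at
    FROM {{ source('raw', 'orders') }}
    {% if is_incremental() %}
    -- on incremental runs, only pull rows newer than what the target already has
    WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
    {% endif %}

Scheduling `dbt run --select orders_incremental` every few minutes then only processes the new slice each time.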


Totally agree. While not using dbt specifically, I've done this on tables with billions of rows and it works perfectly. It can even be combined with a lambda view, giving you the best of both worlds: the combination hides any latency from the incremental process, which can take time to run.

But I did end up questioning why I needed to continuously micro-batch when the lambda views were able to bridge the gap. It turned out that the lambda views were good enough that we could reduce the micro-batching back to every 24 hours, and even that was overly cautious; 48 hours or more might have been good enough.
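
For anyone unfamiliar, a lambda view is roughly this shape (names made up): the batch-built table unioned with the same transformation applied on the fly to anything newer than the batch cutoff.

    -- "orders_historical" is the batch-built table;
    -- "stg_orders" applies the same logic to raw data at query time
    CREATE OR REPLACE VIEW orders_lambda AS
    SELECT * FROM orders_historical
    UNION ALL
    SELECT * FROM stg_orders
    WHERE updated_at > (SELECT MAX(updated_at) FROM orders_historical);

Readers always see complete data: history from the batch table, plus anything newer computed on read.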

It turned out that the costly part of the micro-batching was really merging the delta data back into the prepared table (inserts and updates, not append-only). Selecting, combining and consolidating new and historic data is extremely fast.
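
In other words, the expensive step was something like this (hypothetical names), as opposed to a plain append:

    -- "orders_delta" holds the new and changed rows from the latest batch
    MERGE INTO analytics.orders_prepared t
    USING orders_delta s
      ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET
      amount = s.amount,
      updated_at = s.updated_at
    WHEN NOT MATCHED THEN INSERT (order_id, customer_id, amount, updated_at)
      VALUES (s.order_id, s.customer_id, s.amount, s.updated_at);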
