> Take real-time products, for example. Most businesses have little use for true real-time experiences. But, all else being equal, real-time data is better than latent data.
Only if all you do is stare at a chart on a dashboard. If you do serious analytics and collaborate with other people, you need the data to be stable, not shifting under you as you work with it.
Daily immutable snapshots produced by a daily batch job are not only cheaper to produce but also easier to work with, and that's why the industry overwhelmingly prefers batch.
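As a toy sketch of that pattern (paths and columns are invented, nothing production-grade): each batch run writes a new, dated partition once and never rewrites it, so anyone reading a given date gets the same numbers every time.

```python
# Toy sketch of the daily-snapshot pattern: one immutable, dated partition per
# batch run. The warehouse path and the columns are made up for illustration.
from datetime import date, timedelta
from pathlib import Path
import pandas as pd

run_date = date.today() - timedelta(days=1)            # process "yesterday"

# Pretend this came out of the source system; in reality it would be an extract.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount":   [19.99, 5.00, 42.50],
    "created":  [run_date.isoformat()] * 3,
})

# Write-once: the partition is keyed by the run date and never touched again.
out_dir = Path(f"warehouse/orders/dt={run_date.isoformat()}")
out_dir.mkdir(parents=True, exist_ok=True)
orders.to_parquet(out_dir / "part-0.parquet", index=False)

# Consumers read a fixed partition (e.g. dt=2024-05-01) and get a stable,
# reproducible answer no matter what later batch runs write.
```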
The only use case for streaming is 'standing queries', where the query result is continuously updated as new rows come in: e.g. real-time anomaly detection in access logs, ML inference, fraud detection, plotting numbers on a dashboard.
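As a toy sketch of what a standing query looks like (the log source, window size, and threshold are all made up): the result is maintained incrementally per incoming row rather than recomputed from a snapshot.

```python
# Toy 'standing query': a per-IP request count over a sliding window that is
# updated as each new log row arrives. Window and threshold are arbitrary.
from collections import deque, Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
THRESHOLD = 100                      # flag an IP above this many hits per window

window_events = deque()              # (timestamp, ip) pairs currently in the window
counts = Counter()                   # the continuously maintained query result

def on_log_row(ts: datetime, ip: str) -> None:
    """Called once per incoming row; updates the standing result incrementally."""
    window_events.append((ts, ip))
    counts[ip] += 1
    # Expire rows that fell out of the window instead of rescanning history.
    while window_events and ts - window_events[0][0] > WINDOW:
        _, old_ip = window_events.popleft()
        counts[old_ip] -= 1
    if counts[ip] > THRESHOLD:
        print(f"anomaly: {ip} made {counts[ip]} requests in the last 5 minutes")

# Each arriving row is pushed through the standing query as it happens.
on_log_row(datetime.now(), "10.0.0.1")
```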
All analysis is done offline in a batch setup; there is no 'stream analytics'. Stream processing only makes sense if the database is capable of hosting the serving layer of a model-inference application.
> If you do serious analytics and collaborate with other people, you need the data to be stable, not shifting under you as you work with it.
Apache Iceberg and Delta Lake take care of that. The fact that you're reading a particular snapshot of data does not prevent you from adding, updating or deleting data in the meantime.
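For example, with Delta Lake (Iceberg does the same thing via snapshot ids), a reader can stay pinned to one table version while writers keep appending; the path, data, and version numbers below are just illustrative.

```python
# Rough sketch of snapshot isolation / time travel with Delta Lake.
# Requires the delta-spark package; table path and data are invented.
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("snapshot-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

path = "/tmp/events_delta"

# Version 0: the snapshot an analyst starts working against.
spark.range(0, 100).write.format("delta").mode("overwrite").save(path)

# Meanwhile, writers keep changing the table (this append becomes version 1).
spark.range(100, 200).write.format("delta").mode("append").save(path)

# The analyst's reads stay pinned to version 0 and are unaffected by the append.
pinned = spark.read.format("delta").option("versionAsOf", 0).load(path)
print(pinned.count())   # still 100 rows, even though the live table now has 200
```

The live table already has the new rows, but anyone who pinned a version keeps getting reproducible results, which is the stability the parent comment is asking for.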
Also, your view is super narrow and focuses only on traditional analytics.
I think this is a super narrow view of the world that applies to a lot of things, but not everything. Any data use that involves reactive decision making will benefit from real-time streaming data. Financial services is replete with this, but so are monitoring applications, traffic analytics, advertising, etc. A lot of my career has been spent turning batch into real time, and the commercial case was there.
Even then, streaming detection only makes sense for specific forms of fraud. If your upstream data arrives in batches, there is little good reason to implement a streaming approach downstream.