In the case of this analyst, say every couple of days he is writing SELECT * FROM events JOIN projects. Everyone is asking him for information about events and projects.
He's going to save time and make fewer mistakes if he uses DBT and has a denormalised table that merges events and projects. However you choose to define costs and benefits, it'll turn out to be a good choice, as he discovered in the article. Good move. People keep asking him to do it, so there is probably value there. He is avoiding a cost: writing the same join over and over again.
Normalisation is there to make data accessible to multiple different users (analysts, application programmers, infrastructure teams, people who want to leverage the database for a new purpose, etc). It isn't good at servicing any specific need, but it is a basic, general data layout that lets people tailor the data to their needs quickly. When there is a specific user, that user should always be asking whether the normalised data layout is helpful and looking for opportunities to avoid writing the same JOIN in 20 different queries. As long as the source of truth is in normal form, it is reasonable practice to denormalise for specific use cases.
DBT does this really well, I might add - it encourages normalising the source of truth and then letting an analytics team denormalise the data to meet business needs. The ideas there are strong and flexible, and they encourage good practices. Analysts love big flat tables; they are easy to work with.
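As a sketch of what that looks like in DBT (a hypothetical mart model; the `project_id` join key and column names are assumptions for illustration), the denormalised table is just a SQL file the analytics team owns:

```
-- models/marts/events_with_projects.sql
-- A denormalised "big flat table" for analysts, built from the
-- normalised source of truth. Join key and columns are assumed.
select
    e.*,
    p.name  as project_name,
    p.owner as project_owner
from {{ ref('events') }} as e
join {{ ref('projects') }} as p
    on e.project_id = p.project_id
```

DBT then materialises this as a table or view, so the join logic lives in exactly one place instead of being copy-pasted across twenty queries.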
> Normalisation is there to make data accessible for multiple different users
Normalization is there to avoid anomalies, which is another word for data corruption. If you have the same datum repeated multiple times in the base tables, any update can (and probably will, due to Murphy's law) lead to inconsistencies, which means the database is no longer a reliable source of information. How do you quantify the cost of that?
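For a concrete (hypothetical) illustration of an update anomaly, suppose the project name were copied onto every event row instead of living only in `projects`:

```
-- Denormalised base table: project_name is repeated on every event row.
CREATE TABLE events_flat (
    event_id     int PRIMARY KEY,
    project_id   int,
    project_name text  -- redundant copy of projects.name
);

-- Renaming a project must now touch many rows. A partial update like
-- this one leaves two conflicting "truths" for project 42:
UPDATE events_flat
SET project_name = 'Apollo v2'
WHERE project_id = 42
  AND event_id < 1000;
```

In a normalised schema the name is stored once, so the rename is a single-row update that cannot half-succeed across copies.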
> looking for opportunities to avoid writing the same JOIN in 20 different queries.
Then you define a view, which is literally just a named and reusable query that can be used in other queries. Writing queries or using views is certainly not "denormalization". Having redundant data in a query or view output is commonplace and not a problem, since it cannot introduce update anomalies. (Some databases allow updateable views in some scenarios, but only when the view can be unambiguously mapped back to base tables, so there is no risk of update anomalies here either.)
Use a view that acts as a quick-and-useful abstraction to mimic a denormalized table?
E.g.
```
CREATE VIEW vw_events_and_projects AS
SELECT *
FROM events
JOIN projects USING (project_id);  -- join key assumed; USING also keeps a
                                   -- single project_id column in the view
```
Then
```
SELECT * FROM vw_events_and_projects
```
Edit:
And if you need OLAP, replicate the normalized table to a database that handles analytics workflows better (e.g. ClickHouse).
Then you get the normalized forms for your OLTP workflows (your "bread and butter"); and you also get the efficiency and ergonomics of real-deal OLAP.
Of course, your biggest issue is going to be keeping the two in sync. The obvious approach is to have your OLTP database stream changes to the replica whenever data is modified.
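One concrete way to do that, assuming Postgres on the OLTP side, is logical replication; ClickHouse's (experimental) MaterializedPostgreSQL database engine can consume the change stream directly. Host, database, and credentials below are placeholders:

```
-- Assumes the Postgres source has wal_level = logical so changes can be
-- streamed. MaterializedPostgreSQL (experimental) then mirrors the listed
-- tables into ClickHouse and keeps them updated as the source changes.
CREATE DATABASE pg_mirror
ENGINE = MaterializedPostgreSQL('oltp-host:5432', 'appdb', 'replica_user', 'secret')
SETTINGS materialized_postgresql_tables_list = 'events,projects';
```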
Plus there are many cases where you want to see the data as of a given time, for instance all the positions of a fund on a given reporting date. Then it makes sense to denormalise, because that data should never be updated in the future.
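A minimal sketch of that pattern (table and column names are assumptions): a write-once snapshot table keyed by an as-of date, where the redundancy is safe precisely because rows are never updated:

```
-- Immutable point-in-time snapshot: one row per position per as-of date.
-- Rows are inserted once and never updated, so the denormalised copies
-- of fund and instrument attributes cannot drift out of sync.
CREATE TABLE fund_positions_snapshot (
    as_of_date   date    NOT NULL,
    fund_name    text    NOT NULL,  -- denormalised from a funds table
    instrument   text    NOT NULL,  -- denormalised from an instruments table
    quantity     numeric NOT NULL,
    market_value numeric NOT NULL,
    PRIMARY KEY (as_of_date, fund_name, instrument)
);

-- "All the positions of the fund at a given time" is then a plain lookup:
SELECT * FROM fund_positions_snapshot WHERE as_of_date = DATE '2024-06-30';
```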