> The more logic happens near the db, the more surface for errors and edge cases can be reduced and reliability issues avoided. The reduction in distances may also result in improved abstractions and in making the computations efficient in a fundamental thermodynamic way.
Fundamentally this sounds great, but isn't this rather painful with Postgres?
In my DB lectures at uni I was informally told to avoid managing logic from within Postgres, as the development and debugging experience for stored procedures is rather poor. I would add the painfully slow development cycle for third-party language implementations such as plv8. In addition, platform support is next to nonexistent, because trusted language extensions cannot load external code.
Yep. He's got half of the right idea - move the code to the data, process it where it is - but tied to the most awful way of storing the data. The right way is what today's stream-processing frameworks tend to do: the heavyweight global coordination piece is used only for the part that actually needs to be coordinated (temporal ordering, and only the partial ordering that's required), while the bulk data mostly lives in a LevelDB or similar embedded in the application, which can process it directly where it is.
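To make that split concrete, here is a minimal TypeScript sketch, assuming the `level` npm package as the embedded LevelDB-style store; the `nextSeq()` stub stands in for whatever external coordination service hands out ordering tokens, and both names are illustrative rather than part of any framework mentioned in the thread.

```ts
// Sketch: coordination is used only for ordering; bulk data lives in an
// embedded LevelDB-style store and is processed directly in the application.

import { Level } from 'level';

interface Event {
  seq: number;                    // ordering token from the coordinator
  payload: { amount: number };
}

// Bulk data lives next to the application code, not in a remote database.
const db = new Level<string, Event>('./events', { valueEncoding: 'json' });

// Hypothetical stand-in for the heavyweight coordination piece: the only
// thing it provides is an ordering token, not storage or compute.
let lastSeq = 0;
async function nextSeq(): Promise<number> {
  return ++lastSeq;
}

async function ingest(payload: { amount: number }): Promise<void> {
  const seq = await nextSeq();
  // Zero-padded key so lexicographic key order matches seq order.
  await db.put(String(seq).padStart(12, '0'), { seq, payload });
}

// Processing happens where the data is: a local scan, no network round trips.
async function totalAmount(): Promise<number> {
  let sum = 0;
  for await (const [, event] of db.iterator()) {
    sum += event.payload.amount;
  }
  return sum;
}

async function main() {
  await ingest({ amount: 3 });
  await ingest({ amount: 4 });
  console.log(await totalAmount()); // 7
  await db.close();
}

main().catch(console.error);
```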
The way I understand it, the closer the better for the reasons stated, but that's only one factor; there are others that might make you not want to keep certain logic too close to the data.
>The fact that persistent storage has to happen at the same place to be reliable is only incidental, not a defining attribute of a relational database.
Persistence happens to processes whose state is expensive to reconstruct. Measurements of the world require time travel to recover and so are quite expensive to reacquire! "Incidental" implies a weaker correlation than warranted.
As an aside, ~100% of data models are incapable of modelling contradictory or ambiguous measurements, and fail to adequately model alternative normalizations and their integration into whatever model of "truth" you pick. And these systems fall over entirely when any part of these tacit and underspecified constraints changes.
I definitely agree that models are difficult and contextual. I do think that minimizing distances and entropy reduces the difficulty of migrations and refactorings of the data model. Doing migrations across spread-out, distributed data is much more difficult.