Coming to Postgres, UNDO logs and no vacuum https://github.com/orioledb/

avinassh · on April 28, 2023

what is an UNDO log and how does it solve the problem?

fdr · on April 28, 2023

In most cases[1], when you update a tuple in Postgres, a new tuple is put somewhere else in the same heap, with different visibility information, "xmin", "xmax". The old tuple remains where it is. Index pointers to it likewise remain unchanged, but a new entry is added for the new tuple. The old version gains an updated "xmax" field indicating that version was deleted at a certain logical point.

Later on, VACUUM has to plow through everything and check the oldest running transaction to see whether the tuple can be "frozen" (old enough to be seen by every transaction, and not yet deleted) or the space reclaimed as usable (deleted and visible to nothing). Index tuples likewise must be pruned at this time.

In systems with an UNDO log, the tuple is mutated in place and the contents of the old version placed into a sequential structure. In the case where the transaction commits, and no existing concurrent repeatable read level transactions exist, the old version in the sequential structure can be freed, rather than forcing the system to fish around doing garbage collection at some later time to obsolete the data. This could be considered "optimized for commit" instead of the memorable "optimized for rollback."

On the read side, however, you need special code to fish around in UNDO (since the copy in the heap is uncommitted data at least momentarily) and ROLLBACK needs to apply the UNDO material back to the heap. Postgres gets to avoid all that, at the cost of VACUUM.

[1] The exception is "HOT" (heap only tuple) chains, which if you squint look a tiny bit UNDO-y. https://www.cybertec-postgresql.com/en/hot-updates-in-postgr...