In such a system, how does a reader find the root node? I'd be concerned about a...

ww520 · on Dec 29, 2023

The pointer to the root tree node is stored in the last committed metadata page. A read transaction starts by reading that metadata page. A reader cannot reach partially written pages, because the writer has not yet committed the metadata page that points to them. The transaction is committed when the new metadata page is written, as the last step.
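A minimal in-memory sketch of that ordering. The names (`AppendOnlyStore`, `append_nodes`, `commit`, `begin_read`) are hypothetical, and treating the last appended page as the new root is an assumption for brevity; a real engine would also flush the node pages to disk before writing the metadata page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class MetaPage:
    txn_id: int
    root: int  # page number of the root node as of this transaction


@dataclass
class AppendOnlyStore:
    pages: list = field(default_factory=list)  # page log; only ever appended to
    meta: Optional[MetaPage] = None            # last *committed* metadata page

    def append_nodes(self, nodes):
        """Step 1: append new tree pages. No committed metadata points at them yet."""
        self.pages.extend(nodes)
        return len(self.pages) - 1  # hypothetical convention: the new root is written last

    def commit(self, new_root):
        """Step 2, the very last step: publish a metadata page pointing at the new root."""
        txn = self.meta.txn_id + 1 if self.meta else 1
        self.meta = MetaPage(txn, new_root)

    def begin_read(self):
        """A reader starts from the last committed metadata page and nothing else."""
        meta = self.meta
        return None if meta is None else (meta.txn_id, self.pages[meta.root])


store = AppendOnlyStore()
store.commit(store.append_nodes(["leaf-a", "root-v1"]))
print(store.begin_read())  # (1, 'root-v1')

# A writer appends pages but "crashes" before writing the metadata page:
store.append_nodes(["leaf-b", "root-v2"])
print(store.begin_read())  # still (1, 'root-v1')
```

The half-finished write is simply unreachable: it sits in the log, but no committed metadata page refers to it.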

hyc_symas · on Jan 5, 2024

A read txn starts by reading the last page of the file and searching backward for a valid metapage. This can be time-consuming if the previous session crashed while a large write was in progress. Yet another reason LMDB doesn't use an append-only design.
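A rough sketch of that backward scan, assuming a hypothetical page layout: 64-byte pages, a `META` magic marker, txn id, root page number, and a CRC32 over those fields. This is not LMDB's on-disk format; it only shows why a crash after a large append forces the reader to walk back over every non-meta page.

```python
import struct
import zlib

PAGE_SIZE = 64                         # hypothetical page size, tiny for illustration
MAGIC = b"META"                        # hypothetical metapage marker
META_FMT = "<4sQQ"                     # magic, txn_id, root page number
META_LEN = struct.calcsize(META_FMT)   # 20 bytes, followed by a 4-byte CRC32


def make_meta_page(txn_id, root):
    body = struct.pack(META_FMT, MAGIC, txn_id, root)
    return (body + struct.pack("<I", zlib.crc32(body))).ljust(PAGE_SIZE, b"\0")


def parse_meta_page(page):
    """Return (txn_id, root) if the page is an intact metapage, else None."""
    if len(page) < META_LEN + 4:
        return None
    body = page[:META_LEN]
    (crc,) = struct.unpack_from("<I", page, META_LEN)
    if not body.startswith(MAGIC) or zlib.crc32(body) != crc:
        return None  # not a metapage, or a torn/corrupt one
    _, txn_id, root = struct.unpack(META_FMT, body)
    return txn_id, root


def find_last_valid_meta(data):
    """Walk backward from the last full page until an intact metapage turns up."""
    for i in range(len(data) // PAGE_SIZE - 1, -1, -1):  # a trailing partial page is skipped
        meta = parse_meta_page(data[i * PAGE_SIZE:(i + 1) * PAGE_SIZE])
        if meta is not None:
            return i, meta
    return None


node = b"node".ljust(PAGE_SIZE, b"\0")
# Txn 1 committed; then a writer appended three node pages plus a partial page and crashed.
data = node + make_meta_page(1, 0) + node * 3 + b"partial"
print(find_last_valid_meta(data))  # (1, (1, 0))
```

The cost of the scan grows with the number of pages the crashed writer managed to append, which is the concern raised above.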

twoodfin · on Dec 29, 2023

There’s a “canonical” pointer to the root node which is updated atomically on every append.

For an in-memory database, a CAS (compare-and-swap) is adequate. Persistent stores ultimately need some kind of file-level locking.
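For the in-memory case, a sketch of an optimistic writer: build a new immutable root from a snapshot, publish it with CAS, and retry if another writer got there first. Python exposes no user-level atomic CAS, so the hypothetical `AtomicRef` class emulates one with a lock; `append_commit` and `worker` are likewise illustrative names.

```python
import threading


class AtomicRef:
    """A compare-and-swap cell. Python has no atomic CAS primitive, so a lock emulates one."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value is expected:  # identity check, like comparing raw pointers
                self._value = new
                return True
            return False


def append_commit(root_ref, make_new_root):
    """Build a new root from a snapshot; publish it only if the root hasn't moved."""
    while True:
        old = root_ref.load()
        new = make_new_root(old)
        if root_ref.compare_and_swap(old, new):
            return new
        # Lost the race: another writer swapped the root first, so rebuild and retry.


def worker(root_ref, tag, n):
    for i in range(n):
        append_commit(root_ref, lambda old, i=i: old + ((tag, i),))


root = AtomicRef(())  # an immutable tuple stands in for the tree root here
threads = [threading.Thread(target=worker, args=(root, t, 250)) for t in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(len(root.load()))  # 1000
```

No append is lost even though four writers race, because each one only wins the swap if the root is still the one it built on.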

If you look at the Apache Iceberg spec, you get a good idea of how this works: The only “mutability” in that universe is the root table pointer in the catalog.
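A sketch of the same idea for a persistent store: immutable metadata files plus one mutable pointer file, swapped under an exclusive `fcntl.flock` lock with an atomic `os.replace`. The file names (`current.ptr`, `commit.lock`) and functions are hypothetical, it is Unix-only, and it is not how any particular Iceberg catalog is implemented; it only mimics the "check that the current metadata is still what I started from, then swap" commit.

```python
import fcntl
import os
import tempfile


class CommitConflict(Exception):
    pass


def read_pointer(table_dir):
    try:
        with open(os.path.join(table_dir, "current.ptr")) as f:
            return f.read().strip() or None
    except FileNotFoundError:
        return None


def commit(table_dir, expected, new_name, new_bytes):
    # 1. Write the new immutable metadata file. Nothing points at it yet.
    with open(os.path.join(table_dir, new_name), "xb") as f:
        f.write(new_bytes)
        f.flush()
        os.fsync(f.fileno())
    # 2. Under an exclusive file lock, compare the current pointer and swap it.
    with open(os.path.join(table_dir, "commit.lock"), "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            current = read_pointer(table_dir)
            if current != expected:
                raise CommitConflict(f"expected {expected!r}, found {current!r}")
            fd, tmp = tempfile.mkstemp(dir=table_dir)
            with os.fdopen(fd, "w") as f:
                f.write(new_name)
                f.flush()
                os.fsync(f.fileno())
            # Atomic rename; full durability would also fsync the directory (omitted).
            os.replace(tmp, os.path.join(table_dir, "current.ptr"))
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)


with tempfile.TemporaryDirectory() as d:
    commit(d, None, "v1.json", b'{"snapshot": 1}')
    commit(d, "v1.json", "v2.json", b'{"snapshot": 2}')
    try:
        commit(d, "v1.json", "v3.json", b'{"snapshot": 3}')  # stale writer
        stale_ok = True
    except CommitConflict:
        stale_ok = False
    final = read_pointer(d)
print(stale_ok, final)  # False v2.json
```

The losing writer leaves `v3.json` behind as an unreferenced file; this sketch leaves that for separate cleanup.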