Hacker News

One thing I've never understood is why all the indexes have transaction last. One of the selling points of Datomic is that it supports as-of queries, but using the EAVT or AEVT indexes requires it to scan all historic values of that attribute, right?

In most situations this is probably fine, but if you have data that changes frequently it seems like this could slow queries down compared to an EATV or AETV index.
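To make the concern concrete, here's a toy Python sketch of an as-of lookup against an EAVT-style sorted index. The datoms and attribute names are invented for illustration; this models only the sort order, not Datomic's actual storage:

```python
from bisect import bisect_left

# Hypothetical datoms: (entity, attribute, value, tx) tuples kept in
# EAVT sort order. Note that tx sorts last within each (e, a) group.
eavt = sorted([
    (1, ":user/score", 10, 1001),
    (1, ":user/score", 20, 1002),
    (1, ":user/score", 30, 1003),
    (1, ":user/name", "ada", 1001),
])

def value_as_of(e, a, as_of_tx):
    """Return the value of (e, a) as of a transaction.

    Because tx is the last sort component, the versions of (e, a) are
    ordered by value, not time -- we can't seek straight to the as-of
    version, so we scan every historical value of the attribute.
    """
    best = None
    i = bisect_left(eavt, (e, a))  # seek to the first (e, a, ...) datom
    while i < len(eavt) and eavt[i][0] == e and eavt[i][1] == a:
        _, _, v, tx = eavt[i]
        if tx <= as_of_tx and (best is None or tx > best[1]):
            best = (v, tx)
        i += 1
    return best[0] if best else None

print(value_as_of(1, ":user/score", 1002))  # -> 20
```

With a hypothetical EATV ordering the loop could stop at the first datom with tx <= as_of_tx instead of scanning the whole group.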

It's also likely that the people who made Datomic are both smarter about this stuff than me and put more thought into it than I have, so I'd love to know what the reasoning behind the choice of index is.

(PS @dang it would be nice to have (2014) in the title)




I'm not sure EATV/AETV could fully replace EAVT/AEVT, as you would then lose the ability to do efficient range seeks across values. I do agree, though, that scanning all historical values in EAVT/AEVT is unsatisfactory for many use cases, as it makes the performance of ad-hoc as-of queries unpredictable.

By contrast, Crux [0] uses two dedicated temporal indexes: EVtTtC and EZC (Z-curve index) to make bitemporal as-of queries as fast as possible. These are distinct from the various triple indexes, which don't concern themselves with time at all. (Vt = valid time, Tt = transaction time, and C = the document hash for the version of an entity at a given coordinate)

[0] https://opencrux.com (I work on Crux :)
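As a rough illustration of the Z-curve idea, here's a standard Morton-order bit interleaving in Python. This is a generic sketch of the technique, not Crux's actual implementation:

```python
def z_encode(valid_time: int, tx_time: int, bits: int = 32) -> int:
    """Interleave the bits of (valid-time, tx-time) into one Z-address.

    Points that are close in both time dimensions end up close on the
    Z-curve, so a bitemporal region query can be answered with a small
    number of range scans over a single one-dimensional index.
    """
    z = 0
    for i in range(bits):
        z |= ((valid_time >> i) & 1) << (2 * i)      # even bits: valid time
        z |= ((tx_time >> i) & 1) << (2 * i + 1)     # odd bits: tx time
    return z

print(z_encode(2, 3))  # -> 14 (vt=0b10, tt=0b11 interleave to 0b1110)
```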


Not necessarily. Take a look at this implementation: https://aosabook.org/en/500L/an-archaeology-inspired-databas...

There you can retrieve just the top layer and don't have to scan all historic data; it's only in-memory, though.


The article mentions that while the indexes are conceptually monolithic, in practice they're partitioned into three spaces: historical, current, and in-memory.

New data gets written to the log for durability and updates the in-memory portion for queries. Periodically the indexes are rebuilt, creating new segments for current and shifting older data out of current into historical. This limits how much of the log must be replayed on recovery, and allows garbage collection of data that falls out of the retention window.
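A toy Python sketch of the query-side view (hypothetical datom tuples, not Datomic's real format): a query lazily merges the three physical spaces into one logical sorted index, so index rebuilds can move data between spaces without changing what queries see:

```python
import heapq

# Hypothetical three-way split of one logical index, each part kept in
# sorted (entity, attribute, value, tx) order:
historical = [(1, ":a", "old", 100)]
current    = [(1, ":a", "new", 200), (2, ":b", "x", 150)]
in_memory  = [(2, ":b", "y", 300)]   # recent writes not yet flushed

def scan():
    """Present one logical, sorted index to queries by lazily merging
    the three physical spaces."""
    return heapq.merge(historical, current, in_memory)

for datom in scan():
    print(datom)
```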

It's not that dissimilar to the solutions used by traditional MVCC databases.


The page mentions the Log index, which is sorted by transaction ID. That should be enough to support as-of, if I understand correctly.


The log index supports as-of if you know the actual transaction ID, but if you want to look up by entity/attribute efficiently it's not much help because you don't know when the data point you're interested in was last modified.


I think in this case you'd find all the datoms via the normal EAVT index and then sort the results by transaction ID, dropping everything after your desired transaction.



