Hacker News

One thing I've never understood is why all the indexes have transaction last. One of the selling points of Datomic is that it supports as-of queries, but using the EAVT or AEVT indexes requires it to scan all historic values of that attribute, right?

In most situations this is probably fine, but if you have data that changes frequently it seems like this could slow queries down compared to an EATV or AETV index.
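To make the concern concrete, here's a toy Python sketch of an as-of lookup against an EAVT-style sorted index. The datoms and attribute names are invented for illustration; this models only the sort order, not Datomic's actual storage:

```python
from bisect import bisect_left

# Hypothetical datoms: (entity, attribute, value, tx) tuples kept in
# EAVT sort order. Note that tx sorts last within each (e, a) group.
eavt = sorted([
    (1, ":user/score", 10, 1001),
    (1, ":user/score", 20, 1002),
    (1, ":user/score", 30, 1003),
    (1, ":user/name", "ada", 1001),
])

def value_as_of(e, a, as_of_tx):
    """Return the value of (e, a) as of a transaction.

    Because tx is the last sort component, the versions of (e, a) are
    ordered by value, not time -- we can't seek straight to the as-of
    version, so we scan every historical value of the attribute.
    """
    best = None
    i = bisect_left(eavt, (e, a))  # seek to the first (e, a, ...) datom
    while i < len(eavt) and eavt[i][0] == e and eavt[i][1] == a:
        _, _, v, tx = eavt[i]
        if tx <= as_of_tx and (best is None or tx > best[1]):
            best = (v, tx)
        i += 1
    return best[0] if best else None

print(value_as_of(1, ":user/score", 1002))  # -> 20
```

With a hypothetical EATV ordering the loop could stop at the first datom with tx <= as_of_tx instead of scanning the whole group.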

It's also likely that the people who made Datomic are both smarter about this stuff than me and put more thought into it than I have, so I'd love to know what the reasoning behind the choice of index is.

(PS @dang it would be nice to have (2014) in the title)




I'm not sure EATV/AETV could fully replace EAVT/AEVT, as you would then lose the ability to do efficient range seeks across values. I do agree, though, that scanning all historical values in EAVT/AEVT is unsatisfactory for many use cases, as it makes the performance of ad-hoc as-of queries unpredictable.

By contrast, Crux [0] uses two dedicated temporal indexes: EVtTtC and EZC (Z-curve index) to make bitemporal as-of queries as fast as possible. These are distinct from the various triple indexes, which don't concern themselves with time at all. (Vt = valid time, Tt = transaction time, and C = the document hash for the version of an entity at a given coordinate)

[0] https://opencrux.com (I work on Crux :)
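As a rough illustration of the Z-curve idea, here's a standard Morton-order bit interleaving in Python. This is a generic sketch of the technique, not Crux's actual implementation:

```python
def z_encode(valid_time: int, tx_time: int, bits: int = 32) -> int:
    """Interleave the bits of (valid-time, tx-time) into one Z-address.

    Points that are close in both time dimensions end up close on the
    Z-curve, so a bitemporal region query can be answered with a small
    number of range scans over a single one-dimensional index.
    """
    z = 0
    for i in range(bits):
        z |= ((valid_time >> i) & 1) << (2 * i)      # even bits: valid time
        z |= ((tx_time >> i) & 1) << (2 * i + 1)     # odd bits: tx time
    return z

print(z_encode(2, 3))  # -> 14 (vt=0b10, tt=0b11 interleave to 0b1110)
```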


Not necessarily. Take a look at this implementation: https://aosabook.org/en/500L/an-archaeology-inspired-databas...

There you can retrieve just the top layer and don't have to scan all historic data; it's only in-memory, though.


The article mentions that while the indexes are conceptually monolithic, in practice they're partitioned into three spaces: historical, current, and in-memory.

New data gets written to the log for durability and updates the in-memory portion for queries. Periodically the indexes are rebuilt, creating new segments for current and shifting older data out of current into historical. This limits how much of the log must be replayed on recovery, and allows garbage collection of data that falls out of the retention window.
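A toy Python sketch of the query-side view (hypothetical datom tuples, not Datomic's real format): a query lazily merges the three physical spaces into one logical sorted index, so index rebuilds can move data between spaces without changing what queries see:

```python
import heapq

# Hypothetical three-way split of one logical index, each part kept in
# sorted (entity, attribute, value, tx) order:
historical = [(1, ":a", "old", 100)]
current    = [(1, ":a", "new", 200), (2, ":b", "x", 150)]
in_memory  = [(2, ":b", "y", 300)]   # recent writes not yet flushed

def scan():
    """Present one logical, sorted index to queries by lazily merging
    the three physical spaces."""
    return heapq.merge(historical, current, in_memory)

for datom in scan():
    print(datom)
```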

It's not that dissimilar to the solutions used by traditional MVCC databases.


The page mentions the Log index, which is sorted by transaction ID. That should be enough to support as-of, if I understand correctly.


The log index supports as-of if you know the actual transaction ID, but if you want to look up by entity/attribute efficiently it's not much help because you don't know when the data point you're interested in was last modified.


I think in this case you'd find all the datoms via the normal EAVT index and then sort the results by transaction ID, dropping everything after your desired transaction.



