When data is immutable, append-only, and tagged by timestamp, there is no conflict. Rather, there are facts about the same entity that are asserted at different times. In the case where changes come in at two different times (subject to all the raciness of the real world, which exists regardless), one of them will win.
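To make that concrete, here's a minimal sketch in Python (purely illustrative, with invented names, not Datomic's API or storage format): every write is appended as a timestamped fact, and the "current" view just takes the latest assertion per entity/attribute, so there is nothing to conflict over in place.

```python
# Minimal sketch of immutable, timestamped assertions: writes never update
# anything in place, and a "current view" resolves each (entity, attribute)
# to its latest assertion. Earlier facts remain queryable as history.
from collections import namedtuple

Datom = namedtuple("Datom", ["entity", "attribute", "value", "tx_time"])

log = []  # append-only; nothing is ever updated or deleted

def assert_fact(entity, attribute, value, tx_time):
    log.append(Datom(entity, attribute, value, tx_time))

def current_value(entity, attribute):
    """Latest assertion wins; there is no in-place conflict to resolve."""
    matching = [d for d in log if d.entity == entity and d.attribute == attribute]
    return max(matching, key=lambda d: d.tx_time).value if matching else None

# Two "conflicting" writes are just two facts; the later timestamp wins.
assert_fact("user-1", "email", "a@example.com", tx_time=100)
assert_fact("user-1", "email", "b@example.com", tx_time=105)
print(current_value("user-1", "email"))  # -> "b@example.com"
```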
> In the case where changes come in at two different times (subject to all the raciness of the real world, which exists regardless), one of them will win.
That describes many methods of optimistic concurrency control, but it doesn't answer my question of how this is supposed to work in practice with high write contention, the higher latency of a distributed peer model, the long-running transactions the video mentions (or maybe that remark only applied to long-running queries), etc. My point being, if the distributed transaction problem were easily solved by sprinkling on optimistic multi-version concurrency control, it would have been solved a long time ago. There must be some special sauce they're not mentioning.
Correct, it's not write-scalable in the same way it is read-scalable. The transactor is a bottleneck for writes.
However, that doesn't mean it has slow writes - it should still do writes at least on a par with any traditional transactional database, and probably a good deal faster since it's append-only.
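As a rough illustration of why a single write point doesn't have to mean slow writes, here's a toy single-writer sketch (all names invented, not Datomic's implementation): the transactor's only job is to assign transactions a total order and append them to a log, with no in-place updates or row locks.

```python
# Toy sketch of a single-writer "transactor": all writes funnel through one
# component that serializes them and appends to a log. The work per write is
# small, which is why a single write point can still be fast.
import itertools
import queue
import threading

class Transactor:
    def __init__(self):
        self._queue = queue.Queue()          # every peer submits writes here
        self._tx_counter = itertools.count(1)
        self.log = []                        # append-only transaction log
        threading.Thread(target=self._run, daemon=True).start()

    def transact(self, facts):
        """Submit facts; returns once the transactor has logged them in order."""
        done = threading.Event()
        self._queue.put((facts, done))
        done.wait()

    def _run(self):
        while True:
            facts, done = self._queue.get()
            tx = next(self._tx_counter)      # total order over all writes
            self.log.append((tx, facts))     # sequential append, no locking
            done.set()

t = Transactor()
t.transact([("user-1", "email", "a@example.com")])
t.transact([("user-1", "email", "b@example.com")])
print(t.log)  # [(1, [...]), (2, [...])] -- one agreed order of writes
```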
I'm more concerned with what happens when the transactor goes down, or gets silently partitioned from some of the clients. I assume reads will continue to work but all writes will break?
I'd also like to know more about how the app-side caching works. If I've got a terabyte of User records and want to query for all users of a certain type, does a terabyte of data get sent over the wire, cached, and queried locally? Only the fields I ask for? Something else?
1. You're correct; however, the architecture does allow you to run a hot backup for fast failover.
2. The database is oriented around 'datoms', each of which is an entity/attribute/value/time tuple. Each of these has its own (hierarchical) index, so you only end up pulling the index segments you need to fulfill a given query. You'd only pull 1TB if your query actually encompassed all the data you had.
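To illustrate the "only pull the segments you need" idea, here's a toy sketch (invented layout, not Datomic's actual storage format): datoms sit in a sorted index chopped into segments, and a query for one attribute only fetches the segments whose key range could contain it, leaving the rest in storage.

```python
# Toy sketch of a segmented, sorted index over datoms (entity, attribute, value).
# A directory of first keys lets a query fetch only the segments that could
# contain the attribute it asks for, instead of the whole dataset.
from bisect import bisect_left

SEGMENT_SIZE = 3  # tiny for illustration; real segments hold many datoms

def build_segments(datoms):
    """Sort by (attribute, entity) and chop into segments with a first-key directory."""
    ordered = sorted(datoms, key=lambda d: (d[1], d[0]))
    segments = [ordered[i:i + SEGMENT_SIZE] for i in range(0, len(ordered), SEGMENT_SIZE)]
    directory = [seg[0][1] for seg in segments]  # first attribute in each segment
    return directory, segments

def query_by_attribute(directory, segments, attr):
    """Fetch only segments whose key range could cover `attr`."""
    start = max(bisect_left(directory, attr) - 1, 0)
    results = []
    for seg in segments[start:]:
        if seg[0][1] > attr:     # past this attribute's key range; stop fetching
            break
        results.extend(d for d in seg if d[1] == attr)
    return results

datoms = [("u1", "email", "a@x.com"), ("u1", "type", "admin"),
          ("u2", "email", "b@x.com"), ("u2", "type", "user"),
          ("u3", "name", "Carol"),    ("u3", "type", "user")]
directory, segments = build_segments(datoms)
print(query_by_attribute(directory, segments, "type"))  # only 'type' datoms, from 2 of 2 relevant segments
```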