Hacker News new | past | comments | ask | show | jobs | submit login

CRDTs don’t quite fit in this list, as these are all valid merge strategies that a CRDT can use. The fundamental idea of a CRDT is that all of these schemes have something in common: They’re based on iteratively merging a bunch of versions together with a merge operator that fulfills a few basic properties (where + means merge:

- Associativity: (A + B) + C == A + (B + C)

- Commutativity: A + B == B + A

- Idempotency: A + A = A

- Closure: Every merge result is a valid merge argument

- (perhaps a few others; it’s been a while since I read the original papers)

As long as your merge operator has all of these properties, you get the strong eventual consistency result that makes CRDTs a useful abstraction: Any two users that have received versions including the same updates will reconstruct the document in the same way, even if the actual set of versions seen differs.




That part is clear to me. What's not immediately obvious is that the DT in a CRDT can be entire database. Certainly the focus seems to be on individual data structures within a data store.

Might have to re-read the papers with this thought in mind. It did seem like there was a duality there but I kind of dismissed it.


A CRDT is still just a data structure. Most of my work lately has been on building an operation based CRDT for lists. Storage wise, you basically just store the set of all the operations you know about. (Though you can trim that set down if you want, like a shallow git clone).

You can store that set of operations just as easily in a file or in a database table. (Though the table will take a bit more space on disk because you can't be quite as clever).

To generate the data value itself, you can either replay all the operations from scratch (which is not as bad as it sounds), or store a cached copy of the data at some version and use that. You can think about it as, the operations give you history, tell you how to merge in remote changes and contain the data remote peers need to sync. The data itself is the value you read out & display.

There's nothing special about the data itself. You can store all that data in a flat file, or a database table, or in a few arrays in memory. The fancy part is (usually) how you merge changes together.


The conditions referenced in GP seem to imply that the most general state-based CRDT is simply a set of all reported outcomes. Something commutative, associative and idempotent is basically describing set union. So whether it's easy to go from one to the other would seem to depend on what constraints you're enforcing on your database.

Operation-based CRDT's are a bit different, these require commutativity and associativity. So you would be dealing with a multiset of reported operations.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: