Hacker News new | past | comments | ask | show | jobs | submit login

That part is clear to me. What's not immediately obvious is that the DT in a CRDT can be entire database. Certainly the focus seems to be on individual data structures within a data store.

Might have to re-read the papers with this thought in mind. It did seem like there was a duality there but I kind of dismissed it.




A CRDT is still just a data structure. Most of my work lately has been on building an operation based CRDT for lists. Storage wise, you basically just store the set of all the operations you know about. (Though you can trim that set down if you want, like a shallow git clone).

You can store that set of operations just as easily in a file or in a database table. (Though the table will take a bit more space on disk because you can't be quite as clever).

To generate the data value itself, you can either replay all the operations from scratch (which is not as bad as it sounds), or store a cached copy of the data at some version and use that. You can think about it as, the operations give you history, tell you how to merge in remote changes and contain the data remote peers need to sync. The data itself is the value you read out & display.

There's nothing special about the data itself. You can store all that data in a flat file, or a database table, or in a few arrays in memory. The fancy part is (usually) how you merge changes together.


The conditions referenced in GP seem to imply that the most general state-based CRDT is simply a set of all reported outcomes. Something commutative, associative and idempotent is basically describing set union. So whether it's easy to go from one to the other would seem to depend on what constraints you're enforcing on your database.

Operation-based CRDT's are a bit different, these require commutativity and associativity. So you would be dealing with a multiset of reported operations.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: