
Dynamo, Riak, Cassandra



So is this more complicated than your schema?

http://wiki.basho.com/Replication.html#Read-Repair

"Read repair occurs when a successful read occurs — that is, the quorum was met — but not all replicas from which the object was requested agreed on the value. There are two possibilities here for the errant nodes:

1. The node responded with a “not found” for the object, meaning it doesn’t have a copy.

2. The node responded with a vector clock that is an ancestor of the vector clock of the successful read.

When this situation occurs, Riak will force the errant nodes to update their object values based on the value of the successful read."
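The two cases in the quoted passage can be sketched roughly as follows. This is only an illustration of the rule as quoted, not Riak's actual code: `descends`, `read_repair`, and the dict-based clock representation are hypothetical names chosen for the example, and a "not found" reply is modeled as `None`.

```python
from typing import Dict, List, Optional, Tuple

VClock = Dict[str, int]                 # node name -> counter (assumed representation)
Reply = Optional[Tuple[VClock, str]]    # None models a "not found" reply


def descends(a: VClock, b: VClock) -> bool:
    """True if clock a has seen everything clock b has (b is an ancestor of a, or equal)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def read_repair(replies: Dict[str, Reply]) -> Tuple[Reply, List[str]]:
    """Return the winning (clock, value) and the replicas that should be repaired."""
    found = [r for r in replies.values() if r is not None]
    # The winner is a reply whose clock descends from every other reply's clock.
    winner = next(
        (r for r in found if all(descends(r[0], other[0]) for other in found)),
        None,
    )
    if winner is None:
        # Nothing found, or concurrent clocks: outside the two quoted cases.
        return None, []
    # Errant replicas: "not found" (case 1) or a strict ancestor clock (case 2).
    errant = [
        name
        for name, r in replies.items()
        if r is None or not descends(r[0], winner[0])
    ]
    return winner, sorted(errant)


replies = {
    "node_a": ({"x": 2}, "new"),
    "node_b": ({"x": 1}, "old"),   # ancestor clock
    "node_c": None,                # not found
}
winner, errant = read_repair(replies)
```

Here `node_b` and `node_c` would be overwritten with the winning value. Note that the sketch deliberately gives up when two clocks are concurrent, since the quoted rule only covers "not found" and ancestor replies.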


Yes. First of all, not every algorithm is amenable to read-repair. Imagine, for example, storing a unique count in the database. There's no way to know how to combine divergent values in that case. (If the root value is 4, and you have two divergent values of 5, you have no idea if the increment was due to the same element or not. The right answer is either 5 or 6, but you have no idea).
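To make the ambiguity concrete, here is a small sketch. The element names and variable names are made up for illustration; the point is only that two different histories leave exactly the same stored counts behind.

```python
# Root state: four distinct elements, so the stored unique count is 4.
root = {"a", "b", "c", "d"}

# History 1: both replicas saw the same new element.
left_1, right_1 = root | {"e"}, root | {"e"}

# History 2: the replicas saw different new elements.
left_2, right_2 = root | {"e"}, root | {"f"}

# If only the count is stored, both histories look identical: (5, 5).
stored_1 = (len(left_1), len(right_1))
stored_2 = (len(left_2), len(right_2))

# The true merged counts differ: 5 in history 1, 6 in history 2.
true_1 = len(left_1 | right_1)
true_2 = len(left_2 | right_2)
```

A repair step that sees only `(5, 5)` cannot tell which of the two histories it is looking at.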

More importantly, if you make a mistake, you corrupt the database. The system I described based on immutable data is human fault-tolerant, which is a critical property. If you mess up, you can always correct things.


>Imagine, for example, storing a unique count in the database. There's no way to know how to combine divergent values in that case. (If the root value is 4, and you have two divergent values of 5, you have no idea if the increment was due to the same element or not. The right answer is either 5 or 6, but you have no idea).

If two nodes are allowed to accept writes for the same "cell" independently, without synchronization (i.e. node A: 4->5, node B: 4->5->6), how would your schema work in this case? (Of course, any schema would work fine if only one node is allowed to master the "cell".)


I think the point here is that nodes don't accept these random writes; any error that's introduced into a system with this structure is fixed on recompute.


You use the timestamp to resolve the two values to see whether 5 or 6 is the latest.

Your system would have the same problem of resolving which data is the latest.
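A minimal sketch of what timestamp-based resolution could look like, assuming each replica tags its value with a write time. `Versioned` and `latest` are hypothetical names, and the timestamps are invented for the example.

```python
from typing import NamedTuple


class Versioned(NamedTuple):
    value: int
    timestamp: float  # assumed write time in seconds; not from any real system


def latest(*versions: Versioned) -> Versioned:
    """Pick the version with the greatest timestamp (last write wins)."""
    return max(versions, key=lambda v: v.timestamp)


node_a = Versioned(value=5, timestamp=100.0)
node_b = Versioned(value=6, timestamp=101.5)
resolved = latest(node_a, node_b)
```

This keeps whichever value was written last and discards the other, so it depends on the replicas' clocks being comparable.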


In his system, neither partition would have written "5" or "6". Rather, the one on the left would have written a "+1", and the one on the right would have written a "+1", and you can tell whether these are the same "+1" or not. You only combine them when you do the query.
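Roughly, in code: the sketch below assumes each "+1" is stored as an immutable record carrying some unique identifier. The `event_id` field and the `Increment`, `merge`, and `query_count` names are assumptions for illustration, not something specified in the thread.

```python
from typing import FrozenSet, NamedTuple


class Increment(NamedTuple):
    event_id: str  # hypothetical unique id that lets two "+1"s be compared
    amount: int


def merge(*logs: FrozenSet[Increment]) -> FrozenSet[Increment]:
    """Union the partitions' logs; identical events collapse into one."""
    return frozenset().union(*logs)


def query_count(base: int, log: FrozenSet[Increment]) -> int:
    """Combine the increments only at query time."""
    return base + sum(event.amount for event in log)


left = frozenset({Increment("e1", 1)})
right_same = frozenset({Increment("e1", 1)})    # the same "+1" seen on both sides
right_other = frozenset({Increment("e2", 1)})   # a different "+1"

same_total = query_count(4, merge(left, right_same))
other_total = query_count(4, merge(left, right_other))
```

With the same "+1" on both sides the query returns 5; with two distinct "+1"s it returns 6. Nothing is overwritten, so a wrong merge function can be fixed and the query rerun.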


One is +1 (to 5) and the other one is +2 (to 6). Which one is the correct one?



