What you're describing is a query. "What is Sally's current location?" Queries c...

sorenbs · on Oct 13, 2011

So you would still have to deal with the case of someone reading inconsistent data and taking the wrong action as a result. If that action is internal to the system, you can clean up once the system becomes consistent again. If the action is external, you cannot.

nathanmarz · on Oct 13, 2011

Right. If full consistency is a requirement, then you can still have it, at the cost of availability. Alternatively, since the dataset is immutable, it contains a history of everything that happened, so you can resolve problematic actions later on (this is similar to what banks do).

It's important to realize that the tradeoff between consistency and availability is a limitation of nature, not of our tooling.
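To make the "immutable history" idea concrete, here is a minimal Python sketch using the Sally example from the original post. The Fact type, the integer timestamps, and the helper names are hypothetical; the point is only that facts are appended and never overwritten, and that "current location" is derived at query time.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Fact:
    person: str
    city: str
    as_of: int  # hypothetical logical timestamp; larger means later

# Append-only master dataset: facts are only ever added, never updated or deleted.
facts = [
    Fact("Sally", "Atlanta", 100),
    Fact("Sally", "Chicago", 200),
]

def location_as_of(facts, person, t) -> Optional[str]:
    """Answer 'where did person live at time t?' from the facts known up to t."""
    known = [f for f in facts if f.person == person and f.as_of <= t]
    if not known:
        return None
    # The most recent fact at or before t wins; older facts stay true for their time.
    return max(known, key=lambda f: f.as_of).city

def current_location(facts, person) -> Optional[str]:
    """'Current' state is just the same query evaluated at the latest possible time."""
    return location_as_of(facts, person, float("inf"))
```

Appending a later fact changes what current_location returns, while location_as_of can still answer questions about the past, which is what makes after-the-fact cleanup possible.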

ww520 · on Oct 13, 2011

So CAP is still a problem; it hasn't been beaten.

kahirsch · on Oct 13, 2011

If I only have one antique pocket watch and, during a partition, Alan buys the pocket watch while communicating with partition A and Bob buys the pocket watch while communicating with partition B, how do we get back to a consistent state without application code?

nathanmarz · on Oct 13, 2011

A good example to look at is banks. Banks are eventually consistent systems. An ATM allows you to withdraw funds (up to a set limit) even if it can't communicate with the bank. However, because banks keep full audit logs of all transactions (immutable data), they eventually discover that you took out too much money and charge you an overdraft fee.

http://en.wikipedia.org/wiki/Overdraft
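A toy Python model of the bank example. Everything specific here is a hypothetical assumption (the offline limit, the fee amount, and the rule that a fee is charged on any withdrawal that leaves the balance negative); real banking rules are more involved. It shows the two halves of the argument: each ATM applies only a local check while partitioned, and the immutable transaction log is replayed afterwards to detect the overdraft.

```python
from dataclasses import dataclass

OFFLINE_LIMIT = 200  # hypothetical cap an offline ATM enforces per withdrawal
OVERDRAFT_FEE = 35   # hypothetical fee

@dataclass(frozen=True)
class Txn:
    txn_id: str
    amount: int  # positive for deposits, negative for withdrawals
    ts: int      # when the transaction happened

def atm_accepts(amount: int) -> bool:
    """While partitioned, an ATM can check only its own local limit, not the balance."""
    return 0 < amount <= OFFLINE_LIMIT

def reconcile(opening_balance: int, log: list) -> tuple:
    """Replay the immutable transaction log in time order once the partition heals.
    Returns the final balance and the IDs of withdrawals that triggered a fee."""
    balance = opening_balance
    fee_txns = []
    for txn in sorted(log, key=lambda t: t.ts):
        balance += txn.amount
        if txn.amount < 0 and balance < 0:
            balance -= OVERDRAFT_FEE
            fee_txns.append(txn.txn_id)
    return balance, fee_txns

# Two ATMs, cut off from the bank, each accept a 150 withdrawal from a 250 balance.
log = [Txn("atm1-001", -150, 1), Txn("atm2-001", -150, 2)]
```

Both withdrawals pass atm_accepts, and reconcile(250, log) only discovers the overdraft after the fact, from the log.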

lemming · on Oct 13, 2011

This isn't really relevant to the pocket watch problem. It only works because the bank doesn't mind over-committing a finite resource (mostly because it can charge you that fee). My company processes transactions for prepaid credit cards, so any money that is overdrawn is essentially lost. It's important to understand the characteristics of the problem, and to recognize that there really is no magic anti-CAP bullet.

nathanmarz · on Oct 13, 2011

Exactly right. The bank example just demonstrates an alternative approach to full consistency.

nathanmarz · on Oct 13, 2011

This is a use case where it sounds like you want full consistency, so in the realtime layer you would use a database that becomes unavailable during partitions.
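A minimal sketch of the "become unavailable during partitions" choice, applied to the pocket watch: a side of the partition may sell only if it can reach a strict majority of replicas. The sell function, the stock dict (standing in for the state the majority agrees on), and the replica counts are hypothetical; real replication and consensus are omitted. Because two disjoint sides cannot both hold a majority, at most one of Alan and Bob can complete the purchase, and the other gets an error instead of a watch.

```python
class Unavailable(Exception):
    """Raised when a write is refused because no majority of replicas is reachable."""

def sell(stock, item, reachable, total):
    """Sell one unit of item only if this side of the partition can reach a strict
    majority of the `total` replicas; otherwise refuse (trade availability for consistency)."""
    if 2 * reachable <= total:
        raise Unavailable(f"{reachable}/{total} replicas reachable; refusing write")
    if stock.get(item, 0) <= 0:
        return False  # sold out
    stock[item] -= 1
    return True

# Five replicas split 3/2: Alan's side (3) can sell, Bob's side (2) cannot.
stock = {"pocket watch": 1}
```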

kahirsch · on Oct 14, 2011

I think that most database applications are better seen not as a monotonically growing collection of facts, but as a sequence of operations, and those operations don't necessarily commute.

Most of the operations do commute, or almost commute; the edge cases involve, for example, balances or inventory falling to zero, as well as side effects, duplicates, generated IDs, timestamps, etc. I think it's difficult for these to be handled automatically, because of semantic issues. For side effects, there have to be compensating actions: charging back credit cards for orders not filled, for example, or sending an email saying, sorry, you're not actually getting the watch. For operations that don't commute, having a batch system isn't going to be adequate.

For actions that do commute, I don't see why a batch system is necessary. It just means having a third opinion of what the value should be. Unless you have a defined downtime, you're introducing more consistency issues.
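A small Python sketch of the commutativity point, using a hypothetical account balance. Plain deposits commute, so every ordering gives the same result; a withdrawal that is rejected when funds are insufficient does not, so the final balance depends on the order in which the operations are applied.

```python
from itertools import permutations

def deposit(amount):
    return lambda bal: bal + amount

def guarded_withdraw(amount):
    # Rejected (left as a no-op) if it would take the balance below zero.
    return lambda bal: bal - amount if bal >= amount else bal

def apply_all(start, ops):
    for op in ops:
        start = op(start)
    return start

def outcomes(start, ops):
    """Set of final balances reachable by applying ops in every possible order."""
    return {apply_all(start, order) for order in permutations(ops)}

# Deposits only: a single outcome, so replicas can apply them in any order.
only_deposits = outcomes(0, [deposit(5), deposit(7)])
# Guarded withdrawals: the outcome depends on ordering.
mixed = outcomes(10, [guarded_withdraw(8), guarded_withdraw(5), deposit(3)])
```

With these inputs only_deposits has one element while mixed has three, which is exactly the "balances falling to zero" edge case.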

Also, "online" (as in OLTP), rather than "realtime", is more consistent with standard DBMS terminology.

VladRussian · on Oct 13, 2011

The database does it when the partition ends. It's called eventual consistency. The database would use something along the lines of vector clocks.
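A minimal Python sketch of vector clock comparison. The dict-of-counters representation and the function names are assumptions for illustration, not any particular database's API. It shows that vector clocks can detect that Alan's and Bob's purchases were concurrent, but detecting the conflict does not by itself decide who gets the watch.

```python
def increment(clock, node):
    """Return a new clock with node's counter bumped (clocks are never mutated)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def merge(a, b):
    """Componentwise maximum: the clock of a value that has seen both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

def compare(a, b):
    """Return 'equal', 'before', 'after', or 'concurrent' for clocks a and b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened-before b; b supersedes a
    if b_le_a:
        return "after"
    return "concurrent"      # neither saw the other: a genuine conflict

alan = increment({}, "A")    # purchase accepted in partition A
bob = increment({}, "B")     # purchase accepted in partition B
```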

lemming · on Oct 13, 2011

The problem is that in this case the database can only apply some ad-hoc heuristic. In the case of the pocket watch, this will be: the first user to buy it gets the watch, and the second user gets annoyed by an email saying that "the watch we said you bought was actually bought by someone else". There's no magic bullet here - this may be acceptable for some use cases but will not be for others.
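A sketch of the ad-hoc "first buyer wins" heuristic described above. The Purchase type and resolve_purchases are hypothetical; the sketch assumes each partition stamps orders with a wall-clock time and breaks ties by buyer name. It leans on clocks in different partitions agreeing, which is itself an assumption, and it can only produce a list of customers who need a compensating action, not avoid the need for one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Purchase:
    buyer: str
    item: str
    ts: float  # wall-clock time recorded by the partition that accepted the order

def resolve_purchases(purchases, stock):
    """Earliest timestamp wins until stock runs out; everyone else needs
    a compensating action (refund plus an apology email)."""
    remaining = dict(stock)
    winners, to_compensate = [], []
    for p in sorted(purchases, key=lambda p: (p.ts, p.buyer)):
        if remaining.get(p.item, 0) > 0:
            remaining[p.item] -= 1
            winners.append(p.buyer)
        else:
            to_compensate.append(p.buyer)
    return winners, to_compensate

orders = [Purchase("Bob", "pocket watch", 10.5), Purchase("Alan", "pocket watch", 10.0)]
```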

VladRussian · on Oct 13, 2011

>Both "Sally lives in Atlanta as of time X" and "Sally lives in Chicago as of time Y" are true.

>This is far, far different from databases based on mutable state. In order for a database like that to be eventually consistent, you need to do read-repair to enforce consistency.

What databases are you talking about? Any specific examples?

nathanmarz · on Oct 13, 2011

Dynamo, Riak, Cassandra

VladRussian · on Oct 13, 2011

So this is more complicated than your scheme?

http://wiki.basho.com/Replication.html#Read-Repair

"Read repair occurs when a successful read occurs — that is, the quorum was met — but not all replicas from which the object was requested agreed on the value. There are two possibilities here for the errant nodes:

1. The node responded with a “not found” for the object, meaning it doesn’t have a copy.

2. The node responded with a vector clock that is an ancestor of the vector clock of the successful read.

When this situation occurs, Riak will force the errant nodes to update their object values based on the value of the successful read."
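A rough Python sketch of the two read-repair cases in the quoted Riak documentation. It is not Riak's implementation: the reply format (node mapped to a (clock, value) pair, or None for "not found") and the function names are hypothetical, and concurrent clocks (siblings) are deliberately left alone here rather than repaired.

```python
def descends(a, b):
    """True if vector clock a has seen everything clock b has."""
    return all(a.get(n, 0) >= c for n, c in b.items())

def nodes_to_repair(replies, winning_clock):
    """replies maps node -> (clock, value), or None if the node answered 'not found'."""
    stale = []
    for node, reply in replies.items():
        if reply is None:
            stale.append(node)  # case 1: the node has no copy
            continue
        clock, _value = reply
        if descends(winning_clock, clock) and not descends(clock, winning_clock):
            stale.append(node)  # case 2: the node holds a strict ancestor
        # Equal clocks are already current; concurrent clocks (siblings) are
        # left for separate conflict resolution in this sketch.
    return sorted(stale)

def read_repair(store, replies, winning_clock, winning_value):
    """Overwrite stale replicas in `store` (node -> (clock, value)) with the winner."""
    for node in nodes_to_repair(replies, winning_clock):
        store[node] = (winning_clock, winning_value)

replies = {
    "n1": ({"A": 2}, "Chicago"),  # agrees with the successful read
    "n2": ({"A": 1}, "Atlanta"),  # ancestor: will be repaired
    "n3": None,                   # not found: will be repaired
    "n4": ({"B": 1}, "Boston"),   # concurrent sibling: untouched here
}
```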

nathanmarz · on Oct 13, 2011

Yes. First of all, not every algorithm is amenable to read-repair. Imagine, for example, storing a unique count in the database. There's no way to know how to combine divergent values in that case. (If the root value is 4, and you have two divergent values of 5, you have no idea if the increment was due to the same element or not. The right answer is either 5 or 6, but you have no idea).
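A Python sketch of the unique-count example, with hypothetical element names. Storing only the counts, both replicas report 5 and nothing tells you how to merge them; storing the underlying elements lets you merge with a set union, which gives 5 when the increments came from the same element and 6 when they didn't.

```python
# Hypothetical elements counted by a "unique count" whose root value is 4.
root = {"u1", "u2", "u3", "u4"}
replica_a = root | {"u5"}
replica_b_same = root | {"u5"}  # B's increment was the same element as A's
replica_b_diff = root | {"u6"}  # B's increment was a different element

def merged_unique_count(*replicas):
    """Merge by set union, which is commutative, associative and idempotent,
    so replicas can be combined in any order and any number of times."""
    merged = set()
    for r in replicas:
        merged |= r
    return len(merged)
```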

More importantly, if you make a mistake, you corrupt the database. The system I described based on immutable data is human fault-tolerant, which is a critical property. If you mess up, you can always correct things.
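A minimal sketch of what "human fault-tolerant" buys you, with hypothetical pageview data and function names. A precomputed view built by buggy code can simply be discarded and recomputed from the untouched raw events once the code is fixed; if the buggy code had instead been incrementing a mutable counter in place, the correct value would be unrecoverable.

```python
# Immutable master dataset (hypothetical): raw pageview events, never modified.
pageviews = (
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/about"),
)

def unique_visitors_buggy(events):
    # Bug: counts events rather than distinct visitors.
    return len(events)

def unique_visitors_fixed(events):
    return len({user for user, _page in events})

wrong_view = unique_visitors_buggy(pageviews)
# After fixing the code, throw the old view away and recompute from the raw data.
recomputed_view = unique_visitors_fixed(pageviews)
```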

VladRussian · on Oct 13, 2011

>Imagine, for example, storing a unique count in the database. There's no way to know how to combine divergent values in that case. (If the root value is 4, and you have two divergent values of 5, you have no idea if the increment was due to the same element or not. The right answer is either 5 or 6, but you have no idea).

If two nodes are allowed to accept writes for the same "cell" independently, without synchronization (i.e., node A: 4->5, node B: 4->5->6), how would your scheme work in this case? (Of course any scheme would work fine if only one node were allowed to master the "cell".)

sritchie · on Oct 13, 2011

I think the point here is that nodes don't accept these random writes; any error that's introduced into a system with this structure is fixed on recompute.

ww520 · on Oct 13, 2011

You use the timestamps to resolve the two values, i.e. to see which of 5 and 6 is the latest.

Your system would have the same problem of determining which data is the latest.
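A sketch of timestamp-based resolution (last write wins) applied to the 4->5 / 4->5->6 scenario. The timestamps are hypothetical. Whichever replica wrote last, its value replaces the other's wholesale: here node A's 5 wins, although the right answer for a unique count is 6 or 7, depending on whether A's element was one of B's two. A timestamp tells you which write came last, not whether it already included the other replica's updates.

```python
def last_write_wins(versions):
    """versions: (timestamp, value) pairs from divergent replicas.
    Keep only the value carrying the newest timestamp."""
    return max(versions, key=lambda v: v[0])[1]

# Hypothetical timings, starting from 4:
# node B does 4->5 at t=1 and 5->6 at t=2; node A does 4->5 at t=3.
node_a = (3, 5)
node_b = (2, 6)
winner = last_write_wins([node_a, node_b])  # B's increments are silently dropped
```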

lmm · on Oct 14, 2011

In his system, neither partition would have written "5" or "6". Rather, the one on the left would have written a "+1", and the one on the right would have written a "+1", and you can tell whether these are the same "+1" or not. You only combine them when you run the query.
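A Python sketch of the approach described above: each partition records increments as events rather than overwriting a number, and the count is computed at query time from the union of all events. The Increment type and its event_id field are hypothetical; the sketch assumes every increment gets a unique ID when it is first recorded, so replicated copies of the same event collapse in the union.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Increment:
    event_id: str  # hypothetical unique ID assigned when the event is first recorded
    delta: int = 1

def query_count(root, *partitions):
    """Union the event sets from every partition (copies of one event collapse
    because equal event_ids compare equal), then fold the deltas into root."""
    events = set()
    for p in partitions:
        events |= set(p)
    return root + sum(e.delta for e in events)

# Node A: 4->5, node B: 4->5->6, expressed as events instead of values.
a_same, b_same = {Increment("e1")}, {Increment("e1"), Increment("e2")}
a_diff, b_diff = {Increment("a1")}, {Increment("b1"), Increment("b2")}
```

With this representation the question has a definite answer once the event sets are combined: 6 if node B's first increment is the same event node A recorded, 7 if all three are distinct.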

ww520 · on Oct 14, 2011

One is +1 (to 5) and the other one is +2 (to 6). Which one is the correct one?