Don't Settle for Eventual Consistency

rdtsc · on April 23, 2014

Don't settle ... unless you can.

Some business processes/problems can handle eventual consistency. Banking is the classic (and perhaps un-expected) example. You can overdraw your account, there are processes to go back and solve conflicts. Some are automatic, some are manual some are legal.

Bitcoin at the blockchain level is eventually consistent, just wait 10 minutes or whatever the current time it (I don't use it personally just know about it) and then you can be fairly sure of the validity of the transaction. There are built-in incentive to assure histories will converge. But, wallets kept at some exchange should _not_ be eventually consistent. Should _not_ be able to take 100x more than your wallet holds and send it to someone else. There is no regulatory, automatic of any other kind of framework to revert transactions that went through.

There is crdt (a commutative replicated data type) research. So these are data types that an always solve inconsistencies should they arise and instead of diverging they auto-converge, in face of conflicts. Think set union operation or max() function.

These kind of trade-off will percolate up through your data layer into your business problem. For some cases you'd want to pick one, for some pick another.

exabrial · on April 23, 2014

Yah but banking sucks as an eventual consistency problem. Sure there are processes to go back and solve conflicts, but they suck too.

Part of the reason we can't transfer money instantly between two accounts in the USA is due to sorting out eventual consistency. Federal guidelines on transfer intentionally make it a slow process so the manual processes can catch up.

Terr_ · on April 23, 2014

> Part of the reason we can't transfer money instantly between two accounts in the USA is due to sorting out eventual consistency.

Yeah, but that's because everything is a case of eventual consistency. Causality itself is limited to the speed of light, and "instantly" is impossible. The only question is whether you want to block/wait, or gloss over it with eventual-tricks.

Just look at online FPS games! Even with some of the best consumer-grade communication links and high expectations for each node, it's impossible to provide actual "instant" behavior, and all modern games contain huge reams of code dedicated to maintaining an eventually-consistent environment.

smoyer · on April 23, 2014

I came here to say exactly this ... there are many problem domains where you can work just fine knowing that your data will be eventually consistent.

AlisdairO · on April 23, 2014

Very true - the issue is perhaps that you need to be absolutely sure of your problem domain, and absolutely confident in your engineers' ability to maintain eventual consistency (as opposed to simply inconsistency) in the face of changing requirements. More-consistent systems allow much more freedom to change the way your system works without worrying as much about the ways in which it might impact your data storage.

You don't have to go for full serializability - you can very often get away with something simpler like consistent writes and potentially out of date reads. That sort of system scales a long way unless you're very write heavy.

sargun · on April 23, 2014

So, I've actually been thinking a lot about this problemset. I think that polyglot databases are the future. Most applications can deal with eventual consistency, or at least bounded stale reads [see: most interactions with Twitter, Facebook, etc..].

I think a hybrid consistency model will end up becoming the way we end up going, but not without some changes to the working definition of eventual consistency.

I think that eventually consistent systems that we'll see in the future will at a minimum have atomic, consistent (as in ACID), durable transactions. The big thing you're missing is isolation, and by that, you're also missing serializability. (http://www.vldb.org/pvldb/vol7/p181-bailis.pdf)

Bounded staleness consistency will probably make most people happy -- as in "Give me a consistent snapshot of the database as of 1 second ago," or "Give me a consistent snapshot of the database as of this logical time" (assuming the database consumer has some sort of logical clock you're passing back and forth -- In essence, MVCC with some exposure of the timestamp. (better explanations: http://pages.cs.wisc.edu/~cs739-1/papers/consistencybaseball...)

In addition to this, part of the problems that were outlined in the paper talked about multi-datacenter issues -- a lot of issues with banking-like apps can be solved with escrowing For example, each datacenter has 10% of the account's balance, and you can use consistent commits in the datacenter. If an interactions effects more than 10% of the account's value, it should occur across datacenters, and make a consistent commit across datacenters. (More info: http://mdcc.cs.berkeley.edu/)

Anyways...the future is bright, but we need someone to throw a ridiculous amount of money, and time at it.

rdtsc · on April 23, 2014

Basho's Riak DB is becoming such a hybrid. It is an traditionally an eventually consistent database, but they are adding strong consistency in the new release. [I don't work and am not affiliated with them, just follow them on Github].

And by hybrid in this case I mean some part to keyspace can be tagged as consistent while others remain as before.

sargun · on April 23, 2014

Yep. I've done a lot of work with Basho's Riak already, but there is still a lot of work to be done.

jude- · on April 23, 2014

A more recent version of the COPS system is described here: http://www.cs.princeton.edu/~wlloyd/papers/eiger-nsdi13.pdf

Github for the more recent system: https://github.com/wlloyd/eiger

You can find the first author's posters and talks on the subject of scalable casaully-consistent storage here: http://www.cs.princeton.edu/~wlloyd/research.html

toolslive · on April 23, 2014

Really, you don't want to drop problems like consistency, transactional reasoning, data loss, ... on the head of an application developer. They should be able to have a simple, idealized view on a database or key value store; otherwise you risk their sanity.

vidarh · on April 23, 2014

I'd argue that if you don't make application developers consider these things, then you risk their sanity when they have to deal with the fallout.

They need to understand the tradeoffs, and be able to decide which guarantees are important when.

kstrauser · on April 23, 2014

Exactly this. I wrote a blog (at https://www.crittercism.com/blog/scaling-with-eventual-consi...) about why my company designs for eventually consistent stores from the beginning. There are a lot of use cases - and the number is growing as companies deploy large, scaleable systems - where a consistent store just isn't practical or even needed.

You can't just use an eventually consistent store as an ACID system and expect it to work, and it's almost always a Really Bad Idea to try to implement ACID on top of EC. Understand how your database works and design to its strengths instead of trying to pretend that its weaknesses don't exist.

Terr_ · on April 24, 2014

and be able to decide which guarantees are important when.

This is why I like the idea of Aggregate Roots from domain-driven-design. The boundary is clear in your object model.

nutjob2 · on April 23, 2014

I think the point is that you sometimes can't deliver it, so you give them a slightly less general or idealized abstraction to work with.

The "head in the sand" algorithm isn't always reliable.

Nacraile · on April 23, 2014

Am I missing something, or is this just a rehash of lamport timestamps, which were first described in 1978, and are part of any reasonable undergrad distributed systems course?

notacoward · on April 23, 2014

It's a pretty good rehash, but yeah, it's a rehash of Distributed Systems 101. In particular, the idea that changes can be propagated eventually but still in order is almost as old as computers, but don't say it too loud. For some reason that seems to bring out the knives in the dark.

terranstyler · on April 23, 2014

Very well written and informative once you distinguish "causal" from "casual".

david927 · on April 23, 2014

I think this is an eloquent response to this: http://www.businessinsider.com/the-future-of-the-blockchain-...

tomp · on April 23, 2014

Another take at this problem is HyperDex [1], which provides strong consistency [2] and multi-key transactions [3].

[1] http://hyperdex.org/

[2] http://hyperdex.org/papers/hyperdex.pdf

[3] http://hyperdex.org/papers/warp.pdf

exabrial · on April 23, 2014

Eventual consistency violates the principle of least surprise.

"Oh you hit the save button? Yah, we'll get to that."

Nacraile · on April 23, 2014

Eventually consistent systems aren't built because eventual consistency is considered a "good" property. Eventually consistent systems are built because eventual consistency is the least-bad option in a fundamental trade-off.

When a distributed system is partitioned, it is impossible to execute all operations on a shared resource serializably. So, the system must either fail some requests, or fail to be serializable.

You can have your principle of least surprise, or you can have a system that is capable of serving traffic even after some drunk sailor drags an anchor through a submarine cable. You can't have both.

Terr_ · on April 23, 2014

That's not what it means. The save is understood and complete, the unknown part is whether someone else's revision will take priority or whether it'll cycle to some other state.

pjc50 · on April 24, 2014

Hitting the save button is fine; the problem is more when one person hits "save" and the other person hits "cancel" at the same time. There can be only one winner.

mad44 · on April 23, 2014

Here is a summary of their SOSP 11 paper. http://muratbuffalo.blogspot.com/2012/09/dont-settle-for-eve...

notacoward · on April 23, 2014

The title is misleading link-bait (eventual does not necessarily imply unordered) but otherwise looks like a pretty good discussion of issues many of us have to deal with.

toong · on April 23, 2014

The page is timing out. Or is that the 'eventual consistency' pun intended ?

roeme · on April 23, 2014

Not timing out here, Pun would suck, and here's a cached version for you: http://webcache.googleusercontent.com/search?q=cache:KNB7FI3...