
I can't see the talk at the moment, but here are my two cents on the subject:

The main problem with eventual consistency is not how often an inconsistency happens; it is: what damage will it do WHEN it happens?

Imagine you're a bank handling big clients and you lose track of a write of 50+ million dollars. Where did the money go? How do you differentiate that from a fraud attempt?

If you have customers, how will you tell them that you just lose, probabilistically speaking, "one in 10 million packages"?

But that's a very interesting question that also has philosophical repercussions: how come we are in a society that did not build systems that accept a certain degree of failure?




Eventual consistency isn't about "losing writes". It's about how long it will take for all of your replicas to agree on/observe the last written versions and, in the meantime, you'll read potentially stale data.

Certain data structures inherently tolerate staleness or message reordering: look at your Twitter feed, any kind of log, or other "commutative data structures". If you can't handle staleness, you should probably use stronger consistency.

However, if you can find out about staleness after the fact (an asynchronous callback, for instance), you can run some sort of compensatory routine (e.g., overdraft charges for your bank). Then you have an optimization problem: (cost of compensation)*(number of times you have to run compensation) vs. the benefit you get from weak consistency (latency, availability, whatever).
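
To make that concrete, here is a back-of-the-envelope sketch in Python (every number in it is invented, just to show the shape of the trade-off):

  # Rough sketch: is weak consistency worth it? All figures are made up.
  requests_per_day = 10_000_000
  p_stale_read     = 1e-5      # chance a request observes stale data
  cost_per_fixup   = 2.00      # dollars to run the compensation routine once
  value_per_req    = 0.0004    # dollars of value per request from lower latency

  expected_fixup_cost = requests_per_day * p_stale_read * cost_per_fixup
  expected_benefit    = requests_per_day * value_per_req

  print(expected_fixup_cost, expected_benefit)  # 200.0 vs 4000.0 -> weak consistency pays off here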

There's an awesome paper by Pat Helland about the problem you mention regarding building real-world systems on top of inherently unreliable components. It's called "Building on Quicksand": http://arxiv.org/pdf/0909.1788.pdf


Hmm, I have been downvoted, but I deserve it. So to rephrase: the problem is, as you say, not losing data that is going to be inserted, but performing operations on the wrong versions of the data, which means errors.

Let's say I have 3 pieces of data required by a function to compute the outcome of a certain operation (withdraw 10 billion dollars): A, B, C. We change the data in this fashion: A -> A', B -> B', C -> C' -> C".

When I query, because of eventual consistency, what I compute as f(A,B,C) may very well be:

f(A,B,C), f(A,B',C), f(A,B,C'), f(A,B,C"), and so on. It is manageable when you have 3 sources, but what about when you have 50, and when the operation uses 50 of those f, each depending on 50 other pieces of data?
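
Something like this, if you sketch it in code (hypothetical replicas, made-up version names):

  import itertools

  # Hypothetical: each input may be served by a replica that hasn't
  # converged yet, so a read can return any still-circulating version.
  versions = {
      "A": ["A", "A'"],
      "B": ["B", "B'"],
      "C": ["C", "C'", "C''"],
  }

  def f(a, b, c):          # stand-in for the withdrawal computation
      return (a, b, c)

  # Every combination below is a value f() might legally see while the
  # system is converging: 2 * 2 * 3 = 12 possibilities for just 3 inputs.
  for combo in itertools.product(versions["A"], versions["B"], versions["C"]):
      print(f(*combo))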

Anyway, again sorry for my poor explanation of the issue!


I think you are conflating eventual consistency with losing data. They are not the same at all. Good eventually consistent systems give you the option to see the entire history of a key, and if there are inconsistencies you can resolve them however you see fit. If you absolutely, for a fact, need to know the value of something, you can do reads at high R values to make sure everyone agrees on the value. No one who markets or sells eventual consistency, be it Amazon or, in this case, Basho, drops data in cases where consistency isn't guaranteed.
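
To give a flavour of the "high R" idea, here is a toy quorum check (not Riak's or Dynamo's actual API, just the principle):

  from collections import Counter

  # Toy read over the first r of N replica responses. A high r only
  # returns a value when all contacted replicas agree; a low r takes
  # the first answer and risks returning a stale one.
  def quorum_read(replica_values, r):
      counts = Counter(replica_values[:r])
      value, agreeing = counts.most_common(1)[0]
      if agreeing < r:
          raise RuntimeError("replicas disagree; resolve from the key's history")
      return value

  print(quorum_read(["v2", "v2", "v2"], r=3))  # all agree -> "v2"
  # quorum_read(["v2", "v1", "v2"], r=3)       # would raise: needs resolution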


That is true, although one of the things EC systems have to be careful about is when the mutation history of a key yields different values if it is applied out of order.

Consider this example value:

  Mutations "insert 5 A"                AAAAA
            "delete 2 characters"       AAA
            "insert 3 B"                BBBAAA
vs. an out-of-order version:

  Mutations "insert 5 A"                AAAAA
            "insert 3 B"                BBBAAAAA
            "delete 2 characters"       BAAAAA
If you partition and rejoin, even knowing all the timestamps, it can be hard to re-assemble the history.
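
In code the non-commutativity is easy to see (a toy model mirroring the example above, where inserts prepend and deletes trim from the front):

  # Toy mutations: insert prepends characters, delete trims from the front.
  def insert(s, n, ch): return ch * n + s
  def delete(s, n):     return s[n:]

  in_order     = insert(delete(insert("", 5, "A"), 2), 3, "B")  # 'BBBAAA'
  out_of_order = delete(insert(insert("", 5, "A"), 3, "B"), 2)  # 'BAAAAA'
  print(in_order, out_of_order)  # same mutations, different final values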

But the basic thesis that EC is not appropriate for all data models is certainly valid. I wouldn't want my bank to use such a model for reconciling transactions; they screw up enough as it is.


Right; you need some notion of a total order or commutativity in your update functions: f(A,B) = f(B,A). "Last writer wins" is one example of commutativity, but that often isn't what you want.

What you want is something like a "Commutative Replicated Data Type" [1], where you define a commutative function specific to your application. Libraries like StateBox allow you to build CRDTs [2]. In fact, your example of document editing was one of the areas where these ideas first came up.
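
To give a rough flavour of what a commutative merge looks like, here is a grow-only counter sketch in Python (a generic illustration, not StateBox's actual API):

  # G-Counter: each replica increments its own slot; merge takes the
  # per-replica max. merge(a, b) == merge(b, a), so replicas can exchange
  # state in any order and still converge on the same value.
  def increment(state, replica_id, n=1):
      new = dict(state)
      new[replica_id] = new.get(replica_id, 0) + n
      return new

  def merge(a, b):
      return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}

  def value(state):
      return sum(state.values())

  r1 = increment({}, "replica-1", 3)
  r2 = increment({}, "replica-2", 2)
  assert merge(r1, r2) == merge(r2, r1)
  print(value(merge(r1, r2)))  # 5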

There's also a theorem saying that if your program is "logically monotonic"--that is, if your data only "grows" in size, and facts never change from True to False or vice versa--then your program will work under eventual consistency without modification [3].
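
Loosely, "monotonic" means your answers only accumulate; a toy contrast (illustrative only, not the CALM formalism itself):

  # Monotonic: a set of seen order IDs only grows, so applying updates
  # late or out of order never invalidates an earlier answer.
  seen = set()
  for update in ["order-1", "order-3", "order-2", "order-1"]:
      seen |= {update}  # facts are only ever added

  # Non-monotonic: "no outstanding orders" can flip from True to False
  # when a delayed update finally arrives, so it needs coordination
  # (or compensation) under eventual consistency.
  no_outstanding_orders = len(seen) == 0
  print(sorted(seen), no_outstanding_orders)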

Finally, bank accounts have to employ eventual consistency. Banks demand availability from their ATMs and give up consistency to handle partition tolerance: your ATM will still work even if you sever its network connection. However, banks, unlike a lot of software, have well-defined compensation protocols for handling inconsistency. For example, they'll charge you for negative balances left in your account via overdraft fees.
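
A caricature of that compensation flow (the numbers and fee policy are invented):

  # Accept the withdrawal while partitioned, reconcile later, and charge
  # an overdraft fee if the account went negative in the meantime.
  OVERDRAFT_FEE = 35.00  # invented policy

  def atm_withdraw(pending, amount):
      pending.append(-amount)  # accepted without checking the live balance
      return pending

  def reconcile(balance, pending):
      balance += sum(pending)
      if balance < 0:
          balance -= OVERDRAFT_FEE  # compensation instead of prevention
      return balance

  pending = atm_withdraw([], 120.00)
  print(reconcile(100.00, pending))  # -55.0: overdrawn, fee applied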

[1] http://arxiv.org/pdf/0710.1784.pdf

[2] http://labs.mochimedia.com/archive/2011/05/08/statebox/

[3] http://databeta.wordpress.com/2010/10/28/the-calm-conjecture...


> your ATM will still work even if you sever its network connection

Do they?


It's all about tradeoffs and acceptable risk. A bank, an e-mail provider, and a game analytics tracking system all have different levels of risk tolerance. (Protip: delivery services already lose way more than one in 10 million packages, for any number of reasons.)

I haven't watched the talk yet either, but actually quantifying the risks associated with eventual consistency is a nice idea. It should help system designers make better decisions as to what tools are trustworthy enough for their application.



