Eventually Consistent: How to Make a Mobile-First Distributed System

bitwiseand · on Jan 19, 2017

I liked the article but I feel the CAP theorem is again misquoted; it often leads to misunderstanding. From the article - "but they would still confront the reality of the CAP theorem: your system can be consistent, available, or partition-tolerant, and you can only pick two"

The CAP theorem states that in the event of a network-partition you have to choose one of C or A. More intuitively, any delay between nodes can be modeled as a temporary network partition and in that event you have but two choices either wait to return the latest data at a peer node (C) or return the last available data at a peer node (A).

Edit: Switched C and A

bigfish24 · on Jan 19, 2017

Agree that the CAP theorem isn't as strict as that statement implies. The trade-off only occurs at certain points. Sorry it was a little unclear!

bitwiseand · on Jan 19, 2017

No need to be sorry. I liked the rest of the article. Thanks for that.

bitwiseand · on Jan 19, 2017

Edit: I also wrote my comment because the article is beginner friendly. A lot of beginners will get the right idea about CAP once they go through comments and then back again to your article.

bandwitch · on Jan 19, 2017

Couldn't agree more. This kind of misunderstanding is also described in Brewer's paper "CAP twelve years later: How the "rules" have changed" (http://ieeexplore.ieee.org/document/6133253/).

_dark_matter_ · on Jan 19, 2017

This is a fantastic article that every engineer who touches a distributed system should read. The simplistic CAP understanding hurts systems by forcing the choice before it needs to be made; i.e. we can (sometimes) have our cake and eat it too.

devty · on Jan 19, 2017

Good point. I often heard that it makes no sense to NOT deal with network partitions and CAP as an "illusion of choice"[1]

[1] http://bravenewgeek.com/cap-and-the-illusion-of-choice/

toomim · on Jan 20, 2017

What a wonderful explanation! You've cleared up hours of confusion I've experienced in my life, with just 2 sentences!

I'm motherf*ing amazzzzzed!!!!!!!

bitwiseand · on Jan 20, 2017

Thanks :)

bsaul · on Jan 19, 2017

After having built an app that was supposed to work offline, and resync when the connexion is up, using OT as well ( A basic one), i have the feeling that the general problem is hard, but if you stick to trying to solve your specific problem, things become much simpler. As an example OT operations can be either generic such as "update a property of an object" and then good luck solving conflicts, or very qualified with a business context such as "transfert money from account a to b", in which case the server-side code with which you synchronize will be much more able to resolve any issue case per case.

astigsen · on Jan 19, 2017

(Alexander from Realm here) Yes, where this approach fits is very dependent on the use case. In our experience the key is to have a data model that is flexible enough to express many different use cases. There might be cases where updating a single field is enough, or you might need it to work as a counter, or maybe your data will be better modeled by ordered insertions into a list.

Having the flexibility to use the exact combination of data types that fits your specific requirements and allows you to express your intent in a way that makes it clear how you want the conflicts resolved is essential to make it work.

You can find a bit more info about our approach to conflict resolution here: https://realm.io/docs/realm-object-server/#conflict-resoluti...

btown · on Jan 19, 2017

That design doc seems to mix loose ideas with the current state of the system. For instance, are strings last-write-wins currently, or do they have character-level OT? Congrats on the release - I'm a huge fan of realtime, and I'd like nothing more than to build apps that depend on such a system - but seeing documentation with those kinds of holes gives a really bad first impression.

bigfish24 · on Jan 19, 2017

Sorry the docs are not clear. Right now you can only apply set operations on the entire string. Internally in the database core, we support substring operations, but we haven't exposed them yet mainly because we need to add the compliment to listen for these changes. We have a working collaborative text editor demo/prototype so hope to have more to say soon!

bsaul · on Jan 19, 2017

Thanks for the link. when you mention "time" in the doc, what's the definition of "most recent" ? Do you mean, last one received by the server ? Or do you have any hint based on local state (either local time clock, or local logical clock) ? That would make a huge difference if, let's say, i reconnect an old unsynchronized device with pending, outdated, changes.

bigfish24 · on Jan 20, 2017

Time means logical clock. We incorporate a number of mechanisms to track causality, such that for each changeset we know exactly which other changesets it was based on. Timestamps are only used when two changesets that are causally unrelated conflict, in which case we use the timestamp to decide on the ordering.

gjjrfcbugxbhf · on Jan 20, 2017

How do you know whether timestamps are synchronized if network is partitioned?

simonask · on Jan 20, 2017

We don't, and we don't need to.

Timestamps are only used to merge conflicting but causally unrelated changes. In principle we could pick a random number instead of using the timestamp and we would still achieve convergence, but it just so happens that the current local time on the device is highly correlated with the user's experience of real time, so that if the user makes conflicting changes on two offline devices, those changes will still be properly ordered in the general case.

gjjrfcbugxbhf · on Jan 20, 2017

OK good luck with that. I guess you keep a graph of all the changes so that the accidental overwrites that will happen+ can be stepped back?

+ for example a surprising number of computers have their local time changed to avoid license terms of poorly enforced proprietary software.

brmunk · on Jan 19, 2017

You are absolutely right - the general problem is super hard. But even solving a specific use-case is hard for most developers who haven't done it before. It just takes a ton of time, and you often get some details wrong. And when you think you nailed it, then new requirements turns up :-) The result is that many apps don't work very well offline. That's why we have spent several man-years on making a general solution with sufficient flexibility to cover most use-cases. We want to enable all developers to be able to add offline first and real-time features without the hassle of reinventing the wheel every time.

espadrine · on Jan 19, 2017

Another reason that OT is hard is that it doesn't fall out of a generic proof; you have to prove that your particular instance of OT follows a mathematical equation. And there is no tooling to verify that (it can't be ensured by typical type checkers).

So people either rely on an OT implementation that is generic enough for their needs, or risk inconsistency or subtle bugs to chase after for years.

Other approaches such as CRDT or total order are not vulnerable to that problem, but they have other concerns (like read performance, garbage collection, or intent preservation).

simonask · on Jan 20, 2017

This is an interesting point. We at Realm have actually spent significant time thinking about it, and it is absolutely true that there is no easy way to prove the correctness of a system using OT.

But it helps to understand the fundamental constraints - for example, that operations must be commutative. In our case, we have had the luxury of designing our own database system, which means we could pick the semantics that we knew would lend themselves well to operational transformation. We have also made an effort to reuse semantics at multiple levels, limiting the number of OT instances that we had to convince ourselves would work.

Then of course, and for me at least, formal arguments are not enough, which is why we have spent a remarkable amount of resources on testing the system, including guided fuzz testing. :-)

bsaul · on Jan 20, 2017

Some advice : if you want to have people trust and thus use this feature, you should make this kind of talk :

https://youtu.be/4fFDFbi3toc

tlarkworthy · on Jan 19, 2017

In Firebase we aim for causal consistency, which is between eventual and strong consistency. Remote clients obviously don't see an the same global ordering of events, they see their local mutations before peers (0 latency local writes). However, they do see remote mutations in the same order they occurred. So you never see "foo left the room" "foo said x", the remote mutations are always in the correct temporal ordering.

ex3ndr · on Jan 19, 2017

How do you solve a problem that when client is too long offline queue of events became too long. For example user can change 100s times it's name (just an example) and you will have to deliver all changes or combine them on server side that can be much harder as you will need to read too many events at once.

tlarkworthy · on Jan 20, 2017

Write coalescing doesn't work with our security model, so we don't. The client has to replay all the writes in disk cache when it comes online. The writes are pipelined fairly efficiently, but I acknowledge that for some workloads it might not be ideal. But most apps do not do require rapid streams of writes while offline.

smnplk · on Jan 20, 2017

I found two great explanations of cap theorem on youtube.

https://www.youtube.com/watch?v=gLtO0vY_M78

https://www.youtube.com/watch?v=Jw1iFr4v58M

And now I hope my understanding is correct:

The way I see it, the most important thing to understand first is what the word partition means in the context of distributed systems. If the system is partitioned (being in partitioned state), that just means that some nodes can not talk to each other. There is an outage of some sort and when building distributed systems, you always have to expect that, so the P (partition-tolerant) from CAP is always there, you always have to choose it. After that, you just have to decide if you will be more C (consistent) or more A (available). If you pick C, then this means some nodes will have to wait until the system is back in connected state to get the latest state, but if you pick A, then you always get the answer, the last one the node had before it got disconnected.

simonask · on Jan 20, 2017

That's correct. I also believe you could in theory forgo partition tolerance, it just wouldn't be a very useful system. Maybe there are use cases out there for it.

smnplk · on Jan 20, 2017

Yeah, but I think the point of distributed systems is to be partition tolerant, the system that you are describing is probably not a distributed system by definition. Or is it, i don't know, I have a plan to read a book on DS this year.

From wiki, it looks like there are 3 main pillars to DS [1]:

- concurrency of components

- lack of global clock

- independent failure of components

So you need the failure tolerance part.

[1] https://en.wikipedia.org/wiki/Distributed_computing

mjewkes · on Jan 19, 2017

The C# documentation is quite clear: https://realm.io/docs/xamarin/latest/

  var puppies = realm.All<Dog>().Where(d => d.Age < 2);

The LINQ integration is appreciated.

  // Update and persist objects with a thread-safe transaction
  realm.Write(() => 
  {
      realm.Add(new Dog { Name = "Rex", Age = 1 });
  });

  // Queries are updated in realtime
  puppies.Count(); // => 1

I'm assuming that "updated in realtime" means "realm.Write is a blocking operation", correct?

brmunk · on Jan 19, 2017

Absolutely right. Only one writer at a time (locally), but writers don't block readers of which you can have multiple due to the MVCC design.

erichocean · on Jan 19, 2017

We shipped an app last year with Realm and are in the process of migrating off of it—it just wasn't reliable enough. We regularly saw crashes deep in their library with highly concurrent usage. Extremely frustrating.

Adding something complex like OT on top of Realm's existing foundation would make me even less likely to use it in the future.

That said, it's nice to see them tackling the problem and I wish them the best. Obviously, bugs can be fixed given enough time and effort and in principle, I like their product.

brmunk · on Jan 19, 2017

Really sorry to hear you had a bad experience! I assume you created an issue for it? If not I would love to hear more about it (bm at realm dot io), so we can get it fixed and hopefully get you back as a happy camper again at some point.

Obviously one of the main points of building the new sync solution (or any library really) is to give you much more time for building differentiating features rather than building basics yourself. Now granted there are other solutions for the pure local persistence, but not many - if any - that will do what the new Realm Mobile Platform will for solving sync.

But obviously the basics must be right and solid - so hope you will provide us with sufficient info so we can get any of your issues resolved.

bigfish24 · on Jan 19, 2017

We really value this honest and direct feedback. Thank you for telling your experience. Obviously, we would love to change your mind and we will continue to improve the database and server product to hopefully earn that in the future.

jonathan_mace · on Jan 19, 2017

Similar work from the research world: Diamond: Automating Data Management and Storage for Wide-Area, Reactive Applications

https://www.usenix.org/conference/osdi16/technical-sessions/...

rjdlee · on Jan 19, 2017

Does anyone know how this compares to Parse (I'm new to app development)?

brmunk · on Jan 20, 2017

Being from Realm, I might not be the right one to give you a balanced view, but I can at least try a bit:

One huge difference is that Parse is no longer developed by a company after Facebook abandoned it. It's been open sourced and its future is up to the community.

Another is very related to this article. Realm resolves all your conflicts automatically, the developer doesn't have to do that. That's even done more fine grained at the property level.

Realms focus on being an offline first database means that your data is always available locally.

Then there are subjective differences. Until the launch of the Realm Mobile Platform, Realm has not had a solution for syncing to a backend. We have a lot of users who have chosen to use Realm locally and Parse as the backend. I guess that's a testament to the local database features from Realm. But with the new syncing solution, that's no longer needed, and all the networking code can be forgotten.

I'm sure others can come up with pros for Parse, but if you are not already invested in Parse, I think most would recommend looking elsewhere due to the first point.

tezza · on Jan 19, 2017

the text is very clear and easy to follow, however maybe a few concrete worked examples could help

bigfish24 · on Jan 19, 2017

Thanks for the feedback! Agree it can be a little hard to visualize with just the text alone. We were planning on follow up posts, so we will make sure to include diagrams.