Show HN: Operational transform for realtime collaborative editing in JS/Flow

tmail21 · on Dec 30, 2016

The basic problem with OT (and current "real-time" collaborative editing approaches) is that they can only achieve eventual consistency.

While this sounds great, eventual consistency DOES NOT mean semantic consistency. This rules it out for many applications where semantic correctness is important.

Even for simple text documents you can get eventually consistent but semantically incorrect results.

For example, consider the sentence

"The car run well"

This has an obvious grammatical error.

Now imagine two collaborative editors.

Editor 1: Fixes this to

"The car runs well"

Editor 2: Fixes this to

"The car will run well"

Depending on the specific ordering of character inserts and deletes this could easily converge to

"The car will runs well"

Obviously this statement is both grammatically incorrect and semantically ambiguous. (However, both editors see the same result, so it is eventually consistent.) Worse, OT collaborative editing will silently do this and carry on.
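
To make the convergence concrete, here is a minimal sketch of that merge. It assumes a hypothetical insert-only operation shape (`pos`, `text`, `site`) and a tie-break on `site`; none of these names come from blue-ot.js or any real OT library.

```javascript
// Hypothetical minimal model: an operation inserts `text` at index `pos`.
function applyInsert(doc, op) {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Rewrite `op` so it can be applied after the concurrent insert `other`.
// Assumption: ties at the same position go to the lower `site` id.
function transformInsert(op, other) {
  const otherGoesFirst =
    other.pos < op.pos || (other.pos === op.pos && other.site < op.site);
  return otherGoesFirst ? { ...op, pos: op.pos + other.text.length } : op;
}

const base = "The car run well";
const editor1 = { pos: 11, text: "s", site: 1 };     // "run" -> "runs"
const editor2 = { pos: 8, text: "will ", site: 2 };  // "run" -> "will run"

// Replica A sees editor 1 first, then editor 2 transformed against it.
const replicaA = applyInsert(
  applyInsert(base, editor1),
  transformInsert(editor2, editor1)
);

// Replica B sees editor 2 first, then editor 1 transformed against it.
const replicaB = applyInsert(
  applyInsert(base, editor2),
  transformInsert(editor1, editor2)
);

console.log(replicaA); // "The car will runs well"
console.log(replicaB); // "The car will runs well"
```

Both replicas agree, and neither transform step has any notion that the two fixes were alternatives to each other.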

Now, for non-critical text where errors like this are ok, this may not be a big problem. But imagine contracts or white papers, or trying to use this on something like a spreadsheet where semantic correctness is critical and one can see why the current scope of collaborative "real-time" editing is very limited.

In general current "real-time" editing approaches like OT are outright dangerous.

theaustinseven · on Dec 31, 2016

I would expect that if an editor makes a change, their change should be preserved. There is no general rule for deciding which edit to keep, because in some cases (like the one you presented) people are editing the exact same sentence, but far more often people will not make edits to the same small part at the same time (at least in the real world). This makes OT very practical: eventual consistency can generally be reached quickly, and because there is consistency, the results for given inputs are predictable.

tmail21 · on Dec 31, 2016

It is fairly trivial to construct an example where editors are editing _different_ sentences and OT takes two locally semantically correct states and converges to a semantically incorrect (but grammatically correct) state.

I think OT and other "real-time" collaborative editors are practical if you are willing to (or your use case can) live with "silent semantic errors".

The greater the document "interconnectivity" (e.g., paragraph A is semantically related to paragraph C), the greater the likelihood of having far-flung silent semantic errors.

For documents like spreadsheets this is very obvious because you start getting nonsensical results and (hopefully) errors very quickly. For Word-like documents, the errors are "silent" and thus much more insidious.

My point was that this is an aspect of OT that many users don't realize.

With regard to predictability, I would not call the results of OT predictable from a user's perspective. They are predictable only in the narrow sense that, for a given order of arrival of operations AT THE SERVER, the result is determined.

However, it is impossible for a user to predict how their local operations will interleave at the server with other users' local operations. For all practical purposes the converged result is unpredictable from the user's perspective.

The only property which one can confidently assert with OT is eventual consistency.

theaustinseven · on Dec 31, 2016

Yeah, I guess I see what you are trying to say. I just want to clarify that when I say predictable, I mean that given a set of operations, no matter the order they come in, the results will be the same. This makes OT powerful in that everyone just needs the operations eventually in order to have a consistent document. The only middle ground I could see that would allow predictability in the document and help mitigate these silent errors would be to notify users when they have both edited the same range before consistency was reached. This would catch almost any case that I think you are talking about, although it would of course miss the situations in which semantic errors arise from edits in very different parts of the document (e.g. referencing a figure 2.1 while someone changes that figure to 2.2). But these errors can still easily arise with a single editor, so they are not really unique to OT. I do think that it would be nice to have a solution to that problem though...
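
A rough sketch of that notification idea, under some loud assumptions: each edit is recorded as a hypothetical `{ user, baseRevision, start, end }` object, two edits count as concurrent when they share a `baseRevision`, and `margin` is an arbitrary "nearby" distance in characters. A real system would need a better notion of concurrency and of range.

```javascript
// True when the two ranges touch or come within `margin` characters.
function rangesOverlap(a, b, margin = 0) {
  return a.start <= b.end + margin && b.start <= a.end + margin;
}

// Report every pair of concurrent edits by different users that land
// close enough together to deserve a "please double-check" notice.
function findOverlapWarnings(edits, margin = 0) {
  const warnings = [];
  for (let i = 0; i < edits.length; i++) {
    for (let j = i + 1; j < edits.length; j++) {
      const a = edits[i];
      const b = edits[j];
      // Assumption: same base revision means neither had seen the other.
      const concurrent =
        a.baseRevision === b.baseRevision && a.user !== b.user;
      if (concurrent && rangesOverlap(a, b, margin)) {
        warnings.push({
          users: [a.user, b.user],
          start: Math.min(a.start, b.start),
          end: Math.max(a.end, b.end),
        });
      }
    }
  }
  return warnings;
}

// The "The car run well" edits, plus an unrelated far-away edit.
const edits = [
  { user: "editor1", baseRevision: 7, start: 11, end: 11 },
  { user: "editor2", baseRevision: 7, start: 8, end: 8 },
  { user: "editor3", baseRevision: 7, start: 5000, end: 5010 },
];

const strict = findOverlapWarnings(edits, 0);
const nearby = findOverlapWarnings(edits, 5);
console.log(strict.length, nearby.length);
```

With `margin` at 0 the two sentence fixes are not flagged because the insert points differ; with a small margin they are. The far-away edit is never flagged, which is exactly the far-flung case this approach misses.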

devdoomari · on Dec 31, 2016

I can't think of a CRDT / anything that can handle your example...

but wouldn't this be somewhat (not 100%) mitigated with UI? (e.g. showing caret positions + time-agos of different users, asking user(s) to resolve a conflict, etc.)

tmail21 · on Dec 31, 2016

Yes, this problem could be mitigated somewhat by showing caret positions. But this method is very reliant on the users paying extremely close attention. It does not work at all when the silent semantic errors are caused by "far flung" edits, where you don't even see the other caret because it is "off screen" for you.

Asking users to resolve a conflict is not possible, because the whole idea of OT is to have no-conflict merges, so the system would have no idea where the conflicts are.

willyk · on Dec 30, 2016

Thanks for the thoughtful post. What do you see as the next best alternative approach to OT?

tmail21 · on Dec 31, 2016

If one absolutely wants real-time collaborative editing then the only (long-term) solution I see is something like a deep learning solution that continuously semantically analyzes the merged state for semantic errors. In a particular problem domain this might be 5-10 years out. In the general case, this starts to approach the level of difficulty of AGI and hence who knows when that'll happen :)

One practical solution is for the document to start out in a 'real-time collaborative editing' phase. After this phase is over, the document moves to a 'review' phase where it is reviewed for semantic errors and those errors are fixed using a 'non-real-time' approach.

The only way I see at this time to avoid silent semantic errors in the first place is to use non-real-time approaches.

The best practices here are optimistic locking/leasing of "semantically-connected regions" (could be defined as a paragraph, document, multi-docset, worksheet, slide etc.) along with semantically useful diffs (diffs that are meaningful for an end user) for conflicts.
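
As a minimal sketch of the optimistic-locking half of this, assume a hypothetical in-memory `RegionStore` where each semantically-connected region carries a version number. A commit is rejected if the region changed since it was read, and the rejection hands both texts back so a diff can be shown to the user. Leasing and the diff itself are left out.

```javascript
// Hypothetical store: region id -> { text, version }.
class RegionStore {
  constructor(regions) {
    this.regions = new Map(
      Object.entries(regions).map(([id, text]) => [id, { text, version: 0 }])
    );
  }

  // Return a snapshot the caller can edit locally.
  read(id) {
    const { text, version } = this.regions.get(id);
    return { text, version };
  }

  // Succeeds only if nobody committed to this region since `baseVersion`.
  commit(id, baseVersion, newText) {
    const current = this.regions.get(id);
    if (current.version !== baseVersion) {
      // Conflict: surface both versions instead of merging silently.
      return { ok: false, theirs: current.text, yours: newText };
    }
    this.regions.set(id, { text: newText, version: baseVersion + 1 });
    return { ok: true, version: baseVersion + 1 };
  }
}

const store = new RegionStore({ p1: "The car run well" });

// Both editors read the same version of the paragraph.
const seenBy1 = store.read("p1");
const seenBy2 = store.read("p1");

const first = store.commit("p1", seenBy1.version, "The car runs well");
const second = store.commit("p1", seenBy2.version, "The car will run well");

console.log(first.ok);   // true
console.log(second.ok);  // false: editor 2 must look at editor 1's change
console.log(store.read("p1").text); // "The car runs well"
```

The second editor gets an explicit conflict rather than a merged sentence, at the cost of having to redo or reconcile the edit by hand.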

You could say that this is the approach taken by version control systems like git, where the semantically-connected region is the File/Document.

Producing semantically useful diffs for anything other than text documents is a non-trivial problem in itself, but it is still more tractable than avoiding or detecting silent semantic errors with OT.

juliendorra · on Dec 30, 2016

Great! I like the format of the post a lot: demo, code example and super clear explanation all combined in a good starting point on the subject.

(You could also have cited Etherpad as a common implementation in addition to Docs. Etherpad was a direct predecessor to character-by-character OT in Docs, since the team was acqui-hired, and it is still widely used by many organizations. But then there are so many examples and libs, I understand that you wanted to just give context!)

Leftium · on Dec 31, 2016

Wow! Front-end only is actually a great advantage for me. I actually tried to modify [ShareDB] to be front-end only[1]. (ShareDB uses WebSockets or any other full duplex stream; it's a great reference if you want to implement true client/server.)

I guess my use case is quite unique: [Todo.taskpaper] needs to sync multiple "views" of a single document in the same web app. Right now it uses very naive syncing; I'm going to try to upgrade it to blue-ot.js.

[ShareDB]: https://github.com/share/sharedb

[1]: http://stackoverflow.com/q/40616650/117030

[Todo.taskpaper]: https://todo-taskpaper.leftium.com

theaustinseven · on Dec 30, 2016

This is cool! I've been working on something similar in Go, but I haven't spent much time on it recently. Is this a purely front-end application, or is there a server associated with it?

c0da · on Dec 30, 2016

Thanks! My implementation is currently front-end only.

Here's the code that simulates all the client/server communication: https://github.com/cricklet/blue-ot.js/blob/master/js/ot/orc...

It shouldn't be too hard to take that and put it in an actual client/server architecture. The client needs to have a way to send local operations to the server (this can just be an endpoint on the server) and the server needs a way to broadcast operations to all clients (probably webRTC?).
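
A very rough sketch of that split, with an in-memory `RelayServer` standing in for the real server. Everything here is hypothetical and not taken from blue-ot.js: `submit` plays the role of the endpoint, `subscribe` plays the role of the broadcast channel, operations are insert-only, and clients are passive mirrors that apply only what the server broadcasts. Buffering and transforming unacknowledged local operations on the client is left out.

```javascript
class RelayServer {
  constructor() {
    this.log = [];        // operations in the order the server accepted them
    this.listeners = [];  // one callback per connected client
  }

  // Stand-in for the broadcast channel.
  subscribe(listener) {
    this.listeners.push(listener);
  }

  // Stand-in for the endpoint: `op` was made against `baseRevision`.
  submit(op, baseRevision) {
    let pos = op.pos;
    // Shift past every insert the sender had not yet seen that landed
    // at or before this position (earlier arrivals win ties).
    for (const earlier of this.log.slice(baseRevision)) {
      if (earlier.pos <= pos) pos += earlier.text.length;
    }
    const accepted = { ...op, pos, revision: this.log.length + 1 };
    this.log.push(accepted);
    for (const listener of this.listeners) listener(accepted);
    return accepted;
  }
}

// A mirror applies broadcast operations in server order.
function makeMirror(server, initial) {
  const mirror = { doc: initial };
  server.subscribe((op) => {
    mirror.doc =
      mirror.doc.slice(0, op.pos) + op.text + mirror.doc.slice(op.pos);
  });
  return mirror;
}

const server = new RelayServer();
const mirrorA = makeMirror(server, "The car run well");
const mirrorB = makeMirror(server, "The car run well");

// Two clients submit concurrently; both had only seen revision 0.
server.submit({ pos: 8, text: "will " }, 0);
server.submit({ pos: 11, text: "s" }, 0);

console.log(mirrorA.doc);
console.log(mirrorB.doc);
```

Because the server fixes a single order and rewrites positions before broadcasting, every mirror ends up with the same text regardless of which transport delivers the broadcasts.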

grizzles · on Dec 30, 2016

Hey Kenrick, I agree: WebRTC would be a good choice since it allows out-of-order packets, which can speed things up quite a bit.

Did you see this article? It was posted on HN yesterday. It links to a gh project that might be useful for your project. https://getkey.eu/blog/5862b0cf/webrtc:-the-future-of-web-ga...

akanet · on Dec 31, 2016

I'm going to try to implement this in my app. I use Firebase/Firepad for true p2p OT but it's become cumbersome. Do you have additional thoughts about communication channel implementation?

mandeepj · on Dec 30, 2016

> all clients

Or to all interested clients.

> (probably webRTC?)

It can also be done with SignalR.