What many people do not fully grasp, at first sight, is that offline support for a ...

lewisjoe · on May 7, 2019

This is where differential synchronization shines (basically git). While OT/CRDT are really good for syncing small edits, they suck at being eventually consistent (with intentions preserved) on very large edits.

We make a powerful collaborative word processor for the browser (Zoho Writer https://writer.zoho.com) with complete offline support. This is one problem we'd like to solve in the long term. One way to solve this is to fall back to differential sync when syncing huge offline edits, and present the changes as tracked changes (underlined and struck out) to the user. This way the user has the power to resolve the conflicts, instead of the app messing it up by itself.
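As a rough illustration of the tracked-changes presentation described above, here is a minimal Python sketch. It uses a plain word-level diff from the standard library's `difflib`, not a full differential synchronization protocol. The `tracked_changes` helper and the `[-...-]` / `{+...+}` markers are assumptions made up for this example, not anything from Zoho Writer.

```python
import difflib


def tracked_changes(base: str, edited: str) -> str:
    """Render a word-level diff as tracked changes.

    Deleted words are wrapped as [-...-] and inserted words as {+...+}.
    Both marker styles are illustrative placeholders for strike-out/underline.
    """
    a, b = base.split(), edited.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(" ".join(a[i1:i2]))
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


marked = tracked_changes("the quick brown fox", "the slow brown fox jumps")
# marked == "the [-quick-] {+slow+} brown fox {+jumps+}"
```

The point of this design is that nothing gets resolved automatically: every difference is surfaced, and accepting or rejecting each marked span is left to the user.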

I'd like to hear if there are better ways to sync large offline edits, with the intentions preserved.

scofalik · on May 7, 2019

> One way to solve this is to fall back to differential sync when syncing huge offline edits

This is another question -- does automatic syncing make sense for huge differences in users' contents? The answer might be no.

OTOH, isn't this a rare scenario? How perfect does it have to be? How much of a compromise can the users accept? These are interesting questions which show that creating this kind of software isn't easy -- both from a technical and a UX point of view.

qpiox · on May 7, 2019

First of all, when discussing offline edits we should point out that "offline" could mean different things to different people.

An offline editor, from the point of view of the app's software developer, might mean that the editor can sometimes survive a few minutes of lost connectivity, after which it will reconnect and possibly sync the team's work.

Offline from the perspective of a team of scientists writing a joint paper means that some of them will take their work truly offline, to a secluded mountain hut, for an extended period (week or month) and will come back with a complete rewrite of the text to the point of being unrecognizable to the other authors. And then the coauthors will disagree with most of it, and will want to revert some of the pieces and keep some other pieces.

And scientists are not the only people needing a joint collaborative environment. And not just for short papers. A few years ago, I was working in a larger team with a group of people doing revisions of a book, and there were so many revisions and revisions of revisions of revisions, and zillions of comments, that splitting the book into chapters was not enough, since Word got stuck even with 20-page chapters. We had to split the chapters into sub-chapters of 5-10 pages in separate files, so that the processor would not get stuck with the myriad of comments and revisions.

These are all separate scenarios in collaborative writing, and there could be many other scenarios, so any solution should first explain what type of scenario it is really targeting. Automated sync of collaborative work is not always the best thing to do.

bluGill · on May 7, 2019

Then there is the science fiction version where the editors are both online, but several light years apart with the resulting lag in updates.

jgtrosh · on May 7, 2019

Well, it's quite common for people using “normal” email-exchanged MS Office revisions. I have no clue if there's specific tooling for these cases. Of course you could brush them off and say they wouldn't be a problem if they just used online editors.

tshaddox · on May 7, 2019

I work with a tool that provides diffing and merging for rich text (effectively a limited subset of HTML). It doesn’t have or need real-time collaborative functionality at all. I have read all the papers that I can find and understand, but I’m still not happy with our merging algorithm for rich text.

The issue is less about which fancy algorithm/data structure to use and more about even defining in human terms what would be expected in certain merge conflicts. Currently, we err on the side of raising a merge conflict rather than deciding to use one version for a certain change, but in practice most conflicts have a pretty obvious resolution when a human looks at the two conflicting versions.
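To make the "err on the side of raising a merge conflict" policy concrete, here is a hedged sketch of a conservative three-way merge. It assumes the document has already been split into blocks with stable IDs, which is a hypothetical representation chosen for this example and not the tool described above. The `merge_blocks` name and the dict-based format are likewise assumptions.

```python
from typing import Dict, List, Optional, Tuple

Blocks = Dict[str, str]  # hypothetical format: block ID -> block content


def merge_blocks(base: Blocks, ours: Blocks, theirs: Blocks) -> Tuple[Blocks, List[str]]:
    """Three-way merge per block; a missing ID means the block was deleted.

    A block is merged automatically only when at most one side changed it
    (or both sides made the identical change). Anything else is reported
    as a conflict instead of being guessed at.
    """
    merged: Blocks = {}
    conflicts: List[str] = []
    for block_id in sorted(set(base) | set(ours) | set(theirs)):
        b: Optional[str] = base.get(block_id)
        o: Optional[str] = ours.get(block_id)
        t: Optional[str] = theirs.get(block_id)
        if o == t:        # both sides agree (including both deleting)
            result = o
        elif o == b:      # only "theirs" changed this block
            result = t
        elif t == b:      # only "ours" changed this block
            result = o
        else:             # both changed it differently: leave it to a human
            conflicts.append(block_id)
            continue
        if result is not None:
            merged[block_id] = result
    return merged, conflicts


base = {"p1": "Hello", "p2": "World", "p3": "Bye"}
ours = {"p1": "Hello!", "p2": "World", "p3": "Bye"}
theirs = {"p1": "Hi", "p2": "Earth"}  # p3 deleted, p1 also edited
merged, conflicts = merge_blocks(base, ours, theirs)
# merged == {"p2": "Earth"}, conflicts == ["p1"]
```

This only decides *when* to raise a conflict. It says nothing about what a human would consider the obvious resolution for `p1`, which is the harder problem described above.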

lewisjoe · on May 7, 2019

> The issue is less about which fancy algorithm/data structure to use and more about even defining in human terms what would be expected in certain merge conflicts

Exactly. When it comes to rich text editing, which can have semantic trees (e.g. tables and tables within tables), merge conflicts are so tough to handle.

Consider this: one version deletes a column in a table, and the other version splits that column and adds a new row. At that point it's sort of impossible to find a meaningful representation of the table without manual intervention.

bachmeier · on May 7, 2019

> sync large offline edits, with the intentions preserved

How would that be possible if none of the users can see what the others are doing? If you have three users that start with version 1 and they all make their own version 2, even the users don't know what the "correct" version 2 should look like. Or to put it differently, I don't even know what is meant by "intentions" in this scenario.

scofalik · on May 7, 2019

This is an interesting subject, to be honest.

Of course, it depends on the collaborative editing solution. When it comes to OT -- if the implementation is correct, it doesn't really matter how many operations are queued for synchronising. So, it doesn't really matter how long users were de-synced when creating their content. But this is only theory, and it applies more to the technical side, as there might be some semantic problems (intention preservation).

I believe the same is true for CRDT but I am not sure.

So, on one hand - if the editor is working fine for a small batch of changes, it should also work fine for a big batch of changes. The reality is often harsher, though, and full of edge cases.

z3t4 · on May 7, 2019

OT is very simple to implement, but needs a server, although a simple one, that keeps a counter and attaches the count to each message when sending it to the clients. That way clients know how to transform the message, and which message came first. CRDT, however, is more complex, but does not require a server. It can also tolerate some "offline" time. I do not, however, think CRDT can be used to get conflict-free merging; I think there always needs to be a manual merge resolution. But I would love to be proven wrong! Maybe machine learning and live training could be used to make better merge suggestions. If you solve the merge problem, that would also make software version control merging easier! Which is a huge market. And there are probably a ton of other use cases too, besides just collaborative editing.

scofalik · on May 8, 2019

> OT is very simple to implement, but needs a server (...)

Well, OT in general does not need a server, but server-less implementations are more complex (there are more transformation functions to write: besides "inclusion transformation" you also have to write "exclusion transformation" algorithms).

I also wouldn't say that "OT is very simple to implement" - it is in its base form, for linear data, with a server in the network. But every enhancement brings a lot of complexity.

z3t4 · on May 8, 2019

On a work "test" I was asked to write a diff function (the least amount of transformation to get from one state to another) from scratch. Even though I've read many diff algorithm papers in my life, I couldn't actually implement one (even though I got 99/100 tests right, they failed me with the feedback that I need to work on my algorithms ...). But after getting the concept of OT, it was straightforward to implement it from scratch. That's relative to me, though; maybe someone else thinks it's easier to implement a diff algorithm.

If two clients, your client (A) and another client (B), write the letters a and b respectively at the same time, then from either client's perspective they were first, so on client A the state will be "ab" and on client B the state will be "ba". How do you solve that without a server/master? With a server/master, the server just has to increment a counter for each operation, and the clients can use that counter to know the order. So if a has counter 77, and b has counter 76, the state will be "ba".
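The counter-based ordering above can be sketched roughly like this in Python. It is a simplified model for insert-only operations on a plain string. The `Insert` type, `transform`, and `apply_op` are hypothetical names for this example, and the sketch assumes each client already knows the server-assigned counter of both operations when it transforms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int      # index in the document where the text is inserted
    text: str
    counter: int  # sequence number assigned by the (assumed) central server


def transform(op: Insert, applied: Insert) -> Insert:
    """Adjust `op` so it can run after the concurrent op `applied`.

    `op` is shifted right if `applied` inserted at an earlier position,
    or at the same position with a lower counter (i.e. it came first).
    """
    if (applied.pos, applied.counter) < (op.pos, op.counter):
        return Insert(op.pos + len(applied.text), op.text, op.counter)
    return op


def apply_op(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


a = Insert(pos=0, text="a", counter=77)
b = Insert(pos=0, text="b", counter=76)

# Each client applies its own op first, then the transformed remote op.
client_a = apply_op(apply_op("", a), transform(b, a))
client_b = apply_op(apply_op("", b), transform(a, b))
# Both clients converge on "ba", because b has the lower counter.
```

Deletes, longer histories, and acknowledgement handling are left out here. Those are where most of the real complexity mentioned earlier in the thread comes from.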

anthonys · on May 7, 2019

Explains why we're never getting Confluence offline editing.