From my limited exploration and talk with Justin over at http://www.kinematicsoup.com, the requirement for low latency combined with a lot of large binary data requires different approaches.
Indeed, the space is growing quickly. I run a company that provides a backend for real-time collaboration, so we necessarily stay on top of what's out there. There's a reason the server-side tech tends to be paid: it is extraordinarily difficult to provide guaranteed eventual consistency of data at low latency. The CKEditor guys (and us, for that matter) have put YEARS of development effort into their solutions.
We're working on both offline mode and a generic rich-text data model with support for the major editors.
If anyone's interested, I'd happily do some Q and A here.
I appreciate that you mentioned a competitor without shamelessly plugging your own product, but out of curiosity, and if you don't mind sharing, what editor do you work on?
You can currently already do good-enough-for-most DOM-level synchronization using any editor, of which we have a few examples. A lot of the examples of RTC out there (apps using TogetherJS for instance) are doing this already. For rich text, we'd like to be able to provide 100% support for a particular editor's capabilities, which necessitates a deeper, custom data model.
One of our value propositions is a unified backend for dealing with any JSON data (and eventually beyond), avoiding binding server-side code to a particular choice of UI component.
I am curious about this argument - I think CKEditor 5 made a similar one in this epic blog post about how they implemented real-time collaboration (https://ckeditor.com/blog/Lessons-learned-from-creating-a-ri...). We're using Quill.js with ShareDB, which supports JSON structures (which is great, because for us we often have documents with several rich text fields, and other complex structures). So far we've been able to do anything we wanted with Quill, and I've never felt limited by the data structures we have available... (We also do all kinds of other stuff with ShareDB JSON).
I guess one reason you could need custom types would be to ensure consistency - if two keys depend on each other, and one user sets one key, and the other user sets another key, and the document is now invalid, you'd need the engine to be able to reconcile at a higher level?
> (...) if two keys depend on each other, and one user sets one key, and the other user sets another key, and the document is now invalid, you'd need the engine to be able to reconcile at a higher level?
I am not sure if I understand you correctly here, but it's not really that. Could you give me a more concrete example?
The kind of problems for extra types are, for example: user A changes a paragraph to a list item and user B splits it. As a result you'd like to have two list items instead of a list item followed by a paragraph. This is impossible if you don't give more semantic meaning to the operations.
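To make the paragraph/list-item case concrete, here is a minimal sketch in Python. It is not CKEditor 5's actual algorithm; the block model (a list of dicts with `type` and `text`) and the helpers `set_type`, `generic_split` and `semantic_split` are hypothetical names made up for illustration. It only covers the order in which user A's change is applied first; the reverse order would additionally need A's operation transformed to cover both halves.

```python
def set_type(blocks, index, new_type):
    """User A's operation: change the type of one block."""
    out = [dict(b) for b in blocks]
    out[index]["type"] = new_type
    return out


def generic_split(blocks, index, offset, captured_type):
    """User B's split encoded as plain JSON edits: truncate the text and
    insert a new object whose type was fixed when the op was created."""
    out = [dict(b) for b in blocks]
    text = out[index]["text"]
    out[index]["text"] = text[:offset]
    out.insert(index + 1, {"type": captured_type, "text": text[offset:]})
    return out


def semantic_split(blocks, index, offset):
    """User B's split as a first-class operation: the new block copies
    the type the split block has at the moment the op is applied."""
    out = [dict(b) for b in blocks]
    text = out[index]["text"]
    out[index]["text"] = text[:offset]
    out.insert(index + 1, {"type": out[index]["type"], "text": text[offset:]})
    return out


doc = [{"type": "paragraph", "text": "FooBar"}]

# A's change lands first, then B's split (created while the block was
# still a paragraph) is applied on top of it.
after_a = set_type(doc, 0, "listItem")
generic = generic_split(after_a, 0, 3, captured_type="paragraph")
semantic = semantic_split(after_a, 0, 3)

print([b["type"] for b in generic])   # ['listItem', 'paragraph']
print([b["type"] for b in semantic])  # ['listItem', 'listItem']
```

The generic encoding has no way to know that the inserted object is "the second half of the same block", which is the extra semantic meaning mentioned above.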
There are other problems though, as you mentioned - with invalid document. For example, you have this kind of a list:
* Foo
__* Bar
__* Baz
User A outdents "Bar" and user B indents "Baz" creating a list like this:
* Foo
* Bar
____* Baz
In CKE5 this is an incorrect list (we don't allow indent differences bigger than one). This cannot be fixed through OT so we fix it in post-fixers which are fired after all the changes are applied.
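As a rough illustration of the post-fixer idea (a sketch only, not CKEditor 5's real code), assume the list is modeled as a flat sequence of indent levels. The function name `fix_list_indents` and this representation are assumptions made for the example. The fixer runs once after both users' changes have been applied and clamps any item that is more than one level deeper than its predecessor.

```python
def fix_list_indents(indents):
    """Return a copy of `indents` in which no item is more than one level
    deeper than the item before it, and the first item sits at level 0."""
    fixed = []
    prev = -1  # makes the first item clamp to level 0
    for indent in indents:
        indent = min(indent, prev + 1)
        fixed.append(indent)
        prev = indent
    return fixed


# Foo stays at 0, user A outdented Bar to 0, user B indented Baz to 2.
merged = [0, 0, 2]
print(fix_list_indents(merged))  # [0, 0, 1]
```

Because the fixer is deterministic and runs on the same merged state on every client, all clients end up with the same corrected list without any extra operations being exchanged.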
These content-preservation edge cases weren't possible to solve with what was available (at least at the time we started the project).
Even apart from that, ottypes/json0 was lacking some basic things, like moving objects. I see they came up with a new implementation recently (https://github.com/ottypes/json1) and it allows moving objects. Maybe the new implementation would solve some problems. However, it is in "preview" state, and the last update was 2 months ago, so I am not sure how well it will be maintained.
Also, there are some edge cases when transforming ranges (which CKE5 uses to represent, for example, comments on text or content created in track changes mode). I don't want to bury you in difficult-to-understand examples, but if you are interested you might want to check the examples listed in the inline comments for this function: https://github.com/ckeditor/ckeditor5-engine/blob/master/src....
As far as Quill.js is concerned, it is based on a linear data model, which brings limitations when it comes to complex features. Transformation algorithms for linear data models are much simpler and there are more implementations and articles in this area. Everything depends on your needs. If Quill.js's feature set and functionality fit your needs, then the solution you chose is correct.
With CKE5, however, we didn't want to make any compromises. We needed complex structures for our features, and for having a powerful framework - to enable other developers to write whatever feature they want and have those features working in real-time collaboration. We wanted transformation algorithms which will handle all the edge cases. It is true, some of those cases are quite rare. And the old "10/90" mantra applies here, in this case "10% of use cases bring 90% of complexity". But those edge cases happen and we didn't want to disappoint our users.
I think the argument is more about the historical data structures that were used in rich text. A lot of editors either used the DOM, or a very flattened data structure like Google Wave, Quill.js, DraftJS etc. With these flattened data structures it becomes harder to represent complex rich text with things like tables, nested blocks, etc. If you have a nice JSON data structure that is collaborative you can do a lot, and in many / most use cases it is sufficient. However, you can run into use cases where the collaborative data structure will ensure consistency across clients, but violate some semantic constraint on the data.
For example, imagine you have an application that has a list that must contain at least one element. Assume there are two elements in the list. A shared JSON data structure on its own (one that allows for immediate local edits) would allow two clients to simultaneously delete one element each. The end result is that the client app on both sides will become aware that the constraint was violated only when the remote operation comes in. Resolving this becomes difficult. What is the resolution strategy? Which of the two clients should initiate it? This is a contrived example for sure. But you run into things like this in various use cases, and occasionally you need either new data structures that encode these semantics, or you need an extensible system that allows you to customize constraints and resolutions.
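A small Python sketch of that scenario, under the assumption that each client applies its own delete immediately and then applies the other client's delete when it arrives. The helper names `apply_delete` and `satisfies_constraint` are hypothetical and stand in for whatever the shared data structure and the application would provide.

```python
def apply_delete(items, value):
    """Remove `value` from the list, returning a new list."""
    return [x for x in items if x != value]


def satisfies_constraint(items):
    """Application-level rule: the list must keep at least one element."""
    return len(items) >= 1


shared = ["a", "b"]

# Each client edits its local copy immediately; both edits look valid locally.
client_1 = apply_delete(shared, "a")
client_2 = apply_delete(shared, "b")
print(satisfies_constraint(client_1), satisfies_constraint(client_2))  # True True

# The remote operations arrive and are applied on each side.
client_1 = apply_delete(client_1, "b")
client_2 = apply_delete(client_2, "a")

# The replicas converge, but to a state the application forbids.
print(client_1 == client_2)            # True
print(satisfies_constraint(client_1))  # False
```

The data structure did its job (both copies are identical), yet neither client could have detected the violation before the remote operation came in.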
Could you point us to some of those togetherJS apps? I just recently learned about it in the context of collaborative browsing. How is this related to collaborative text stuff?
Are you working only with editors running inside web browsers or also native ones? What I would like to have is something that bridges two native editors, let's say emacs and VS Code, and lets a team share the editor (whatever each developer's preferred editor is) and not only the screen on Slack or similar. That would be great for remote pair programming. I guess there would be a protocol for selecting the buffer/tab and opening files (with local approval to prevent remote file reads/writes).
> What I would like to have is something that bridges two native editors
The big failing of co-editors is that currently they all silo into a single editor. That's OK for, e.g., co-editing stuff on a web platform (e.g. Google Docs), but it totally sucks for collaborating on code/documents you edit locally. It forces everyone to use the same editor/IDE, which ultimately fails because no one will use the same editor/IDE across a whole company/team/project. So although they are useful, I've never seen one take hold for anything but a very short amount of time in any company.
We desperately need (a couple of) well-defined protocols that get implemented across editors, and one to emerge as a widely supported winner, whatever its shortcomings.
For plain text, this is not so hard. The data model for plain text (e.g. a string) and the set of mutations on that model are pretty small. Describing where a cursor is and what is selected is likewise fairly straightforward. Co-editing between plain text editors is completely doable, IMHO.
It's much harder for rich text editors (RTEs) because the various RTEs vary widely in the exact subset of rich text features they support. One will support tables, and another will not. One will support video and another will not. One will use a linear position to describe where the cursor is, and another will use a DOM Range. This makes it very hard to support co-editing between different rich text editors. It's not impossible, just hard enough that most of us are not tackling it.
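To show how small that model is, here is a minimal sketch in Python. The `Insert` and `Delete` types and the `apply` and `transform_cursor` functions are hypothetical names, not any particular editor's or library's API. The whole mutation set is two operations on a string, and a cursor is an integer that gets shifted when a remote operation arrives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


@dataclass(frozen=True)
class Delete:
    pos: int
    length: int


def apply(doc, op):
    """Apply one operation to the document string."""
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + op.length:]


def transform_cursor(cursor, op):
    """Shift a cursor index so it stays next to the same character after a
    remote `op`. Convention assumed here: an insert at the cursor position
    pushes the cursor to the right."""
    if isinstance(op, Insert):
        return cursor + len(op.text) if op.pos <= cursor else cursor
    if cursor <= op.pos:
        return cursor
    # Cursor was after or inside the deleted range; never move left of it.
    return max(op.pos, cursor - op.length)


doc = "hello world"
cursor = 6  # just before "w"

remote = Insert(0, ">> ")
doc = apply(doc, remote)
cursor = transform_cursor(cursor, remote)
print(doc, cursor, doc[cursor])  # >> hello world 9 w
```

A selection is just two such indices, which is why a shared protocol between plain text editors looks feasible.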
I think 'Google Wave Operational Transformation' was an attempt at this. I believe there is some merit in this; though my doubts come from the failure of Google Wave.
There has been a proliferation of realtime collaboration within code editors because the underlying data model (a string!) is so simple. As you mentioned, you would need to define a sufficient protocol encapsulating the common functionality between code editors along with defining which aspects would be shared and broadcasted vs kept private (e.g. an independent vs "follow" mode).
The data model could easily be defined in Convergence.
Implementation-wise, the primary limitation would be that at the moment we only offer a javascript/typescript client. I haven't explored the VSCode base but I know it was written in TS and provided as a "native app". We do have an example of Convergence running in node.js (NWJS).
Do you guys rely on a central authority approach like Prosemirror does? I'm desperately looking for something that's somewhat stateless or at least replicable (even better if shardable) on a document basis.
We do have a server in the architecture, and there is a small amount of state maintained by the server per document while it is being edited. However, the system is clusterable and documents shard across the cluster.
Thanks for the component you provided for shared cursors in text areas (https://github.com/convergencelabs/html-text-collab-ext). I've been using it with ShareDB, and have been working on adapting it to work with single text inputs, which has been surprisingly difficult (mainly dealing with scroll/overflow behaviour).
Our pleasure! If you want to share what you are doing we might be able to incorporate it into the utility for you! Happy to support a pull request, or to work collaboratively on it.
Convergence Labs | Mid-level Frontend React Developer | Salt Lake City, UT or REMOTE | Contract | $7-9K / month
Convergence Labs is the creator of Convergence, the world's first API designed from the ground up for collaborative co-editing. We do consulting work as well and recently landed an extremely interesting and ambitious project that is a perfect fit for our product. You would be our first hire and work directly with the founders.
This would be an excellent position for a smart and ambitious FE developer with a couple years of React experience. You'd be working directly with three industry veterans on a potentially transformative collaborative consumer-facing application. We are small and flexible, so just about everything is negotiable: 1099/W2, rate/salary, working from home, hours, equity, whatever.
Even as the author, I think web-based IDEs have pretty limited utility, but it is a great showcase for the power of our API. We (my co-founder and I) built this in about 10 days.
The fact that that article needs 10-plus pages to describe one of the simplest use cases betrays the unfortunate underlying truth: OT (and realtime co-editing) is intractably difficult for all but the most constrained data models. Which is a pity because it is undeniably cool, and why we built Convergence (https://convergencelabs.com) -- general-purpose realtime collaboration.
Yup, I just wrote an article [1] about just this. Simultaneous co-editing doesn't work unless the UX includes the appropriate cues to avoid this sort of thing. There are astoundingly few good examples of good (much less great!) real-time coediting UXes. This is the primary reason why we added first-class support for these cues so that it's not so damn hard!
Agreed. My first UX attempt was to simply put a list of the current editors on a document in the upper-right corner, and highlight the field or paragraph being edited by each person. It worked well enough, but this was also a limited scale audience - just a few dozen legal document authors. A larger scale app might require more, but it met the need we had.
Could you elaborate on the kind of bad UX you've encountered, and what solutions can be used to fix them? I can't think of anything that a simple cursor position indicator (and maybe a selection indicator - though that could get messy if the selection is large) doesn't solve.
Very cool to see this on the front page. I would agree that for the use case of plain text, yes, the problem has been solved, but for just about anything more complicated it quickly becomes intractable. Rich Text, for example, is extremely difficult to get right (ask the ckeditor guys!)
For those who want real-time collaboration functionality in their apps but don't have the intellectual curiosity (or time!) to learn the ins and outs, we [1] built a general-purpose API for folks to add real-time collaboration to their web apps. Think Firebase but designed from the ground up for simultaneous editing and with additional first-class support for common UX needs such as shared cursors and selections. We agree that the web is moving in this direction and are excited to see what gets built!
Thank you for the technical details. Probably a bit late for you guys, but we offer a BaaS tailor-made for this sort of product. For instance you could add multiple mouse tracking and selection awareness in a matter of hours with our APIs. Check out our diagram demo for a taste: https://convergencelabs.com/demos/
Congratulations on what looks like a very impressive piece of software. Do you guys have any plans to add collaborative features to Construct? Where, say, students in a remote classroom could work on the same game together?
After working from a home office for five years, moving into a coworking space last fall was a game-changer. I'm not the kind of person that gets excited about much, but I couldn't help but wax poetic about it to every person I talked to for a solid month. The work-life separation, perks of being downtown, and network effects of being around other entrepreneurs have hugely amped up my productivity and opened doors for my businesses. I've heard that the quality of these spaces varies widely, but as a developer all I need is a permanent desk to leave my monitor/keyboard/mouse, for a third of the price of an office.
Isn't a third of the price of an office still quite a substantial amount of money? Can I ask what kind of space you get? I always imagined a small desk or cubicle for each person (individual renting). I ran my own business for a while but jumped out because it was just so expensive and it was very difficult to separate life and work with a family.
It's $225 a month for a reserved desk which is essentially a less closed-off cubicle with a lockable file cabinet. This was comparable to other coworking locations in my city (Salt Lake).
Please let me know too. I wanted to set one up in Palatka FL. I found a building. Had a possible budget of 150k. Sadly, the building owner wanted 350k for a termite infested building. Wouldn't budge.
I would have offered broadband, AC, small kitchen and choice of standing desk or traditional tables (all home made with plywood of course) for about $20 a week. Then offered 24/7 access for around $30.
One of the biggest perks of working from home for me is not having to commute to work. Having to get up and drive downtown every morning sounds awful, so I'm surprised you listed it as a perk.
Do you live in a place with good public transport or do you actually not mind having to drive downtown every day?
I live in a bikeable medium-sized city (Salt Lake), so it's a ten-minute bike ride or 20-minute bus ride for me. I really like the atmosphere of being downtown, especially all the food options.
Exactly the same here. I actually only go to co-working spaces when I have very little work to do but the urge to be among people. When I need to be productive, I stay home.