So it looks like the clients talk to each other through AWS.
Would it be possible to use GDrive/Syncthing/Dropbox as the IPC mechanism for CRDT (e.g. store changes in a bunch of small files in a hidden folder which gets synced to other clients)?
I understand syncing would probably not be fast enough for a live cursor, but the benefit is you wouldn't have to worry about networking or accounts.
Provided Go has a decent enough client/interface for the tools you listed above, absolutely!
The original "remote" I implemented was a single S3 bucket (no websockets, just a static file location), and if you set the synchronisation interval to somewhere between a tenth of a second and a second, you had near enough real-time sync/collaboration!
I created a (really poorly named) abstraction called a WalFile which is basically an interface to implement these remote targets. Theoretically, you could build one for any service out there. If there was demand for one, I'd totally give it a go!
Sod's law: I temporarily disabled support for the direct S3 remote yesterday because it's slightly broken. But it's a minimal amount of work away from functioning again.
The sync mechanism itself is completely agnostic. The interface I'm describing is more about how the remote is accessed. The only requirement is that you can access a file and run a bunch of basic operations (list, get, put, etc.).
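To make the "list, get, put" requirement concrete, here is a minimal sketch of what such a remote target could look like when backed by a plain directory (the kind of folder Syncthing or Dropbox would replicate). The names `RemoteStore`, `DirStore` and `newTempStore` are made up for illustration and are not the project's actual `WalFile` definition.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// RemoteStore is a hypothetical stand-in for the WalFile abstraction: any
// backend that can list, get and put named blobs can act as a sync target.
type RemoteStore interface {
	List() ([]string, error)
	Get(name string) ([]byte, error)
	Put(name string, data []byte) error
}

// DirStore implements RemoteStore on a local directory, e.g. a hidden folder
// that a file-sync tool replicates to other machines.
type DirStore struct {
	Root string
}

// List returns the names of all regular entries in the directory.
// os.ReadDir already returns entries sorted by filename.
func (d DirStore) List() ([]string, error) {
	entries, err := os.ReadDir(d.Root)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, e := range entries {
		if !e.IsDir() {
			names = append(names, e.Name())
		}
	}
	return names, nil
}

// Get reads one stored file in full.
func (d DirStore) Get(name string) ([]byte, error) {
	return os.ReadFile(filepath.Join(d.Root, name))
}

// Put writes (or overwrites) one file.
func (d DirStore) Put(name string, data []byte) error {
	return os.WriteFile(filepath.Join(d.Root, name), data, 0o644)
}

// newTempStore creates a DirStore in a fresh temporary directory (demo helper).
func newTempStore() (DirStore, error) {
	dir, err := os.MkdirTemp("", "walsync-")
	if err != nil {
		return DirStore{}, err
	}
	return DirStore{Root: dir}, nil
}

func main() {
	store, err := newTempStore()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(store.Root)

	var remote RemoteStore = store
	if err := remote.Put("0001.wal", []byte("add item")); err != nil {
		fmt.Println("put failed:", err)
		return
	}
	names, err := remote.List()
	if err != nil {
		fmt.Println("list failed:", err)
		return
	}
	for _, n := range names {
		data, err := remote.Get(n)
		if err != nil {
			fmt.Println("get failed:", err)
			return
		}
		fmt.Printf("%s: %s\n", n, data)
	}
}
```

An S3-backed version would implement the same three methods against a bucket; the sync loop would not need to know the difference.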
Nice project! How was your experience using CRDTs? I often see the criticism thrown around that the history might become enormous even with regular usage. I'm curious what you have to say!
Thanks! Really good question, particularly in this context. It's a problem I'm actively working on.
For context: the underlying CRDT is basically a WAL housing all the events originating from user input. The app itself operates "local-first", so as events occur, it collects/flushes partial WALs which are separately aggregated and merged into full ones, and it's these which are shipped around to the various "remotes". These will (by definition) grow indefinitely.
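As a rough illustration of what "mergeable" means here, the sketch below merges WALs by taking the union of their events, dropping duplicates, and sorting into a deterministic order. The `Event` fields and the `(ClientUUID, Timestamp)` identity are assumptions made for the example, not the app's real format.

```go
package main

import (
	"fmt"
	"sort"
)

// Event is a hypothetical WAL entry; the field names are assumptions.
type Event struct {
	ClientUUID string // replica that produced the event
	Timestamp  int64  // e.g. UnixNano at creation time
	Payload    string
}

// eventKey is the assumed unique identity of an event.
type eventKey struct {
	uuid string
	ts   int64
}

// Merge returns the union of the given WALs, deduplicated by
// (ClientUUID, Timestamp) and ordered by Timestamp, then ClientUUID.
// Because the result depends only on the set of events, merging is
// insensitive to argument order and safe to repeat.
func Merge(wals ...[]Event) []Event {
	seen := make(map[eventKey]bool)
	var out []Event
	for _, wal := range wals {
		for _, e := range wal {
			k := eventKey{e.ClientUUID, e.Timestamp}
			if seen[k] {
				continue
			}
			seen[k] = true
			out = append(out, e)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Timestamp != out[j].Timestamp {
			return out[i].Timestamp < out[j].Timestamp
		}
		return out[i].ClientUUID < out[j].ClientUUID
	})
	return out
}

func main() {
	home := []Event{{"home", 1, "add: buy milk"}, {"home", 4, "update: buy oat milk"}}
	work := []Event{{"work", 2, "add: file expenses"}, {"home", 1, "add: buy milk"}}

	for _, e := range Merge(home, work) {
		fmt.Println(e.Timestamp, e.ClientUUID, e.Payload)
	}
}
```

The order-insensitivity is what lets partial WALs be shipped around and combined in any order on any replica.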
In order to try and alleviate some of this, I've written a "compaction" algorithm which retrospectively traverses the WAL and attempts to condense some of the events from many into one. For example, if there are 10 update events on the same item, omit the previous 9 and store only the most recent. It's pretty basic but it's had quite a significant impact thus far. Beyond that, I'm going to have to keep thinking and iterating!
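A minimal sketch of that "many updates into one" idea, assuming a simplified event shape (`ItemID`, `Kind`, `Text`) and a WAL that is already in event order. The real algorithm may well handle more cases; this only shows the update-collapsing step.

```go
package main

import "fmt"

// Kind is a hypothetical event type.
type Kind int

const (
	Add Kind = iota
	Update
	Delete
)

// Event is a simplified, assumed WAL entry.
type Event struct {
	ItemID string
	Kind   Kind
	Text   string
}

// Compact keeps every add and delete event, but only the most recent
// update event per item. The input is assumed to be in event order.
func Compact(wal []Event) []Event {
	// First pass: remember the index of the last update for each item.
	lastUpdate := make(map[string]int)
	for i, e := range wal {
		if e.Kind == Update {
			lastUpdate[e.ItemID] = i
		}
	}
	// Second pass: drop updates that were superseded by a later one.
	out := make([]Event, 0, len(wal))
	for i, e := range wal {
		if e.Kind == Update && lastUpdate[e.ItemID] != i {
			continue
		}
		out = append(out, e)
	}
	return out
}

func main() {
	wal := []Event{
		{"a", Add, "draft"},
		{"a", Update, "v1"},
		{"a", Update, "v2"},
		{"a", Update, "v3"},
		{"b", Add, "other"},
		{"b", Update, "other v1"},
	}
	compacted := Compact(wal)
	fmt.Println(len(wal), "->", len(compacted)) // 6 -> 4
}
```

Running `Compact` on already compacted output changes nothing, so it can be applied repeatedly without harm.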
The things you get for "free" when building on top of CRDTs are just brilliant, though. For example - I originally built the mergeable WALs purely because it was more convenient for me when syncing notes between my home and work laptops. Turns out that that gave me the collaborative working stuff for free! The real time collab is pretty much a websocket which accepts arbitrary encoded strings and echoes them back to different users. The app itself decides whether or not it's useful and merges it accordingly. I thought that was pretty cool.
One thing that’s unclear to me: once all replicas are in sync up to a certain event (which should be trivially determinable here), can’t you go even further and discard all past events and replace it with the resulting state snapshot? Or maybe that’s what you mean with compacting 10 to 1 above? Obviously at the cost of the undo history :)
Yes! Perhaps not in the way that _some_ might interpret it, though; given its append-only nature, undo events need to generate a new event, so in practice, you end up with a temporary pointer which traverses back and forth over the WAL, and new events being generated at the head. Admittedly, I actually use a separate legacy log for undo/redo at the moment, with a lifespan as long as a single client session. Room for improvement there (but it does the trick for now).
> can’t you go even further and discard all past events and replace it with the resulting state snapshot
Good question. 'Tis a tricky one to solve. If we were to maintain the WAL as a single source of truth, it would theoretically grow indefinitely due to the accumulation of delete events (e.g. we _always_ need to know if a unique item has been deleted). As it stands, the compaction operates on all history, meaning we lose the richness of the history (whilst maintaining idempotency) in exchange for reduced space and I/O requirements. If there was a need for point-in-time recovery or historical undo/redo, I could totally set a watermark level, e.g. "only compact up to N days ago".
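To sketch what an "only compact up to N days ago" watermark might look like: the hypothetical `CompactBefore` below collapses superseded updates only when they are older than a cutoff, leaves everything newer untouched, and always keeps delete events. The event shape and the assumption that the WAL is ordered by timestamp are both made up for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Kind is a hypothetical event type.
type Kind int

const (
	Add Kind = iota
	Update
	Delete
)

// Event is a simplified, assumed WAL entry.
type Event struct {
	ItemID    string
	Kind      Kind
	Timestamp int64 // UnixNano
	Text      string
}

// CompactBefore drops superseded update events that are older than cutoff.
// Per item, the newest pre-cutoff update survives, as does every event at or
// after cutoff, so history from the watermark onwards stays intact.
// Add and delete events are never dropped.
func CompactBefore(wal []Event, cutoff int64) []Event {
	lastOld := make(map[string]int) // item -> index of newest pre-cutoff update
	for i, e := range wal {
		if e.Kind == Update && e.Timestamp < cutoff {
			lastOld[e.ItemID] = i
		}
	}
	out := make([]Event, 0, len(wal))
	for i, e := range wal {
		if e.Kind == Update && e.Timestamp < cutoff && lastOld[e.ItemID] != i {
			continue
		}
		out = append(out, e)
	}
	return out
}

func main() {
	now := time.Now()
	at := func(daysAgo int) int64 {
		return now.Add(-time.Duration(daysAgo) * 24 * time.Hour).UnixNano()
	}
	wal := []Event{
		{"a", Add, at(30), "draft"},
		{"a", Update, at(20), "v1"},
		{"a", Update, at(10), "v2"},
		{"b", Add, at(9), "temp"},
		{"b", Delete, at(8), ""},
		{"a", Update, at(2), "v3"},
		{"a", Update, at(1), "v4"},
	}
	compacted := CompactBefore(wal, at(7)) // watermark: 7 days ago
	fmt.Println(len(wal), "->", len(compacted))
}
```

The trade-off is the one described above: a later watermark saves more space, an earlier one preserves more history.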
Does it support character level collaboration (like Google docs) or is this just generic entry/line level sync? IIRC CRDTs do not work well for Google-docs type text editing because they cannot capture intent well.
Currently it's at an entry/line level like you suggest, although theoretically I could flip (for lack of a better term) the algorithm and apply it laterally on a per character basis.
The data structure is a WAL storing all mutation events, each of which has a reference to another uniquely identifiable item (by some UUID:timestamp combo). Rather than treating each line as a uniquely identifiable object, I could treat each character as such.
Downside of the current implementation is that two people operating offline on the same line will see unpredictable results when they're able to sync again (basically it will just honour the most recent update event). The per-char change above would remedy that, I think!
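The "honour the most recent update" behaviour can be sketched as a last-write-wins rule. `LineUpdate` and `Resolve` below are hypothetical names, and the tie-break on client UUID is an assumption added so that every replica picks the same winner.

```go
package main

import "fmt"

// LineUpdate is a hypothetical update event for a single line/entry.
type LineUpdate struct {
	ClientUUID string
	Timestamp  int64
	Text       string
}

// newer reports whether a should win over b: later timestamp first,
// with the client UUID as an (assumed) deterministic tie-break.
func newer(a, b LineUpdate) bool {
	if a.Timestamp != b.Timestamp {
		return a.Timestamp > b.Timestamp
	}
	return a.ClientUUID > b.ClientUUID
}

// Resolve picks the winning update among concurrent edits to the same line.
// ok is false when there are no updates at all.
func Resolve(updates []LineUpdate) (winner LineUpdate, ok bool) {
	for i, u := range updates {
		if i == 0 || newer(u, winner) {
			winner = u
		}
	}
	return winner, len(updates) > 0
}

func main() {
	// Two people edit the same line while offline, then sync.
	alice := LineUpdate{"alice", 100, "buy milk and eggs"}
	bob := LineUpdate{"bob", 105, "buy oat milk"}

	w, _ := Resolve([]LineUpdate{alice, bob})
	fmt.Println(w.Text) // bob's edit wins; alice's "and eggs" is lost
}
```

Both replicas converge on the same text, but one person's edit is silently discarded, which is exactly the limitation a per-character scheme would address.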
I hope that there are existing implementations of OpSets that perform well. I find the solution laid out in that paper to be really elegant and beautiful.
Ah, I understand. Fascinating stuff - thanks for sharing. Food for thought when considering how to evolve the app (e.g. moving beyond just plain text).
Minor complaint - the confirmation code email has an empty `text/plain` part which seems odd for a single line of plain text saying "Your confirmation code is XXXXXX".
Also the box to enter your code has no border which means I assumed I needed to hit the button first to get a box to type into (Dear world, please stop doing this - input boxes have borders! Otherwise how the hell do we know it's there?)
Currently I'm just using the hosted AWS Cognito offering straight out of the box (just as a temporary measure to enable people to sign up). I hope to have this looking/operating a bit better down the line.
Just an idea I've had in my head for a while.