Fast file synchronization and network forwarding for remote development (github.com/mutagen-io)
179 points by saikatsg on Oct 16, 2022 | 63 comments



Mutagen author here — happy to answer any questions about Mutagen[0], its Docker Desktop extension[1], its Compose integration[2], or anything else!

[0]: https://mutagen.io/ [1]: https://mutagen.io/documentation/docker-desktop-extension [2]: https://mutagen.io/documentation/orchestration/compose


Can you comment on the extent to which it will be maintained? I understand that you can't predict the future, but the ever-looming threat of abandonware is always at the front of my mind when I see a tool that looks too good to be true.


I can't predict ultra far into the future (who can these days... :|), but Mutagen has been under active development for about 6 years now[0]. At the moment I have enough funding to work on it full-time until at least the middle of next year, though I also do Mutagen-related contracting and consulting work to support the project. Mutagen's Docker Desktop extension is going to be a freemium product designed to support the project more directly, which will hopefully allow development to continue indefinitely.

[0]: https://github.com/mutagen-io/mutagen/graphs/contributors


Thank you for all your hard work, amazing tool!


Nice work. Depending on your secure networking goals, you could evaluate using the OpenZiti Golang SDK to embed zero trust networking in Mutagen as an alternative to VPNs.


I've been using Mutagen for a long time as a workaround for WSL2's slow access to Windows storage. I set up two copies of my project code, one in Windows and one in WSL2. The IDE sees the Windows drive while build tools and Docker use the Linux drive. Mutagen keeps them in sync near-instantly via SSH.

Thanks for this great tool.


Can you elaborate on the effectiveness of this methodology? It sounds too good to be true (so I'm hesitant to explore it). What hang-ups, if any, do you encounter? Did you have any issues working around a corporate security policy? Have you ever run into any issues executing a test too quickly on the Windows side for the sync to catch up?


Corporate policy would only be a problem if mutagen.exe is banned by your IT department; open-source projects are usually fine with most companies.

There are occasional issues with symlinks (especially with large node_modules folders), but most of the time nothing breaks and they are easily fixable - running mutagen sync monitor shows you things as they happen. One thing to consider is where the .git directory is hosted. I personally keep git on the Windows side - you should make sure it only exists on one side and add an exclusion (Mutagen has a parameter for that). Performance-wise, on an SSD, an npm install from zero takes at most 5 seconds to sync (running npm in WSL2 -> files appearing in the IDE).
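
For reference, a session along these lines is a single command to create. A rough sketch - the paths, hostname, and ignore list are just placeholders, so double-check the flags against the Mutagen CLI docs:

    # Two-way sync between the local checkout and the WSL2 copy over SSH,
    # keeping .git and node_modules out of the session entirely:
    mutagen sync create \
        --name=myproject \
        --ignore=.git \
        --ignore=node_modules \
        ~/code/myproject \
        me@my-wsl-host:/home/me/code/myproject

    # Watch changes, conflicts, and problems as they happen:
    mutagen sync monitor myproject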


I've been using Mutagen for over 6 months now to sync with a Linux VM on an M1. The only thing I miss is an option to say "force everything from A" or "force everything from B". I've had rare cases where there were conflicts that I could only resolve by pausing Mutagen and running rsync. But I appreciate that Mutagen warns you and doesn't just overwrite silently like Syncthing can do sometimes.


Mutagen allows you to choose a replication mode: https://mutagen.io/documentation/synchronization

Do you want something different from the "one-way-replica" mode?


Yes, because I also have generated files that I can edit further, and I need to send log files, screenshots, etc. to other people.


One option I often recommend is setting up multiple synchronization sessions targeting different parts of a particular codebase with different synchronization configurations (kept mutually exclusive via ignore specifications). Quite often people want unidirectional replicas for a certain folder (e.g. a build directory) with more standard bidirectional synchronization for the rest of the code. It can be a little more complex to orchestrate, but you can use Mutagen's project functionality or a shell script to automate the setup a bit.
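
As a rough sketch, a Mutagen project file (mutagen.yml) for that kind of split might look something like this - the session names, paths, and remote URL are placeholders, and the project documentation has the exact schema:

    sync:
      code:
        alpha: "."
        beta: "me@devbox:~/project"
        mode: "two-way-safe"
        ignore:
          paths:
            - "build/"
      build-output:
        alpha: "me@devbox:~/project/build"
        beta: "./build"
        mode: "one-way-replica"

Then mutagen project start / mutagen project terminate bring both sessions up and down together.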


Yes, I have that too: node_modules are one-way toward my host machine so my IDE can use them, plus a few other things like logs.


When does Syncthing overwrite changes? That should not happen and would be a major bug.


I've only ever casually browsed information about Syncthing, and even I've stumbled across stories of people with a naive setup that, once fired up, started erasing all their files.

I understand that this is probably an oversimplification, but I've read at least two separate stories along those lines. The comments for both stories were basically "yeah, that sounds right because of the way you set it up". It's not a bug so much as how Syncthing works. My understanding is that there is a circumstance where you sync an empty folder with a folder containing your documents, and it decides to sync the empty folder such that it erases the documents. Without any kind of warning. Wild.


Maybe some people get confused thinking that unidirectional sync (send only) works like a backup?

It is possible to use Syncthing for backups, but that is actually one of the pain points. There is an option to ignore deletes, but it is hidden in the advanced settings, and there is no quick way to check whether it is turned on. Every time I delete something that I want to keep on the backup side (like pictures from my phone, where I am running out of space), I first go to the destination/backup server and double-check that this option is turned on, to make sure I won't accidentally have the backup deleted too.

It's also an issue that I can't see the remote settings from the client side. I have to connect to the remote server to check.

There is also a problem with the way it resolves conflicts. If you have "ignore deletes" set up and then you delete files, Syncthing will tell you that the two locations are out of sync. It offers a button to force a sync, but there is no preview explaining what that button will do. There is also no choice in how the out-of-sync status should be resolved (in which direction the sync should happen to get the two locations back in sync).

I have always been afraid to ever push that button, and there is no way to make it go away. I think this is a UX bug, because if I intentionally configure it to ignore deletes, then deletes should not count as a difference; they should actually be ignored.


When machines were not online at the same time, or with unreliable connectivity. I've managed to never lose data because I have staggered backups, so I could always revert to the in-between version.


In the past I have used lsyncd to develop locally and synchronize the changes over SSH to a remote host where the code base was compiled. This worked nicely even over a GPRS network connection at around 30 Kbit/s. As the link had high latency, it was important to use an Emacs shell for the remote connection. This way I could type the command locally and send it to the remote host when pressing enter.


We've been using Mutagen extensively for remote development with an EC2 instance hosting a docker-compose setup with a couple of services and live rebuild+reload, and it's been working fantastically.

It's also nice for automatically managing port forwards.
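
For anyone curious, a forward is a one-liner per port; roughly like this (the host and ports are just examples - check the forwarding docs for the exact endpoint syntax):

    # Forward local port 8080 to port 8080 on the remote EC2 box:
    mutagen forward create tcp:localhost:8080 me@ec2-host:tcp:localhost:8080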


Have you considered plugins for VS Code, e.g., VS Code Remote and VS Code Containers?


That's actually a different use case than ours. We only use it to run the builder and app services (which can use a bunch of resources), while keeping the editor/IDE tooling local.

That said, we have developers using VS Code, JetBrains, as well as Vim. The Mutagen-based workflow works well with all of them.


Mutagen also has a Docker extension. Really easy to set up. I installed it recently after searching for ways to speed up Docker on an Apple M1. It did work in my case.


Super useful tool!

Plus, it's multi-platform. I'm using it to synchronize directories between hosts running macOS, OpenBSD, and Linux. Everything works fine.

I haven't tried the Docker Desktop extension since I switched to Colima (Docker Desktop is constantly broken on Apple Silicon).


TIL about Colima - thanks! Grabbing it now on my M1 Mac.


This sounds like my dream tool - I've always loved how quickly and well local tools work and remote environments cut into that good experience significantly. For me to be productive, I really need an instant feedback loop where tools work fast and I can immediately experience the result of some small piece of work.

Has anyone tried this for a real-world project and can share feedback?


I generally find that systems which aren't set up to let you dev locally, and instead require developing in prod or on a remote, don't let you work in tiny, tight feedback loops either. I generally focus on making things work the same everywhere instead of on fast sync, but that's just me. Well, that and the systems I have control over.


I find that VS Code's Remote-* extensions work well. I'm currently writing a Terraform provider on a remote Linux box using Remote-SSH and everything feels local. Compilation, etc. happens on the remote, and if I were serving requests it's dead easy to forward a port.


I like the remote editing stuff in VS Code, but I often end up having to rsync things at some point, since I need to test code on remote servers while sometimes needing access to other resources only available on my local machine. I can use VS Code for the editing, and I can use Mutagen for the syncing. VS Code also ate some of my remote work before, so I started using an extension that created periodic backups on the remote so that if it happened again I wouldn't lose much, if any, work. That hasn't happened in a while, but it was bad enough that I want to ensure it doesn't happen again. I tried an rsync extension, but it was flaky and poorly maintained. With something like git I'd have to create WIP commits, remember to push and pull, and then remember to clean up the changes before merging.


Mutagen tries to be secure, so in principle one can develop on an untrusted remote machine. VS Code remote always assumes that the remote side is trusted.


That sounds interesting, but I can't find any mention of it in the docs. In fact, it sounds like it's just copying files over to the remote and running commands there.

Are you able to provide a reference to how Mutagen secures my code on an untrusted remote?


The general philosophy with Mutagen is to (a) delegate encryption to other tools and (b) use secure defaults (especially for permissions).

So, for example, Mutagen doesn't implement any encryption, instead relying on transports like OpenSSH to provide the underlying transport encryption. In the Docker case, Mutagen does rely on the user securing the Docker transport if using TCP, but works to make this clear in the docs, and Mutagen is generally using the Docker Unix Domain Socket transport anyway. When communicating with itself, Mutagen also only uses secure Unix Domain Sockets and Windows Named Pipes.

When it comes to permissions, Mutagen doesn't do a blanket transfer of file ownership and permissions. Ownership defaults to the user under which the mutagen-agent binary is operating and permissions default to 0700/0600. The only permission bits that Mutagen transfers are executability bits, and only to entities with a corresponding read bit set. The idea is that synchronizing files to a remote, multi-user system shouldn't automatically expose your files to everyone on that system. These settings can be tweaked, of course, and in certain cases (specifically the Docker Desktop extension), broader permissions are used by default to emulate the behavior of the existing virtual filesystems that Mutagen is replacing.
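
For illustration, those defaults can be loosened per-session; something along these lines (flag names per the permissions documentation, and the modes and endpoints here are just an example):

    # Share the synced tree with other users on the remote (instead of 0700/0600):
    mutagen sync create \
        --default-directory-mode=0755 \
        --default-file-mode=0644 \
        ~/project me@shared-host:~/project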


So, transport-wise they're the same.

For files at rest on the remote, I guess I assumed files would be encrypted on the remote with a local key, since GP said "one can develop on untrusted remote machine" and "VSCode remote always assumes that the remote part is trusted".

On an actually untrusted remote, removing group read permissions doesn't do much to secure my code.

The only scenario where it's helpful is a system with multiple non-admin users, perhaps like a university lab computer but who's doing sensitive work on those anyway?


In many ways Mutagen and VSCode's remote extensions are the same idea, with trade-offs in terms of flexibility vs. integration.

Shared systems with multiple non-admin users was one of the original motivating use cases for tighter default permissions.

I don't think there's any scenario where one can perform truly secure development work on an untrusted system. You could certainly store encrypted code in an untrusted location, but there's not much you could do with it on that system (without a hypothetical compiler or tool that maybe supported some sort of homomorphic-encryption compilation operations?). Even decryption on-the-fly for processing by regular tools wouldn't be secure on an untrusted system. And running any code there would be equally insecure.

I'd imagine that for any seriously sensitive work, one would only want to work in highly controlled, trusted, and firewalled environments. If there's a scenario I'm missing though, definitely let me know.


No, I think this clarifies it - thanks!

Just to be clear, when I mention sensitive work, I'm not necessarily talking about national security, military, etc. kind of work. Any work for a client is sensitive enough that I wouldn't do it on any remote I (or my client assuming there is approval) don't control.

I will try Mutagen at some point. The fact that it's editor agnostic is certainly a big sell!


Yes, it is excellent; I'm syncing macOS (JetBrains tools and a few other things) with a Linux VM.


I haven't found anything better than using Unison. Maybe the linked README could compare prior art?


Conceptually speaking, Mutagen and Unison are very similar (and actually I mentioned Benjamin Pierce's work in another comment here, in reply to a question about the sync algorithm - fantastic stuff!). I tend to avoid direct comparisons because they always come across as one-sided, but some cursory differences:

- Mutagen tries to integrate recursive filesystem watching very tightly into its synchronization loop to drive synchronization and allow for near-instant filesystem rescans

- Mutagen automatically copies an "agent" binary to remote systems to support synchronization, so no remote install is required

- Mutagen uses Protocol Buffers for its data storage, so synchronization sessions created with older versions continue to work with newer versions

- Mutagen is written in Go, Unison in OCaml (which gives Mutagen broader platform support "for free")

- Mutagen tries to treat Windows as a first-class citizen

- Mutagen uses race-free traversal (e.g. openat, fstatat, unlinkat, etc.) to perform operations

Obviously the internal implementations are different, but both use differential (rsync-style) file transfers, both use the same reconciliation concepts, etc.

Mutagen has the advantage of Go, recursive filesystem watching, and modern POSIX/Windows APIs that didn't exist when Unison was originally written, though some of that functionality has been brought into Unison.

For a comparison with Syncthing (and to some extent Unison), check out this comment[0].

[0]: https://news.ycombinator.com/item?id=30966448


What is the benefit over rsync, which is the perfect tool for this at the moment? Maybe add an FAQ section to the readme for questions like this?


The primary benefits:

- Mutagen performs bidirectional synchronization (though it can also operate unidirectionally); rsync is unidirectional

- Mutagen uses recursive filesystem watching to avoid full filesystem rescans (whereas rsync always does a full filesystem rescan). This allows Mutagen to provide a more "real time" sync.

- Mutagen has an active synchronization loop that doesn't require manual invocation.

- Mutagen has more idiomatic Windows support.

- Mutagen doesn't require that it be pre-installed on both endpoints.

Both use differential transfers (i.e. the "rsync algorithm") for transferring individual files.

There are other differences, of course, as well as similarities. Mutagen's design is tuned for development work, rsync's design is tuned for replication. I still use rsync for archival operations on a daily basis - it's great!
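
To make the contrast concrete, here's a rough sketch of the two approaches for a push-style workflow (paths and host are placeholders):

    # rsync: a one-shot (or cron-driven) replication with a full rescan on every run
    rsync -az --delete ./project/ me@devbox:~/project/

    # Mutagen: a persistent session that watches the filesystem and replicates continuously
    mutagen sync create --sync-mode=one-way-replica ./project me@devbox:~/project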


This sounds useful. But one question that comes to mind right away:

Does Mutagen handle the case where “local tools” (running on a completely different architecture than the remote) still need to “know” about include/header/library/etc. files from the remote machine in order to provide working “intelligence” capabilities?

It’s one thing to efficiently sync “code”, but it’s another to make local tools fully-aware of the remote system’s header files, libraries, etc.


On the synchronization front, Mutagen's only goal is to facilitate the synchronization of files (albeit with a focus on development-related settings and low-latency for a "real time" feel). It doesn't attempt to integrate with any higher-level tooling (except in the cases of Docker Desktop and Compose, which is facilitated via external projects). That sort of tooling, language, and framework-specific integration is a bit outside the project's target scope (and something that becomes very domain-specific).

Mutagen will, however, happily operate between different operating systems and architectures, so things like working with a remote amd64-based Docker engine from your local arm64-based laptop are totally possible.
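
As a concrete (illustrative) sketch of that cross-architecture case - the container name, host, and paths are placeholders, and this assumes the Docker CLI is configured for the remote engine:

    # DOCKER_HOST can point at a remote amd64 engine while this runs on an arm64 laptop
    export DOCKER_HOST=ssh://me@remote-docker-host
    mutagen sync create ./project docker://my-dev-container/app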

Also, several external projects (such as DDEV[0] and Garden[1]) do use Mutagen as a low-level component in their stack to provide synchronization that does "know" a bit more about the framework that you're using.

[0]: https://ddev.com/ [1]: https://garden.io/


Just wanted to say thanks for taking the time to reply, and for the links to those other projects that use Mutagen!


I’d like to know more about the theory behind the synchronisation — how the syncing is known to be safe and non-destructive.


The synchronization uses a repeated three-way merge algorithm, very similar to Git's merge when merging branches. It is triggered by recursive filesystem watching, which is also used to accelerate filesystem rescans. It maintains a virtual most-recent-ancestor and uses the two synchronization endpoints as the "branches" being merged. Much like Git has "-X ours" and "-X theirs" options, Mutagen also has automated conflict resolution[0] modes that can be specified. You can find the reconciliation algorithm here[1] (and there are an exhaustive set of test cases in the corresponding _test.go file).

To avoid a large class of race conditions (at least to the extent allowed by POSIX and Windows), Mutagen uses `*at`-style system calls for all filesystem traversal on POSIX systems, with a similar strategy on Windows.

Also, to avoid race conditions due to filesystem changes between scan time and change-application time, Mutagen will perform just-in-time checks that filesystem contents haven't changed from what was fed into the reconciliation algorithm.

[0]: https://mutagen.io/documentation/synchronization#modes [1]: https://github.com/mutagen-io/mutagen/blob/master/pkg/synchr...
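
To give a flavor of the three-way merge idea - this is a simplified illustration, not Mutagen's actual code - the per-entry decision boils down to something like:

    package reconcile

    // Entry is a simplified stand-in for an entry's content; "" means absent/deleted.
    type Entry string

    // Reconcile decides the new agreed-upon state for one path, or flags a conflict.
    func Reconcile(ancestor, alpha, beta Entry) (result Entry, conflict bool) {
        switch {
        case alpha == beta:
            // Both sides already agree (including both having deleted the entry).
            return alpha, false
        case alpha == ancestor:
            // Only beta changed since the last sync: propagate beta's change to alpha.
            return beta, false
        case beta == ancestor:
            // Only alpha changed: propagate alpha's change to beta.
            return alpha, false
        default:
            // Both sides changed differently: a conflict, resolved automatically per
            // the configured synchronization mode or left for manual resolution.
            return "", true
        }
    }

The real implementation works over whole directory trees and handles the file/directory/symlink and creation/deletion cases, but the ancestor-vs-endpoints comparison above is the core of it.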


Also, while Mutagen's exact implementation is novel in a number of ways, I would be remiss not to point out that a huge amount of academic work in this field was done by Benjamin Pierce[0] and later implemented in Unison[1].

[0]: https://www.cis.upenn.edu/~bcpierce/papers/index.shtml#Synch... [1]: https://www.cis.upenn.edu/~bcpierce/unison/


I've been using Unison for what feels like 14 years. Once working, it was great, but it always took me a while to figure out the exact command line options I wanted. Beautiful tool.


> but it always took me a while to figure out the exact command line options I wanted.

Reminds me of my occasional use of rsync. I'm always afraid of invoking it wrong and syncing in the wrong direction.


Thank you so much for the great replies!


Mutagen sounds impressive. I wonder how well it works synchronizing a large directory. My company has two servers, in different states, connected via a VPN. We currently use rsync and cron. The directory contains thousands of files. Would Mutagen be able to handle that?


I use Mutagen daily on a very large monolithic application repository, and while the initial sync can take a couple of minutes, subsequent edits are replicated sub-second - even when changing multiple files, e.g. via a linter auto-fixing files.


Any user stories with *vim + Mutagen for _large_ remote code bases? VS Code Remote is the only thing that has been fast enough in my experience, but I would love to be able to use my local Neovim instance for remote development instead, and this tool looks promising.


It should work fine. Many users use Mutagen on multi-GB codebases. If we're talking something larger (say 10s of GBs or TB-sized monorepos), then there are some tweaks you can do to make life with Mutagen a little easier. Feel free to reach out to jacob[-at-]mutagen.io if you have a specific use case, or pop over to the Mutagen Community Slack Workspace[0] to chat.

[0]: https://mutagen.io/slack


I did this and Mutagen worked great. It was pretty awesome to not have any lag while developing, and Mutagen was really fast to detect and sync changes. Never really noticed any delays; everything felt instant.


This sounds really impressive.

Unison has a UI to manage conflicts at the file level, whereas Syncthing only allows a manual overwrite of all changes at once, and I have to log into the remote daemon if I want to overwrite changes there instead of locally.

Syncthing also will not tell me whether all local changes have arrived at the remote side, or whether the remote is still busy downloading, without logging into the remote daemon.

On the other hand, Unison only manages one source/target pair per process, if I remember correctly, while Syncthing can manage many different source/target pairs conveniently.

How does Mutagen compare in these two areas?


There's no conflict resolution UI at the moment (either graphical or command-line based). Mutagen's conflict resolution is primarily performed via the specification of its synchronization mode (which can automate the resolution of most conflicts) and (in the case where conflicts can't be resolved automatically) by manually deleting the file that should "lose" the conflict (because Mutagen will synchronize a modification over a deletion).

Mutagen provides fairly detailed reporting of synchronization status, file staging progress, and change application problems via its "mutagen sync monitor" and "mutagen sync list" commands. These also support JSON output (and Go-template-style formatting), so you can pipe this information into other tooling. If you need REALLY detailed information, you can look at the debug or trace-level logs from the Mutagen daemon, but that's typically only for debugging during development of Mutagen itself.
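
Concretely, the day-to-day commands look like this (the session name is just an example):

    mutagen sync list --long        # detailed status for all sessions
    mutagen sync monitor myproject  # live, continuously updated status for one session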

Mutagen is somewhere in between Unison and Syncthing on the topology front, but closer to Unison. It still only supports two endpoints per synchronization session, but unlike Unison it doesn't require that one is local (i.e. you can do remote-to-remote sync using your local system as a proxy). But with both Mutagen (and Unison), you can set up a hub-and-spoke topology if you want to sync with multiple nodes.
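
For example, a remote-to-remote session (with the local machine acting as the proxy) is just two remote URLs, and hub-and-spoke is one session per spoke (hosts and paths here are placeholders):

    # Remote-to-remote, proxied through the local machine:
    mutagen sync create me@host-a:~/project me@host-b:~/project

    # Hub-and-spoke, with the local copy as the hub:
    mutagen sync create ~/project me@host-a:~/project
    mutagen sync create ~/project me@host-b:~/project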


If only macOS supported mounting via SSHFS ...


Is there an Android app for Mutagen?


How does it compare to syncthing?


Here's a bit of a comparison that I wrote the last time Mutagen was posted:

https://news.ycombinator.com/item?id=30966448


Thanks, very interesting read. One-sided installation and low-latency sound particularly appealing.

Two features I didn't see addressed:

- Is there a GUI of any kind? I use Syncthing's web GUI + a tray applet for monitoring syncing on desktop, and on mobile there's an app for syncing also. Makes it easy to detect if there's a problem with any of the folders being synced, and easy to add new folders this way also.

- Is there an equivalent to Syncthing's untrusted devices feature? It basically allows for client-side encrypting the files, so you can have say one remote with the encrypted files and another remote with unencrypted files.


How does that compare to sshfs (with cache/kernel_cache enabled)? I've used it a few times when I needed to dev like that, and it was generally fine for just editing a file; where performance tanked was doing a lot of file I/O at once (say, updating a git repo).


The advantage of Mutagen is that it works on OSes that can't do sshfs. It felt faster too, especially with a lot of I/O, like node_modules or other things that touch a lot of files. But I never ran a benchmark; it is faster by at least a factor of 10 than whatever is in Docker Desktop when populating node_modules, so I don't even need a benchmark.


The benchmarks will likely be highly dependent on your use case, but SSHFS-style virtual filesystems (specifically those backed by FUSE) typically have significantly lower performance than something like an APFS/ext4/NTFS filesystem that Mutagen could target with synchronization.

All of your readdir()/stat()/open()/read()-style calls will suffer significantly on virtual filesystems, and unfortunately these get hit a lot by things like IDEs (e.g. when indexing code), compilers, and dynamic language runtimes (especially PHP).

No tool is at fault in this chain, of course, it's a hard problem. Mutagen is able to offer better performance by being a little less dynamic and creating "real" copies of all the files on a more persistent filesystem.



