This is awesome. Especially the fact that it's built-in and easy to turn on.
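For reference, turning it on in 2.37 is a single config switch (the release notes also suggest pairing it with the untracked cache):

    git config core.fsmonitor true
    git config core.untrackedCache true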
Seems like quite a complex solution though. I guess some big company (Microsoft?) implemented it internally for their own use and later tried to move it upstream into git. I wonder if there was some pushback from the git maintainers about having this functionality built-in. Also, why do they use named pipes for Windows when Windows in theory also supports AF_UNIX? (https://devblogs.microsoft.com/commandline/af_unix-comes-to-...)
Git for Windows currently supports Vista and above - the AF_UNIX support was only added in Windows 10 1803.
Named pipes are fine, the semantics are basically identical, and you can guarantee there is a separate namespace versus the filesystem (for AF_UNIX, \x00 prefixes work on Linux but not on macOS).
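(For anyone unfamiliar with the \x00 trick: that's Linux's "abstract" socket namespace, where a socket name starting with a NUL byte never appears on the filesystem. A minimal sketch of my own, not git's code:)

    /* Bind a Unix-domain socket in Linux's abstract namespace: the name
     * starts with '\0', so no filesystem entry is created. macOS has no
     * abstract namespace, hence the point above. Error handling trimmed. */
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    int main(void) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        struct sockaddr_un addr;
        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        static const char name[] = "\0fsmonitor-demo"; /* hypothetical name */
        memcpy(addr.sun_path, name, sizeof(name) - 1);
        socklen_t len = offsetof(struct sockaddr_un, sun_path) + sizeof(name) - 1;
        if (bind(fd, (struct sockaddr *)&addr, len) != 0)
            perror("bind");
        close(fd);
        return 0;
    }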
> [FSMonitor] is currently available on macOS and Windows.
Are there any other git features with this limitation? Wild to me that we're here.
Thankfully the article covers the semi-longstanding "hooks" approach, which existing, very high-performance, cross-platform tools like Watchman can use.
Great in-depth read. Good stuff! From the 2.37 release[1].
I've been wondering about why there was no linux support, and found an e-mail from the author of the subcommand (as well as the github.blog post) explaining the situation.
Apparently an older implementation using inotify was dropped because inotify does not work recursively, so you would have to make an inotify call for every directory in the hierarchy, which is obviously very inefficient. There are system-wide limits on the number of directories you can listen to, and even if you increase the limit you would probably cause a lot of overhead.
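(To make that concrete, here's a rough sketch, mine and not the dropped implementation, of what recursive watching over inotify forces you to do: walk the whole tree and register every directory one by one.)

    /* inotify has no recursive mode, so we must nftw() the tree and call
     * inotify_add_watch() once per directory. Each watch counts against
     * the fs.inotify.max_user_watches sysctl, so big repos hit ENOSPC. */
    #define _XOPEN_SOURCE 500   /* for nftw() */
    #include <ftw.h>
    #include <stdio.h>
    #include <sys/inotify.h>

    static int inotify_fd;

    static int add_watch(const char *path, const struct stat *sb,
                         int type, struct FTW *ftwbuf) {
        (void)sb; (void)ftwbuf;
        if (type == FTW_D &&
            inotify_add_watch(inotify_fd, path,
                              IN_CREATE | IN_DELETE | IN_MODIFY | IN_MOVE) < 0)
            perror(path);   /* often ENOSPC: watch limit exhausted */
        return 0;
    }

    int main(int argc, char **argv) {
        inotify_fd = inotify_init1(IN_NONBLOCK);
        nftw(argc > 1 ? argv[1] : ".", add_watch, 32, FTW_PHYS);
        /* ... read() inotify_event records from inotify_fd here, adding
         * new watches whenever a directory is created ... */
        return 0;
    }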
Newer Linux kernels support the fanotify system call, which does allow recursive listening. They haven't implemented anything using fanotify yet, however.
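(By contrast, a single fanotify mark can cover a whole filesystem. A hedged sketch, assuming Linux 5.1+ since create/delete/move events need FAN_REPORT_FID; note it also requires CAP_SYS_ADMIN, which may itself be a blocker for a per-user git daemon:)

    /* One FAN_MARK_FILESYSTEM mark watches everything on the filesystem
     * containing "." -- no per-directory walk as with inotify. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/fanotify.h>

    int main(void) {
        int fd = fanotify_init(FAN_CLASS_NOTIF | FAN_REPORT_FID, 0);
        if (fd < 0) { perror("fanotify_init"); return 1; } /* needs CAP_SYS_ADMIN */
        if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_FILESYSTEM,
                          FAN_CREATE | FAN_DELETE | FAN_MODIFY |
                          FAN_MOVED_FROM | FAN_MOVED_TO | FAN_ONDIR,
                          AT_FDCWD, ".") != 0)
            perror("fanotify_mark");
        /* ... read() struct fanotify_event_metadata records from fd ... */
        return 0;
    }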
Thanks, I didn't know about fanotify. Now I'm wondering why everything I use day-to-day (file syncing tools, IDEs, etc.) still seems to be stuck with inotify if there is a better option on modern operating systems. Maybe some of them are actually using fanotify under the hood, despite components still having "inotify" in their name.
From fanotify(7):
> In the original fanotify API, only a limited set of events was supported. In particular, there was no support for create, delete, and move events. The support for those events was added in Linux 5.1. (See inotify(7) for details of an API that did notify those events pre Linux 5.1.)
My guess is that inotify is so slow with large directories that it wasn't worth it. Plus inotify has cumbersome user limits.
inotify has a number of other relevant limitations, like not being able to create recursive notifications or handle "move" operations. Implementation effort is going to be way higher for an inotify-based system, and of course that's made far worse by the numerous file systems in linux - I imagine any implementation would probably start first with ext4.
I suspect an ideal solution would be via ebpf, but I'm not sure.
My assumption was that on Linux it's just been using inotify or something for a while and so hasn't needed a bespoke monitor. I have no idea if that's true or makes sense though.
Not sure why the response got downvoted. I personally found Git performance to be, well, okay on macOS (but depends) and absolutely horrible on Windows due to very slow stat() calls on NTFS.
Of course, in a large enough monorepo Linux performance would also suffer, but to a much lesser degree.
Also, conveniently, both Windows and macOS have an API for recursive directory watching, whereas Linux doesn't (in the vanilla kernel). inotify can only watch the immediate directory you point it at, and on top of that there's a pretty low default limit on the number of inotify watches you're allowed to have.
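(For reference, the macOS API is FSEvents, where one stream covers a whole tree recursively. A minimal sketch of my own, not git's code; compile with -framework CoreServices:)

    #include <CoreServices/CoreServices.h>
    #include <stdio.h>

    /* Called with batches of changed paths anywhere under the watched root. */
    static void on_change(ConstFSEventStreamRef stream, void *info, size_t n,
                          void *event_paths,
                          const FSEventStreamEventFlags flags[],
                          const FSEventStreamEventId ids[]) {
        char **paths = event_paths;
        for (size_t i = 0; i < n; i++)
            printf("changed: %s\n", paths[i]);
    }

    int main(void) {
        CFStringRef root = CFSTR("/tmp/repo");  /* hypothetical worktree */
        CFArrayRef paths = CFArrayCreate(NULL, (const void **)&root, 1,
                                         &kCFTypeArrayCallBacks);
        FSEventStreamRef stream = FSEventStreamCreate(
            NULL, on_change, NULL, paths, kFSEventStreamEventIdSinceNow,
            0.1 /* latency, seconds */, kFSEventStreamCreateFlagFileEvents);
        FSEventStreamScheduleWithRunLoop(stream, CFRunLoopGetCurrent(),
                                         kCFRunLoopDefaultMode);
        FSEventStreamStart(stream);
        CFRunLoopRun();
        return 0;
    }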
We've found this to be basically true. Git operations that stat() a lot are dramatically, catastrophically slower on OS X, and that's part of why my employer started doing fs watching there and mostly left Linux as is.
More than once I've had to update a cross-platform tool to avoid stat()ing because though cheap on Linux, it took 10s of seconds on OS X.
Having a cross-platform file watcher built into a ubiquitous tool like git is pretty awesome. I could see build tools integrating with this and making more aspects of development faster without having to run a bunch of file watcher services. They all seem to have issues.
I have tried Watchman, but setting it up is a pain. There are so many ways to use it. I also welcome running less Facebook code on my systems.
TBF, Git operations on repos with many small files are extraordinarily slow on Windows (probably not Git's fault, because all file operations involving many small files are slow on Windows, even copying stuff around on the desktop), so that feature is much more critical to have on Windows than Linux.
Considering there are really only three platforms, I think it's a pretty fair assumption... and we're talking about git here, so you can assume cross-platform includes Linux.
I've stopped using anything language specific like Guard or nodemon. Instead I use the inotify commands on Linux and entr/ack on macOS so no matter what I'm doing, I can watch for changes in a directory.
I’ve got qualms with just about all of the big tech companies in one way or another.
My 2c is that one of the unambiguously positive externalities of the tech mega-corp trend is all the great OSS we get as a by-product of their operations.
I mean, I don’t exactly love how iPhones get made, but I’m pretty stoked that clang kicks ass now.
Related question: most of the companies I know that have large monorepos have a sizable dedicated team to support their dev tools, and have invested a lot to make monorepos feasible.
Are there any recommendations or standardized tools for structuring monorepos for companies that don't have a dedicated dev tools team? Last I checked, lerna seemed to be the most common tool to support JS monorepos - is that still true? Is there a better tool for a primarily Typescript codebase (primarily React on the front end, Node on the backend, but also native mobile apps)?
We recently used Turborepo -- https://turborepo.org/ -- for a project that started as two Electron-based builds and quickly escalated to three. Once we had it set up for the first two, it was quite easy to add one more. Our shared components were in a central package while custom ones lived in their respective app directories.
The nice thing with this kind of separation for us is being able to target CI/CD scripts at specific apps. Previously we were using targeted dev-script logic to initiate the different app builds, which just wasn't maintainable. The new approach made the Electron deployments super simple, consistent, and repeatable.
All this was done by two team members on the dev side.
In the JS/TS world, yarn was until recently the de facto tool for monorepos. Lately pnpm seems to be gaining traction. A few other new tools have been getting popular as well; it's a hot topic.
npm itself has also upstreamed more "workspace" support for monorepos. It's not necessarily the best tool for the job, but it's a possible tool.
Also, incremental build support in TypeScript itself has seen a lot of improvement in recent versions. It's worth checking whether your monorepo can benefit from TypeScript incremental builds.
A bit tangential but you might also be interested in reading about the USN Journal on Windows which has been around since Windows 2000 https://en.m.wikipedia.org/wiki/USN_Journal
My holy grail implementation would be a "partial clone" that downloads desired files like normal, but creates stubs for selected files that are not stored on the device but downloaded on-demand upon opening them, like the OneDrive Files On-Demand [1] or Google Drive File Stream.
Interesting! It seems some of Scalar from late 2021 has already made it into the official git project's contrib dir [0]. It looks like Scalar is mostly an opinionated way to configure git [1] using git partial-clone among other features.
Git partial-clone looks almost perfect, except it only downloads and displays files explicitly added to the git sparse-checkout list. I want some "magic" vfs shenanigans that lets me view and browse the full repo exactly as if the full repo were checked out, but when I open a directory or file the contents are downloaded on-demand.
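For what it's worth, a blobless partial clone gets partway there with stock git: the clone skips all file contents up front and fetches blobs on demand (at checkout, diff, etc.). It still materializes the whole tree at checkout though, so it isn't the browse-without-downloading vfs experience (the URL below is a placeholder):

    git clone --filter=blob:none https://example.com/big/repo.git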
It seems win32 has specific support for this kind of tech at a couple of different layers: one that's low-level filesystem virtualization, like FUSE I guess [1], and another that's higher level and exposes the sync status of files via Explorer and other win32 APIs [2], which is what I assume OneDrive, Dropbox, Google Drive, etc. use.
I have a healthy suspicion of the performance of file-watchers. I hope this feature doesn’t make Git faster at the expense of “all filesystem operations crawl”.
This has been the way to get performance on a large Git repo for over a decade now, just not built into Git. It provides very good improvements even in environments that aren't the fastest, like laptops.
We deployed a system like this across hundreds of engineers in a reasonably sized monorepo and had zero complaints about system performance. While I don’t know the underlying architecture of inotify/etc, it seems to be very efficiently implemented.
It is of course not turned on by default. I don't know how bad the performance hit is, but it's an option so you get to choose the tradeoff. Either your git operations are slow or you take the small hit on all operations. If you spend all of your time working in a big repo it's probably going to be worth it.
I assume it is using one of the native platform APIs to detect file changes, which generally have some sort of overhead associated with them and then may or may not block on userspace code that can be badly behaved.
In my experience watching for file events on Windows, it's not very reliable. As the article notes, it's possible that the operating system may drop events. Nevertheless, this solution should help improve performance and reduce disk scans. If you have other applications that depend on watching file system events, enabling this may hinder them (again, based on my experience with Windows).
I'm using ReadDirectoryChangesW() to read a filtered stream of events from the USN journal. I've not noticed any reliability problems. Technically, the kernel API can always drop events, but whether that's a kernel-can't-keep-up problem or the daemon application not servicing the event stream fast enough doesn't really matter. The API does know if/when events were dropped, and the FSMonitor daemon guards against that and forces a "resync", so the "git status" client is advised to do a normal scan and the output is correct.
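(For the curious, the synchronous shape of that API looks roughly like this; a sketch of my own, not the actual daemon code. The zero-byte completion is the overflow/"resync" signal described above:)

    #include <stdio.h>
    #include <wchar.h>
    #include <windows.h>

    int main(void) {
        HANDLE dir = CreateFileA(
            "C:\\repo" /* hypothetical worktree */, FILE_LIST_DIRECTORY,
            FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
            NULL, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL);
        if (dir == INVALID_HANDLE_VALUE) return 1;
        DWORD buf[16 * 1024];   /* DWORD-aligned 64 KiB event buffer */
        DWORD bytes = 0;
        while (ReadDirectoryChangesW(
                   dir, buf, sizeof(buf), TRUE /* watch whole subtree */,
                   FILE_NOTIFY_CHANGE_FILE_NAME | FILE_NOTIFY_CHANGE_DIR_NAME |
                   FILE_NOTIFY_CHANGE_LAST_WRITE,
                   &bytes, NULL, NULL)) {
            if (bytes == 0) {   /* events were dropped: caller must rescan */
                wprintf(L"overflow: full rescan needed\n");
                continue;
            }
            for (FILE_NOTIFY_INFORMATION *p = (FILE_NOTIFY_INFORMATION *)buf;;
                 p = (FILE_NOTIFY_INFORMATION *)((BYTE *)p + p->NextEntryOffset)) {
                wprintf(L"%.*s\n", (int)(p->FileNameLength / sizeof(WCHAR)),
                        p->FileName);
                if (p->NextEntryOffset == 0) break;
            }
        }
        return 0;
    }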
My main complaint with the Windows implementation is that it does not play well with the lock-by-default policy of the NTFS filesystem. I deactivated the filesystem watcher on Windows after the agent repeatedly locked files so that checkouts would fail.
You might give the new fsmonitor a try. It does not lock any files on the disk. It does have a single handle to the worktree root directory to listen for events. But even that is not exclusive. And it CWD's out of the tree during initialization, so it does not prevent you from deleting the worktree while it is running.
I wonder if this will cause issues in repos where changes can come from containerized apps syncing their runtime config to disk. Depending on the platform and the container framework, a lot of different things could potentially break here, from NFS-related to number of open files.
Enabling this messed up something related to projectile/helm-projectile, which I use to navigate to files and which is an integral part of my git/magit setup in Emacs.
The projectile-files-errors buffer says: "warning: Empty last update token."
As with a lot of developer tools, the most-adopted solutions are rarely the best tool for the job. But because everyone knows them, that's what continues to be used.
Moving to a continuous, asynchronous strategy versus a point-in-time synchronous strategy seems like a perfectly reasonable way to improve performance.
All file operations involving many small files are slow on Windows, that's hardly git's fault. It can just do its best to work around the problem (for instance with this file watcher thingy).
I don't think that's so easy. For SVN we also saw a >10x performance difference between checking out the same repository on Linux vs Windows (however, after initial checkout, performance scales mostly with the number of changes, not the repository size).
BTW, to the author of this article: it is very good and was an interesting read. There are some small issues:
- "markdown" link didn't get converted to html: "[core.untrackedcache](https://git-scm.com/docs/git-config#Documentation/git-config...)"
- the link to "philosophy" of Scalar doesn't work: https://github.com/microsoft/git/blob/HEAD/contrib/scalar/do...