Weird that there is such detailed technical information alongside this statement:
> And anyway, purists might argue that the "content" of a directory doesn't change when the files it points to change; the content is merely a list of filenames and inode numbers, after all, and those stay the same, no matter what happens inside those inodes. Purists make me sad.
Or maybe Unix makes the author sad? You can’t wish the OS into treating directory files differently when the set of contained files hasn’t changed (only a file’s contents have), just so these Make troubles could be handled differently.
The author mentions that they also wrote a backup program (bup), and for backup programs it would be very convenient if directory mtimes would get updated like this (recursively up to the root), as it would allow to skip scanning the entire filesystem for changed files (which in my experience is where backup programs spend most of their time).
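A minimal sketch of the traversal such semantics would enable, assuming the hypothetical behavior where writing a file bumps every ancestor directory's mtime (which no mainstream Unix filesystem actually does):

```cpp
// Sketch: prune whole subtrees whose directory mtime predates the last
// backup. Only valid under the HYPOTHETICAL recursive-mtime semantics
// discussed above; on real Unix filesystems this would miss changes.
#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

void scan(const fs::path& dir, fs::file_time_type last_backup) {
    if (fs::last_write_time(dir) <= last_backup)
        return;  // (hypothetically) nothing changed below here: skip it all
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.is_directory())
            scan(entry.path(), last_backup);
        else if (entry.last_write_time() > last_backup)
            std::cout << "changed: " << entry.path() << '\n';
    }
}
```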
I don't remember if mtime is updated on each write call or just on fopen, but I could see this being a huge performance overhead, particularly for applications that are FS-bound. I wonder if io_uring would help the situation, though, since it's mainly geared toward filesystem operations.
No reason a hypothetical recursive mtime needs to be atomic. The kernel could just stick it in a buffer somewhere and deal with that sort of thing out-of-band and in batches. You'd probably need some filesystem journaling trickery if you want to make sure the recursive mtime always updates eventually when a file is modified.
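In userspace terms the batching idea might look like the sketch below; this is purely illustrative (the set and map are stand-ins for kernel structures, and a real implementation would live in the VFS):

```cpp
// Illustration of out-of-band, batched "recursive mtime" updates:
// coalesce dirty ancestors in a set, flush them later in one pass.
// A real kernel version would need journaling to guarantee the flush
// eventually happens after a crash, as noted above.
#include <ctime>
#include <filesystem>
#include <set>
#include <string>
#include <unordered_map>

namespace fs = std::filesystem;

std::set<fs::path> dirty;                             // coalesces duplicates
std::unordered_map<std::string, std::time_t> rmtime;  // side-channel store

void note_write(fs::path p) {              // cheap: no timestamp writes yet
    while (p.has_parent_path() && p.parent_path() != p) {
        p = p.parent_path();
        dirty.insert(p);
    }
}

void flush(std::time_t now) {              // run out-of-band, in batches
    for (const auto& d : dirty)
        rmtime[d.string()] = now;
    dirty.clear();
}
```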
> Annoyingly, when you update the content of a file, the mtime of its containing directory is not changed. All sorts of very convenient tree traversals would be possible if the directory mtime were updated (recursively to the root) when contained files changed, but no. This is probably because of hardlinks:
He lost me here. As much as that would be a cool feature, it smacks of a special kind of tunnel vision: what's the point of anything but what you happen to be working on?
Consider how many writes happen on your system. Consider what sliver of a fraction of them even come under mtime scrutiny. The idea is to amplify those writes all the way up the directory hierarchy? For every write? And contend on every write to /home/?
We already needed noatime; if it weren't for make, I could imagine a nomtime too.
The original title of the letter, as submitted to CACM, was "A Case Against the Goto Statement", but CACM editor Niklaus Wirth changed the title to "Goto Statement Considered Harmful". Regarding this new title, Donald Knuth quipped that "Dr. Goto cheerfully complained that he was always being eliminated."
Reading that same link, Wirth didn’t invent the cliché, either.
> If your name is Dijkstra and it’s 1968 feel free to use the phrase “considered harmful”
The phrase "GOTO considered harmful" was actually used by Niklaus Wirth, not Edsger Dijkstra. Dijkstra had submitted his letter with the title "A Case Against the Goto Statement"; and Wirth, as editor, changed it.
But also, it was already a journalistic cliché outside of computer science before Wirth applied it to Dijkstra's letter.
"Short and concise" isn't much praise. If I wanted something short and concise, I could say "is bad", which is probably both more concise and more honest. "Mtime comparison is bad" is more clear and more concise.
"Considered harmful" is misleading because the passive voice suggests some kind of general consensus which usually doesn't exist.
We're currently dealing with crashes across all Qt applications using QML on NixOS [1], since Qt utilizes the binary's mtime to invalidate the cache of embedded QML resources.
Since all builds have an mtime of 0 (timestamps being the biggest source of reproducibility issues [2]), QML loads outdated cache objects, which then load invalid bytecode at runtime and cause a crash.
Our initial plan was to use a hash of the binary [3], which IMHO should be the most straightforward, least likely to break, and most future-proof way to solve the problem. The currently suggested implementation's performance could most likely be improved by generating the hash once on startup instead of every time a QML resource is loaded. But since the binary is already cached in memory, hashing is not that expensive; binary sizes are usually relatively small and loading of embedded resources doesn't happen that often, so it should already be reasonably fast (real performance tests haven't been done yet).
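For illustration, hashing once on startup could look roughly like this; a sketch of the idea only, not the actual patch from [3]:

```cpp
// Sketch of the "hash the binary once on startup" idea; NOT the patch
// referenced above. Call only after QCoreApplication is constructed.
#include <QCoreApplication>
#include <QCryptographicHash>
#include <QFile>

static QByteArray binaryHash()
{
    // Computed once, then reused by every cache-validity check.
    static const QByteArray hash = [] {
        QFile self(QCoreApplication::applicationFilePath());
        QCryptographicHash h(QCryptographicHash::Sha256);
        if (self.open(QIODevice::ReadOnly))
            h.addData(&self);  // streams the file instead of loading it whole
        return h.result();
    }();
    return hash;
}
```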
The big challenge will be upstreaming it once we've proven it works properly. The current approach has apparently been rejected [4] and declared a downstream issue, which I personally disagree with, since sooner or later reproducible builds will become the norm and will therefore affect everyone.
Current ideas to work around it require individual solutions per distribution/ISV: each would have to come up with domain-specific criteria for cache invalidation (such as the store path/derivation hash on NixOS) and maintain a downstream patch for that solution, and it furthermore wouldn't work for local build processes (e.g. from within an IDE).
Lesson of the day: never use mtimes, they'll bite you in the ass sooner or later!
I really do think it's unfortunate that the OS cannot keep something like a "write count" for each file. That would be enough. Possibly it would even be good enough to know the number of times a file had been opened for writing.
There's the i_version field in the inode structure which is pretty close to that. Though I'm not sure it's accessible to user space, or whether it's used only by the kernel NFS server.
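Short of i_version, a rough userspace approximation is to count modification events with inotify; note that inotify coalesces events, so this undercounts compared to a real kernel-maintained counter (Linux-only sketch):

```cpp
// Approximate per-file "write count" via inotify (Linux). Coalesced
// events mean this undercounts; a kernel counter like i_version wouldn't.
#include <sys/inotify.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char** argv)
{
    if (argc < 2) return 1;
    int fd = inotify_init1(0);
    inotify_add_watch(fd, argv[1], IN_MODIFY);

    alignas(struct inotify_event) char buf[4096];
    long writes = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (char* p = buf; p < buf + n; ) {  // one read may batch events
            auto* ev = reinterpret_cast<struct inotify_event*>(p);
            if (ev->mask & IN_MODIFY)
                std::printf("modifications so far: %ld\n", ++writes);
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
}
```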
> Random side note: on MacOS, the kernel does know all the filenames of a hardlink, because hardlinks are secretly implemented as fancy symlink-like data structures. You normally don't see any symptoms of this except that hardlinks are suspiciously slow on MacOS. But in exchange for the slowness, the kernel actually can look up all filenames of a hardlink if it wants. I think this has something to do with Aliases and finding .app files even if they move around, or something.
This reads weird to me.
As the author points out, a directory entry (hardlink) is just a filename pointing to an inode. When you first create a file and there is only one filename for it, that entry is a hardlink. So the first sentence in that text should probably read "the kernel does know all the filenames of an inode".
But then it gets weird. When the author says "hardlinks are suspiciously slow on MacOS", it sounds like they're saying "accessing files is suspiciously slow on MacOS" - because accessing any file in the filesystem goes through hardlinks. All links to a file, including the first, are hardlinks.
I suppose it's perfectly allowed for MacOS to simulate Unix hard link semantics (i.e. names are just pointers to inodes, with each name being equal) even though the underlying filesystem doesn't have the name+inode split. It seems this is indeed the case on HFS+: https://developer.apple.com/library/archive/technotes/tn/tn1...
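The "every name is an equal hardlink" point is easy to verify with plain POSIX calls: after link(), both names resolve to the same inode number and the link count is 2:

```cpp
// Names are just pointers to inodes: after link(), both names report
// the same st_ino, and st_nlink counts both of them.
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    close(open("a", O_CREAT | O_WRONLY, 0644));  // first name: "a"
    link("a", "b");                              // second name, same inode

    struct stat sa, sb;
    stat("a", &sa);
    stat("b", &sb);
    std::printf("same inode: %d, link count: %ld\n",
                sa.st_ino == sb.st_ino, (long)sa.st_nlink);
    unlink("a");
    unlink("b");
}
```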
It would be neat if filesystems like btrfs or ZFS could expose their internal checksum of the file data to userspace, to quickly see whether two files have identical content (eg. think of rsync). (Assuming the hashes computed internally by the filesystem are actually based purely on the file data, and not on metadata like block pointers. But dedup-capable filesystems should surely have such a checksum internally...)
The problem with that is that, for good performance reasons, the checksum is typically computed on blocks smaller than the whole file. For example, ZFS splits files into blocks of at most "recordsize" (128K by default) and checksums each block.
It's not so likely that whatever remote target you are looking at has the same block boundaries and checksum algorithm.
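To make the mismatch concrete, "checksum each block" means something like the sketch below; the 128 KiB block size mirrors the ZFS default mentioned above, and FNV-1a merely stands in for whatever algorithm the filesystem really uses:

```cpp
// Per-block checksumming, rsync-style: hash the file in 128 KiB chunks
// (the ZFS default recordsize). FNV-1a is a stand-in; two systems only
// get comparable lists if block size AND algorithm both match.
#include <cstdint>
#include <fstream>
#include <iostream>
#include <vector>

int main(int argc, char** argv)
{
    if (argc < 2) return 1;
    constexpr std::size_t kRecord = 128 * 1024;
    std::ifstream in(argv[1], std::ios::binary);
    std::vector<char> buf(kRecord);

    while (in.read(buf.data(), kRecord), in.gcount() > 0) {
        std::uint64_t h = 1469598103934665603ull;   // FNV-1a offset basis
        for (std::streamsize i = 0; i < in.gcount(); ++i) {
            h ^= static_cast<unsigned char>(buf[i]);
            h *= 1099511628211ull;                  // FNV-1a prime
        }
        std::cout << std::hex << h << '\n';         // one checksum per block
    }
}
```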
On the upside, you can often ask such filesystems to take two snapshots and report what changed between them, or to export some kind of differential that transforms the original snapshot into the newer one. Both ZFS and Btrfs can do this.
For ZFS, we have the pool-wide transaction number, which increments with every disk write. Couldn't you use that as the "time" for the last modification to each file so long as you have a starting transaction number to compare it against?
That talk is about the "relatime" mount option, not the O_NOATIME open() flag. The O_NOATIME open() flag was added following the semantics described in the glibc manual (https://www.gnu.org/software/libc/manual/html_node/Operating... ); I don't know where these semantics came from; my guess (given that it is glibc and O_NOATIME is described as a GNU extension) is that they came from Hurd, though I haven't checked. These semantics aren't ideal (you need extra code to retry without the flag if it fails with a permission error, or you have to open() without the flag and add it later with fcntl(), both cases requiring an extra system call); if it were added today, it probably would have simpler semantics.
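The retry dance looks like this; the fallback is needed because O_NOATIME fails with EPERM unless the caller owns the file (or is privileged):

```cpp
// Try O_NOATIME first, retry without it on EPERM -- the extra system
// call the glibc-style semantics force on you, as described above.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // O_NOATIME is a GNU extension
#endif
#include <cerrno>
#include <fcntl.h>

int open_noatime_if_possible(const char* path)
{
    int fd = open(path, O_RDONLY | O_NOATIME);
    if (fd < 0 && errno == EPERM)
        fd = open(path, O_RDONLY);  // fall back; costs one more syscall
    return fd;
}
```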
I suspected this post was generated bot spam, given the name of the convention and how most discussions about feminism go on the Internet, but I have to say the talk was interesting and I am glad I watched it.