Ashamed to admit (as an OSX user) that I didn't even realize the FS was case-insensitive (having migrated from years of Linux usage to a non-Linux desktop). It does a good job of hiding this from the user (filenames are still listed with their original case, and bash autocompletion completes to the correct case as well).
MacOS by default uses a "case-preserving case-insensitive" filesystem, so you can create files with mixed case, but you can't create two files with the same name and different case. It's one of MacOS's more-egregious crimes against Unix. Fortunately it doesn't manifest that often, but it rears its head often enough to be a problem.
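For anyone who wants to check what their own volume does, here is a rough probe in Python. It is only a sketch: `is_case_insensitive` and the `CaseProbe_` prefix are names made up for illustration, and it assumes you can create a temporary file in the directory you're testing.

```python
import os
import tempfile


def is_case_insensitive(directory: str) -> bool:
    """Return True if `directory` resolves names that differ only by case to the same file."""
    # Create a temporary file whose name contains mixed-case letters...
    with tempfile.NamedTemporaryFile(prefix="CaseProbe_", dir=directory) as probe:
        head, tail = os.path.split(probe.name)
        # ...then ask whether the case-swapped spelling of that name also exists.
        swapped = os.path.join(head, tail.swapcase())
        return os.path.exists(swapped)


if __name__ == "__main__":
    here = os.getcwd()
    print(here, "is case-insensitive:", is_case_insensitive(here))
```

The answer is per volume, not per OS, so it's worth running against the specific directory you care about.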
It may be a crime, but it is the result of a set of compromises in the design of the OSX filesystem, which had to work with a BSD variant while also staying compatible with pre-OSX software. I think it’s one thing they actually did an elegant job with.
> It's one of MacOS's more-egregious crimes against Unix.
Nah. Using a file system means putting up with its semantics. HFS+ was case-insensitive; they were deploying an upgrade to millions of existing filesystems.
If you mount, say, an NFS volume, MacOS does the expected thing.
Case-sensitivity is not a "nasty holdover", it is a good design decision that continues to be proven correct (case in point, this bugfix for case-insensitive filesystems).
Why would you introduce complexity into the filesystem to try to normalize file names when you can simply not? I mean, have you _seen_ the mess that is Unicode normalization? Hundreds of characters that render identically and are considered equivalent, but are actually composed of different bytes. Should the filesystem try to make sense of all that, and consider them equivalent paths?
Even if you say "well, just capitalization, not Unicode normalization," there's the whole German ß problem (it uppercases to SS, and there is also a capital ẞ now) and similar friends like the Turkish dotted and dotless I that have popped up as articles on HN. Absolutely glad Linux filesystems by and large do not attempt to take that on, and treat paths as a bucket of bytes instead.
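To make the ß and Turkish I point concrete, here is what Python's built-in, locale-independent case mapping does with them. This only shows Python's `str` methods; it says nothing about what any particular filesystem does internally.

```python
# Case mapping is neither one-to-one nor reversible.
sharp_s = "\u00df"          # ß, LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # ẞ, LATIN CAPITAL LETTER SHARP S

upper_of_sharp_s = sharp_s.upper()            # one character becomes two: "SS"
round_trip = upper_of_sharp_s.lower()         # "ss", not "ß": the round trip loses information
lower_of_capital = capital_sharp_s.lower()    # "ß"
folded = sharp_s.casefold()                   # "ss", so "straße" and "strasse" fold together

# Turkish has dotted and dotless variants of both upper- and lowercase i.
dotted_capital_i = "\u0130"   # İ
dotless_small_i = "\u0131"    # ı

plain_lower = "I".lower()                     # "i" here; Turkish rules would expect "ı"
dotted_lower = dotted_capital_i.lower()       # "i" followed by U+0307 COMBINING DOT ABOVE
dotless_upper = dotless_small_i.upper()       # "I"

print(upper_of_sharp_s, round_trip, lower_of_capital, folded)
print(plain_lower, [hex(ord(c)) for c in dotted_lower], dotless_upper)
```

So "compare names ignoring case" already forces a choice between simple and full case mapping, and between locale-aware and locale-independent rules.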
All for what benefit - so you can type File.txt in the terminal and have the OS find file.txt? That is much more appropriate for the Application layer to resolve, rather than the filesystem.
Bugs like this come from over-engineering. Filesystems should be simple, and follow the principle of least surprise.
It bugs me to see foo.c and Foo.c as separate files in a directory listing. I like the fact that MacOS doesn't allow this situation to ever happen. Not taking on that problem means it's left to the user to figure out what's going on when similar glyphs occur.
> Fortunately it doesn't manifest that often, but it rears its head often enough to be a problem.
IIRC, one place where it does rear its head is when a commit renames a file to a name that differs from the old one only by case. For example `Foo.txt`->`foo.txt`.
I have `core.ignorecase = true` in my `.gitconfig` for this very reason.
The extraordinarily frustrating case is where you're working on a repository that has multiple files that differ only by case. Git will check out one of them, then overwrite it with the other.
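If you want to know up front whether a repository will bite you like this, a small script can flag the collisions. A minimal sketch, assuming `str.casefold()` is a good enough stand-in for the filesystem's own folding rules (it is an approximation, not what HFS+ or APFS literally do); `find_case_collisions` and the sample paths are made up for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_case_collisions(paths):
    """Group file and directory names that differ only by case."""
    groups = defaultdict(set)
    for path in paths:
        p = PurePosixPath(path)
        # Check the path itself and every ancestor directory, since a
        # collision between two directories is just as much of a problem.
        for candidate in (p, *p.parents):
            text = str(candidate)
            if text != ".":
                groups[text.casefold()].add(text)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


if __name__ == "__main__":
    # Hypothetical listing; in practice you could feed in the output of `git ls-files`.
    sample = ["Debian/Debhelper/Example.pm", "debian/control", "README", "Foo.txt", "foo.txt"]
    for names in find_case_collisions(sample):
        print("collide on a case-insensitive filesystem:", names)
```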
Debhelper used to be one of those until I convinced them to change it: they had a Debian/ directory for the Perl module Debian::Debhelper as well as a debian/ directory for the packaging metadata. https://bugs.debian.org/873043
(I suspect I'm a little unusual in wanting to have checkouts of Linux and Debhelper on my Mac homedir.)
If Linux doesn't normalize Unicode at all, can you have two different files that both look like they are named `josé`, depending on whether the é is decomposed or not?
Yes, to Linux filenames are just bytes. Apart from / and NUL it doesn't care what you give it, nor does it mangle them in any way. It's the only sane thing to do.
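Here is what the two spellings of `josé` look like at the byte level, using Python's standard `unicodedata` module. It's just an illustration of the two forms; it doesn't touch the filesystem.

```python
import unicodedata

composed = "jos\u00e9"      # é as a single code point (NFC form)
decomposed = "jose\u0301"   # plain e followed by U+0301 COMBINING ACUTE ACCENT (NFD form)

# They render the same but are different strings with different UTF-8 bytes,
# so a byte-oriented filesystem sees two distinct names.
print(composed == decomposed)        # False
print(composed.encode("utf-8"))      # b'jos\xc3\xa9'
print(decomposed.encode("utf-8"))    # b'jose\xcc\x81'

# Normalizing either one to a common form makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)    # True
print(unicodedata.normalize("NFD", composed) == decomposed)    # True
```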
The only sane thing to do if you don’t care about how humans (as opposed to nerds) think.
In the end, the file system doesn’t exist in isolation, it is there to support users, and most of them won’t care how many bytes “é” takes to store.
Unix, by not even defining the way to interpret the bytes of file names (one can’t even assume that names consisting of only bytes that correspond to ASCII letters and digits should be interpreted as ASCII), makes it impossible to show file names to users. That’s insane.
> The only sane thing to do if you don’t care about how humans (as opposed to nerds) think.
At the FS layer, I think that's better. It makes things simpler for programs. For non-techie humans, Unicode can be normalized at upper levels, like the GUI file manager or the toolkit library that provides save dialogs, etc.
That's assuming confusion caused by un-normalized Unicode is a real practical issue, and not just something that can happen but never does.
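As a rough idea of what "normalize at upper levels" could look like, here is a sketch of a lookup that a file dialog might do on top of a byte-oriented filesystem. `display_key` and `find_matches` are hypothetical helper names, and choosing NFC as the comparison form is an assumption, not something any particular toolkit is known to do.

```python
import unicodedata


def display_key(name: str) -> str:
    """Key under which the UI layer treats two stored names as the same for humans."""
    return unicodedata.normalize("NFC", name)


def find_matches(listing, typed):
    """Return every stored name that matches what the user typed, after normalization."""
    wanted = display_key(typed)
    return [name for name in listing if display_key(name) == wanted]


if __name__ == "__main__":
    # The stored names stay untouched; only the comparison is normalized.
    listing = ["jose\u0301.txt", "jos\u00e9.txt", "notes.txt"]
    print(find_matches(listing, "jos\u00e9.txt"))
```

If the lookup returns more than one name, the UI can ask the user which file they meant, while the filesystem underneath keeps storing bytes exactly as given.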