Technically NTFS allows those too. The filesystem, being a very low-level tool, hardly thinks about the upper layers and what pain it might inflict there. Its purpose is to store blobs under a name and retrieve them upon request. Since a char[] (or wchar_t[]) looks enough like a name, that's what it uses.

That being said, enforcing such restrictions in the upper layers brings pain as well, because suddenly you can end up with files that you cannot delete anymore (this happens sometimes on Windows).

True; there's no reason that the filesystem should be storing anything other than char[]. The filesystem is a serialized domain, and char[] buffers are for storage and retrieval of serialized data. But that also means that each filesystem should explicitly specify a serialization format for what's stored in that char[] -- hopefully UTF-8.

However, the filesystem should really be where that serialized representation begins and ends. The filesystem should be interacting with the VFS layer using runes (Unicode codepoints), not octets.
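
To make that concrete, here's a rough sketch of what decoding at that boundary might look like. rune_t and utf8_decode are made-up names, and a production decoder needs the full Unicode validity rules; this only covers the basic cases:

    /* Hypothetical sketch: the on-disk name is a UTF-8 char[]; the
       VFS-facing side hands out runes (codepoints). Returns bytes
       consumed, or 0 on malformed input (truncated, overlong,
       surrogate, > U+10FFFF). */
    #include <stddef.h>
    #include <stdint.h>

    typedef uint32_t rune_t;

    static size_t utf8_decode(const unsigned char *s, size_t len, rune_t *out)
    {
        if (len == 0) return 0;
        if (s[0] < 0x80) { *out = s[0]; return 1; }
        if ((s[0] & 0xE0) == 0xC0 && len >= 2 && (s[1] & 0xC0) == 0x80) {
            *out = ((rune_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
            return *out >= 0x80 ? 2 : 0;                    /* reject overlong */
        }
        if ((s[0] & 0xF0) == 0xE0 && len >= 3 &&
            (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
            *out = ((rune_t)(s[0] & 0x0F) << 12) |
                   ((rune_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
            if (*out < 0x800) return 0;                     /* overlong */
            if (*out >= 0xD800 && *out <= 0xDFFF) return 0; /* surrogate */
            return 3;
        }
        if ((s[0] & 0xF8) == 0xF0 && len >= 4 &&
            (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80 &&
            (s[3] & 0xC0) == 0x80) {
            *out = ((rune_t)(s[0] & 0x07) << 18) |
                   ((rune_t)(s[1] & 0x3F) << 12) |
                   ((rune_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
            return (*out >= 0x10000 && *out <= 0x10FFFF) ? 4 : 0;
        }
        return 0;
    }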

And then, given that all filesystems route through the VFS, it can (and should) be enforcing preconditions on those runes in its API, expecting users to pass it something like a printable_rune_t[]. (Or even, horror of Pascalian horrors, a struct containing a length-prefixed printable_rune_t[].)
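
Something in this spirit, say (struct vfs_name, rune_is_printable, and vfs_name_valid are hypothetical; deciding what exactly counts as "printable" is the policy question the VFS would have to pin down):

    /* Hypothetical: a length-prefixed name of validated runes, the
       only type the VFS name API would accept. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef uint32_t printable_rune_t;

    struct vfs_name {
        size_t           len;
        printable_rune_t runes[];  /* length-prefixed, Pascal-style */
    };

    /* Example precondition: no C0/C1 controls, no path separator,
       no surrogate codepoints, nothing beyond U+10FFFF. */
    static bool rune_is_printable(printable_rune_t r)
    {
        if (r < 0x20 || (r >= 0x7F && r < 0xA0)) return false; /* controls */
        if (r == '/') return false;                            /* separator */
        if (r >= 0xD800 && r <= 0xDFFF) return false;          /* surrogates */
        return r <= 0x10FFFF;
    }

    static bool vfs_name_valid(const struct vfs_name *n)
    {
        for (size_t i = 0; i < n->len; i++)
            if (!rune_is_printable(n->runes[i]))
                return false;
        return n->len > 0;
    }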

And for the situation where there are now files floating around without a printable_rune_t[] name -- this is why NTFS has been conceptually based around GUIDs (really, NT object IDs) for a decade now, with all names for a file just being indexed aliases. I wonder when Linux will get on that train...
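
Windows already half-exposes this: you can open a file by ID rather than by name via OpenFileById, given a hint handle on the same volume. A rough sketch, error handling elided; the object ID would come from an earlier FSCTL_CREATE_OR_GET_OBJECT_ID call:

    /* Sketch: opening an NTFS file by its object ID instead of by
       name (Vista and later). The hint handle only identifies the
       volume; 'oid' is whatever object ID was retrieved earlier. */
    #define _WIN32_WINNT 0x0600
    #include <windows.h>

    HANDLE open_by_object_id(HANDLE volume_hint, GUID oid)
    {
        FILE_ID_DESCRIPTOR fid;
        fid.dwSize   = sizeof fid;
        fid.Type     = ObjectIdType;
        fid.ObjectId = oid;
        return OpenFileById(volume_hint, &fid, GENERIC_READ,
                            FILE_SHARE_READ, NULL, 0);
    }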


Well, history sadly dictates that the interface to the upper layers is based around code units, because those have always been fixed-length. Unicode came too late to most operating systems to really be ingrained in their design, and where it was (Windows springs to mind), it all took a turn for the worse with the 16-to-21-bit shift in Unicode 2.0, leaving the Unicode-by-default systems no better off than the 8-bit-by-default systems had been a decade earlier.

That NTFS uses GUIDs internally to reference streams is news to me, though. On Unix-like systems the equivalent would be inodes, I guess?
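
Right -- on a Unix-like system the stable identity of a file is the (st_dev, st_ino) pair, and every hard link is just a name that aliases it, e.g.:

    /* Print a file's stable identity: device, inode, link count. */
    #include <stdio.h>
    #include <sys/stat.h>

    int main(void)
    {
        struct stat st;
        if (stat("/etc/hosts", &st) == 0)
            printf("dev=%llu ino=%llu links=%llu\n",
                   (unsigned long long)st.st_dev,
                   (unsigned long long)st.st_ino,
                   (unsigned long long)st.st_nlink);
        return 0;
    }

The closest userspace analogue to opening a file by that identity is probably Linux's open_by_handle_at(2), but that needs CAP_DAC_READ_SEARCH and is Linux-specific, so it's not the first-class citizen the parent describes object IDs being on NT.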
