Filesystems miss out on a big opportunity by supporting hardlinks. If every file had only one name, files in the same directory could be stored close together, greatly improving locality.



It's a tradeoff. On the other hand, hardlinks let you more easily implement things like backup systems that copy directory trees, using hardlinks for files that haven't changed.
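
A minimal sketch of that idea in Python (hypothetical paths and names; real tools such as cp -al or rsync's --link-dest do this properly):

  import os, shutil, filecmp

  def snapshot(src, prev_snap, new_snap):
      # Copy a directory tree; files identical to the previous snapshot
      # become hardlinks to the old copy instead of fresh copies.
      for dirpath, dirnames, filenames in os.walk(src):
          rel = os.path.relpath(dirpath, src)
          os.makedirs(os.path.join(new_snap, rel), exist_ok=True)
          for name in filenames:
              src_file = os.path.join(dirpath, name)
              old_file = os.path.join(prev_snap, rel, name)
              new_file = os.path.join(new_snap, rel, name)
              if os.path.isfile(old_file) and filecmp.cmp(src_file, old_file, shallow=False):
                  os.link(old_file, new_file)       # unchanged: share the inode
              else:
                  shutil.copy2(src_file, new_file)  # new or changed: real copy

Each snapshot then looks like a full copy of the tree while unchanged files share their storage.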

I don't know the details of how specific filesystems are implemented, but it seems that if the locality you want is achievable at all, it can be achieved for the first name a file has. Subsequent names wouldn't have good locality, but second and third links to a file are much less common. If you want the other links to a file to have good locality too, simply make a copy, doubling your space requirements.

Hard links are useful, and you don't necessarily need to sacrifice locality in the common case. In the uncommon case you can still choose between good locality on the one hand, and good use of space plus the sometimes useful semantics of two names by which to read or write a file on the other.


The value of locality in this context assumes you use many of the files in the same directory at the same time, or enumerate a lot of directories and read the files. Your home directory has different access patterns than /lib or /bin.


That'd only really be true if a significant portion of files were hardlinked. My root filesystem (Debian GNU/Linux testing) has 393,410 files, of which 825 have more than one link. My /home has 0 out of 156,273 files hardlinked.

Having 0.2% (or less) of files hardlinked shouldn't prevent storing files in the same directory near each other.
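
If you want to check your own system, something along these lines works (find / -xdev -type f -links +1 reports essentially the same thing):

  import os, stat

  def count_hardlinked(root):
      total = multi = 0
      for dirpath, _, filenames in os.walk(root):
          for name in filenames:
              try:
                  st = os.lstat(os.path.join(dirpath, name))
              except OSError:
                  continue  # permission errors, files that vanished mid-walk
              if stat.S_ISREG(st.st_mode):
                  total += 1
                  if st.st_nlink > 1:  # this inode has more than one name
                      multi += 1
      return total, multi

  # Note: unlike find -xdev, this also descends into other mounted filesystems.
  print(count_hardlinked("/"))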


The point is that just the possibility of having more than one link kills your ability to assume that there is only one. It has nothing to do with whether or not they actually have more than one link.


Right, but this bumps into a spatial-locality analogue of Amdahl's law. If you optimise a case that shows up 1% of the time, then you can only get a total gain of about 1%.
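
The arithmetic behind that, for the curious (Amdahl's law with p = 0.01 as the fraction being optimised and s as how much faster that fraction gets):

  # Overall speedup when a fraction p of the work is sped up by a factor s.
  def overall_speedup(p, s):
      return 1 / ((1 - p) + p / s)

  print(overall_speedup(0.01, 10))            # ~1.009x
  print(overall_speedup(0.01, float("inf")))  # ~1.0101x: about 1% is the ceiling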


No, the point is that you can't optimise the other 99% because that optimisation only works if hardlinks aren't allowed.


Why not? Choose a parent at random and place the file near the children of that parent. In 99% of cases it's the only parent, so you get the result you aimed for. Why would that be prevented by the possibility of other names?
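
A toy sketch of that placement policy (purely illustrative, not how any real filesystem is coded): when allocating a file, pick one of its parent directories and place it in the region used for that directory's other children.

  import random

  # group_of_dir maps a directory to the "region" where its children are placed.
  def pick_region(parent_dirs, group_of_dir):
      parent = random.choice(parent_dirs)  # almost always there is exactly one
      return group_of_dir.setdefault(parent, len(group_of_dir))

  group_of_dir = {}
  print(pick_region(["/etc"], group_of_dir))              # the common, single-name case
  print(pick_region(["/usr/bin", "/bin"], group_of_dir))  # a hardlinked file: just pick one parent

The extra names are then just extra directory entries; they don't change where the file was placed.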



