
No filesystem is great in the lots-of-small-files case, partly simply due to syscall overhead.

There's a reason https://www.sqlite.org/fasterthanfs.html , SquashFS, etc. are a thing, or why even the admins of Europe's fastest supercomputer admonish against lots of small files. https://docs.lumi-supercomputer.eu/storage/#about-the-number...
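The per-file syscall overhead is easy to see directly. A minimal sketch (file count and payload size are arbitrary choices for illustration): each tiny file costs at least an open/read/close round trip into the kernel, a fixed cost that dwarfs the time spent moving the actual bytes.

```python
# Sketch: create N tiny files, then read them back one at a time.
# Each read pays ~3 syscalls (open, read, close) regardless of how
# few bytes the file holds -- that fixed per-file cost is the point.
import os
import tempfile
import time

N = 1000
payload = b"x" * 100  # a 100-byte "small file"

with tempfile.TemporaryDirectory() as d:
    for i in range(N):
        with open(os.path.join(d, f"{i}.dat"), "wb") as f:
            f.write(payload)

    t0 = time.perf_counter()
    total = 0
    for i in range(N):
        with open(os.path.join(d, f"{i}.dat"), "rb") as f:
            total += len(f.read())
    per_file_us = (time.perf_counter() - t0) / N * 1e6
    print(f"read {total} bytes, ~{per_file_us:.1f} us per file")
```

Repeat the same measurement with the 100,000 bytes concatenated into one file and the wall-clock difference is almost entirely syscall and metadata overhead.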

SquashFS is a (read only) filesystem designed for small files.

Which shows that even if you don't want to call any filesystem great here, they differ vastly in just how badly they handle small files. Windows' filesystems (and more importantly its virtual filesystem layer, including filters) are on the extremely slow end of this spectrum.


The reason LUMI is advising against many files is that it uses the Lustre parallel filesystem, which is notoriously bad with small files. See here: https://www.lanl.gov/projects/national-security-education-ce... .

I guess that's where the "sqlite competes with fopen" part might help.
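The pattern behind that "competes with fopen" claim can be sketched with the stdlib sqlite3 module (the table and column names here are made up for illustration): many small blobs live inside one database file, so each lookup is a B-tree probe over an already-open handle rather than a fresh open()/read()/close() per file.

```python
# Sketch: SQLite as a small-blob store, in the spirit of the
# "35% faster than the filesystem" benchmark linked above.
import sqlite3

con = sqlite3.connect(":memory:")  # on disk this would be a single file
con.execute("CREATE TABLE blobs (name TEXT PRIMARY KEY, data BLOB)")
con.executemany(
    "INSERT INTO blobs VALUES (?, ?)",
    ((f"file{i}.dat", b"x" * 100) for i in range(1000)),
)
con.commit()

# One open database handle; no per-"file" open/close syscalls.
(data,) = con.execute(
    "SELECT data FROM blobs WHERE name = ?", ("file42.dat",)
).fetchone()
print(len(data))
```

On a parallel filesystem like Lustre this helps twice over: it also collapses thousands of metadata-server operations into I/O against a single object.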

That used to be ReiserFS's claim to fame - tail packing small files. Doesn't seem like any FS has really optimized small file handling since then.

It is not so much the wasted space that bugs me as all the metadata and processing of that metadata.

For instance, the 70,000 files in your node_modules do not need separate owner and group fields (they all belong to the same user under normal conditions) and are all +rw for the files and +rwx for the directories. If you have access to one you have access to all and if your OS is access checking each file you are paying in terms of battery life, time, etc.
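A quick sketch of that redundancy (directory size here is a stand-in for a real node_modules tree): stat every entry and collect the distinct (owner, group, mode) tuples. Each stat is a syscall on which the kernel re-derives an answer that is identical for the whole tree.

```python
# Sketch: per-file metadata checks over a tree whose ownership and
# permissions are uniform -- every stat() repeats the same work.
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    for i in range(100):
        open(os.path.join(d, f"{i}.js"), "w").close()

    owners = set()
    for entry in os.scandir(d):
        st = entry.stat()  # one syscall per file
        owners.add((st.st_uid, st.st_gid, st.st_mode & 0o777))
    print(owners)  # typically a single (uid, gid, mode) tuple for the tree
```

A format that stores one set of metadata for the whole bundle (SquashFS, a SQLite blob table, a plain archive) pays that cost once instead of 70,000 times.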

On Windows it is the same story, except the access control system is much more complex and slower. (I hope that Linux is not wasting a lot of time processing the POSIX ACLs you aren't using!)
