
This is an excellent point and I wholeheartedly agree!



Is it? That would require any update to any file to cascade into a bunch of directory updates, amplifying the write, and for what? Do you run "du" in your shell prompt?

Not to mention it would likely be unable to handle the hardlink problem, so it would consistently be wrong.


> That would require any update to any file to cascade into a bunch of directory updates amplifying the write and for what?

You can be a little lazy about updating parents this way and get O(1) updates and O(1) amortized reads, with an O(n) worst case (same as now anyway).
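A minimal sketch of what "being lazy about updating parents" could look like (all names here are hypothetical, not any real filesystem's API): a write just marks ancestors dirty, stopping at the first already-dirty one, and a size read recomputes only the dirty subtrees and caches the result.

```python
# Hypothetical in-memory model of lazy directory-size propagation.
# Writes are cheap (mark-dirty walk that stops early); reads recompute
# only dirty subtrees, so repeated reads are amortized O(1) and the
# worst case is a full O(n) walk, same as du today.

class Dir:
    def __init__(self, parent=None):
        self.parent = parent
        self.files = {}        # name -> file size in bytes
        self.subdirs = {}      # name -> Dir
        self.cached_size = 0
        self.dirty = False

    def _mark_dirty(self):
        d = self
        # Stop at the first ancestor that is already dirty: its own
        # ancestors were marked when it was dirtied, so the walk is
        # O(1) amortized rather than O(depth) per write.
        while d is not None and not d.dirty:
            d.dirty = True
            d = d.parent

    def write_file(self, name, size):
        self.files[name] = size
        self._mark_dirty()

    def total_size(self):
        if self.dirty:
            self.cached_size = (
                sum(self.files.values())
                + sum(d.total_size() for d in self.subdirs.values())
            )
            self.dirty = False
        return self.cached_size
```

The cached sizes here live only in memory, which is exactly why the sibling point about rebuilding after an unclean unmount applies.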


This is probably the right solution, but you need to rebuild on an unclean unmount if you do it lazily.


Disks have improved substantially in I/O and write speed, to the point where Windows will literally index your file system so you can search faster, and antivirus will scan files in the background before you open them. I don’t think maintaining size state on directories would be all that much of a challenge.


I expect performance would suffer quite a lot. In a system with high I/O, there would be a lot of contention on updating the sizes of directories such as /home or /tmp, let alone /.

Also, are you going to update a file’s size for every write (could easily be a thousand times if you’re copying over a 10MB file) or are you going to coalesce updates to file sizes? If the latter, how do you recover after a crash?
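To make the coalescing option concrete, here is a hypothetical sketch (not any real filesystem's behavior): per-directory size deltas accumulate in memory and hit persistent metadata in one batch, so a 10MB copy done as thousands of small writes costs one metadata update instead of thousands.

```python
# Hypothetical sketch of coalescing directory-size updates. "on_disk"
# stands in for persistent metadata; "pending" is the in-memory state
# that a crash would lose, which is the recovery problem the comment
# above is asking about.
import collections

class SizeCoalescer:
    def __init__(self, flush_threshold=1000):
        self.pending = collections.Counter()   # dir path -> accumulated byte delta
        self.on_disk = collections.Counter()   # simulated persistent sizes
        self.flush_threshold = flush_threshold

    def record_write(self, dir_path, delta):
        self.pending[dir_path] += delta
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # One metadata write per directory, no matter how many file
        # writes were recorded since the last flush.
        for path, delta in self.pending.items():
            self.on_disk[path] += delta
        self.pending.clear()
```

After a crash, anything still in `pending` is gone, so the stored sizes are stale until something fsck-like rewalks the tree and rebuilds them.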

Virtual directories such as /dev and /proc would require special-casing.

Mounting and unmounting disks probably would require special-casing.


Haven’t many similar issues been solved in journaled file systems and/or things like database transaction logs and indexes? Real-time, high-precision accuracy is not required; knowing how big a directory is is a frequent use case of directories. Hell, ‘df’ tracks this at the partition level, including edge cases, as does ‘du’.


As far as I am aware, neither of those cascade sizes up.

Also, doing that in databases isn’t a solved problem. count(*) can be slow in databases. See for example

- PostgreSQL: https://dba.stackexchange.com/questions/314371/count-queries..., https://wiki.postgresql.org/wiki/Count_estimate

- Oracle: https://forums.oracle.com/ords/apexds/post/select-count-very...

(Both databases use MVCC (https://en.wikipedia.org/wiki/Multiversion_concurrency_contr...) to ensure that concurrent queries all see the database in a consistent state. That makes it necessary to visit each row and check its timestamps when counting rows.)
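A toy illustration of that last point (this is a simplification, not PostgreSQL's or Oracle's actual internals): under MVCC each row version carries creation/deletion transaction timestamps, and whether a row counts depends on the reading transaction's snapshot, so count(*) has to look at every row version.

```python
# Toy MVCC visibility check: each row is (created_ts, deleted_ts or None).
# A row is visible to a snapshot if it was created at or before the
# snapshot and not yet deleted as of the snapshot. Counting therefore
# requires a per-row check rather than reading one stored total.

def count_visible(rows, snapshot_ts):
    return sum(
        1
        for created, deleted in rows
        if created <= snapshot_ts and (deleted is None or deleted > snapshot_ts)
    )
```

Different snapshots can legitimately see different counts over the same physical rows, which is why a single maintained counter can't answer count(*) exactly.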


I have a "du" command currently running that has been running for ~50 hours. I'd much rather have it update a half-dozen directory entries on each write.



