
I don't think the problem is that it's too slow.

I think the problem is that all sorts of utilities and commands break when dealing with hundreds of thousands of files in a single directory.

Also, the filesystem block size means you'll waste an incredible amount of disk space, since every small file still occupies at least one full block.
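
For a sense of scale, here is a back-of-the-envelope sketch in Python; the 4 KiB block size, file count, and average file size are assumptions for illustration, not numbers from the thread:

    # Rough slack-space estimate from block rounding (all figures assumed).
    BLOCK = 4096            # assumed filesystem block size in bytes
    N_FILES = 500_000       # assumed number of small files
    AVG_SIZE = 700          # assumed average file size in bytes

    allocated = N_FILES * ((AVG_SIZE + BLOCK - 1) // BLOCK) * BLOCK
    wasted = allocated - N_FILES * AVG_SIZE
    print(f"~{wasted / 1e9:.1f} GB lost to block rounding")  # ~1.7 GB here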




I feel like wasted disk space is not a real issue any more: if you're working with more than 1 TB of data, you have (or at least had better have) the resources to pay for bigger drives, which cost almost nothing per TB.

Files in a single directory: it was discussed in another comment, but there's a tried and true solution to that: simply nest your items in folders. For example, with a UUID as the primary key you could use a folder structure of '(first 4 bytes)/(second 4 bytes)/(...and so on)/(full uuid)', where the nesting is deep enough that no directory holds more than 50,000 files. For smaller pools you can reduce the number of layers; '(first two bytes)/(full uuid)', for example, still gives you quite a few entries before any one folder reaches 50,000. A sketch of this layout is below.
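
A minimal sketch of that fan-out scheme in Python, assuming hex-encoded UUIDs; the helper name nested_path, the two-character-per-level width, and the /data/blobs root are illustrative choices, not details from the comment above:

    import os
    import uuid

    def nested_path(root: str, file_id: uuid.UUID, levels: int = 2, width: int = 2) -> str:
        """Fan-out path like root/ab/cd/<full-uuid>; tune levels/width so no
        directory ends up with more than ~50,000 entries."""
        hex_id = file_id.hex                       # 32 hex chars, no dashes
        parts = [hex_id[i * width:(i + 1) * width] for i in range(levels)]
        return os.path.join(root, *parts, str(file_id))

    # Example: create the directories and store a blob under the nested path.
    fid = uuid.uuid4()
    path = nested_path("/data/blobs", fid)         # e.g. /data/blobs/3f/a1/3fa1...
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(b"payload")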



