
I wonder... if they say "However, these files are allocated before MongoDB starts accepting connections. You therefore have to wait whilst the OS creates ~37 2GB files for the logs." and then just run:

    head -c 2146435072 /dev/zero > local.$i

Can it be lazily allocated by doing something like `open(); seek(2146435072); write("\0");`? Is a sparse file required to return "0" when reading empty places (or does it just happen very often)?
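
For what it's worth, here is a minimal Python sketch of that seek-then-write idea. On POSIX systems, reads from the unwritten gap are specified to return zero bytes, so the check below should hold wherever the file ends up. Whether the file is actually stored sparsely depends on the filesystem, so the on-disk number is only reported, not relied on. The 16 MiB size, the `local.0` name in a temp directory, and the helper names are assumptions for illustration, not anything MongoDB does.

```python
import os
import tempfile

def create_sparse(path, size):
    """Create a file of `size` bytes by writing one byte at the last offset."""
    with open(path, "wb") as f:
        f.seek(size - 1)   # seek to the final byte, leaving a gap behind it
        f.write(b"\0")     # the single write sets the file length to `size`

def allocated_bytes(path):
    # st_blocks counts 512-byte units on POSIX systems (not available on Windows).
    return os.stat(path).st_blocks * 512

SIZE = 16 * 1024 * 1024    # small stand-in for the ~2GB files in the article

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "local.0")   # hypothetical file name
    create_sparse(path, SIZE)
    logical_size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(SIZE // 2)               # somewhere inside the gap
        hole = f.read(4096)
    on_disk = allocated_bytes(path)

print("logical size:", logical_size)
print("allocated on disk:", on_disk)
print("gap reads as zeros:", hole == b"\0" * 4096)
```

On a filesystem that supports holes, the allocated figure will usually be far smaller than the logical size, which is exactly why the creation is fast.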



Several comments on the article suggest using sparse files. I suspect that this completely defeats the purpose of pre-allocating space. Initialization would be fast, but the time saved would probably come back as a larger penalty spread across the runtime in non-obvious ways, since the blocks still have to be allocated on first write. The only way to really know would be to performance-test a similarly sized database restored under both conditions on fresh drives.


> I suspect that this completely defeats the purpose of pre-allocating space.

There are two reasons to preallocate. One is so that you can get by with fdatasync rather than full fsync (that is, you don't have to sync file metadata as well, which is usually an extra seek; file length is the most commonly changed part of "metadata").
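
A rough Python sketch of that first reason, under some assumptions: the file is preallocated by writing real zeros, and later writes stay inside the existing length, so the length does not change and a data-only sync is enough. The file name, sizes, and helper names are made up for illustration. `os.fdatasync` is not available on every platform (macOS, for example), so the sketch falls back to `os.fsync` there.

```python
import os
import tempfile

# Fall back to a full fsync where fdatasync is not provided.
data_sync = getattr(os, "fdatasync", os.fsync)

def preallocate(path, size, chunk=1 << 20):
    """Write real zeros so the blocks and the final length exist up front."""
    zeros = b"\0" * chunk
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())   # one full sync to persist data and the length

def overwrite_in_place(path, offset, payload):
    """Overwrite bytes inside the existing length, then sync only the data."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        os.write(fd, payload)  # small payload; a robust version would loop on short writes
        data_sync(fd)          # length is unchanged, so no size update is needed
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "journal.0")   # hypothetical file name
    preallocate(path, 4 * 1024 * 1024)
    size_before = os.path.getsize(path)
    overwrite_in_place(path, 4096, b"record")
    size_after = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(4096)
        stored = f.read(6)
```

This only shows the shape of the pattern. Whether the data-only sync actually saves a seek depends on the filesystem and kernel.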

The other is to use mmap, since you can't change the size of an mmap'd file. This is the only part that mongodb cares about, since they never fsync.
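
To make the mmap point concrete, here is a small Python sketch. It assumes the file is given its full length first (here with `truncate`, which may well produce a sparse file) and then mapped at that fixed length. Writes inside the mapping work, while a write past the end is rejected rather than growing the file. The file name, size, and `map_fixed` helper are illustrative assumptions.

```python
import mmap
import os
import tempfile

SIZE = 1024 * 1024   # small stand-in for a 2GB data file

def map_fixed(path, size):
    """Give the file its full length, then map exactly that many bytes."""
    with open(path, "wb") as f:
        f.truncate(size)              # sets the length; may leave the file sparse
    f = open(path, "r+b")
    return f, mmap.mmap(f.fileno(), size)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data.0")  # hypothetical file name
    f, mm = map_fixed(path, SIZE)
    try:
        mm[8192:8197] = b"hello"      # write inside the mapped range
        mm.flush()
        written = mm[8192:8197]
        untouched = mm[0:16]          # never written, reads back as zeros
        try:
            mm[SIZE:SIZE + 5] = b"oops!"   # past the end of the mapping
            grew = True
        except (IndexError, ValueError):
            grew = False              # the mapping does not extend itself
    finally:
        mm.close()
        f.close()
    final_size = os.path.getsize(path)
```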

There may be reasons to mmap a sparse file but I can't think of any.


> There may be reasons to mmap a sparse file but I can't think of any.

It sounds like you understand the forces at work here, but I am confused by your statements. Do you think that "pre-allocating" a sparse file is a valid alternative to writing 2GB of zeros to disk in this case?



