I wonder... if they say "However, these files are allocated before MongoDB start...

ynniv · on March 1, 2010

Several comments on the article suggest using sparse files. I suspect that this completely defeats the purpose of pre-allocating space. Initialization would be fast, but the time saved would probably come back as a larger penalty, spread over the runtime in a non-obvious way. The only way to really know would be to performance-test a similarly sized database restored under both conditions on fresh drives.

jbellis · on March 1, 2010

> I suspect that this completely defeats the purpose of pre-allocating space.

There are two reasons to preallocate. One is so that you can get by with fdatasync rather than full fsync (that is, you don't have to sync file metadata as well, which is usually an extra seek; file length is the most commonly changed part of "metadata").
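To make the first reason concrete, here is a minimal Python sketch, not anything taken from MongoDB. The file name `datafile.0`, the 1 MiB size, and both helper functions are made up for illustration. It writes real zeros once, then overwrites bytes inside that region and syncs only the data. Because the file length never changes, there is no length update that the later sync would need to flush.

```python
import os
import tempfile

SIZE = 1 << 20  # 1 MiB, a small stand-in for a multi-GB data file (assumption)

def preallocate(path, size):
    """Write real zeros so the file length is fixed up front."""
    with open(path, "wb") as f:
        f.write(b"\0" * size)
        f.flush()
        os.fsync(f.fileno())  # one full fsync: data plus metadata (length)

def overwrite_in_place(path, offset, payload):
    """Overwrite bytes inside the preallocated region, then sync data only."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, payload, offset)
        # os.fdatasync is not available on every platform; fall back to fsync.
        sync = getattr(os, "fdatasync", os.fsync)
        sync(fd)
    finally:
        os.close(fd)

path = os.path.join(tempfile.mkdtemp(), "datafile.0")  # hypothetical name
preallocate(path, SIZE)
overwrite_in_place(path, 4096, b"record")
print(os.path.getsize(path))  # length is unchanged by the in-place write
```

Whether `fdatasync` actually saves a seek here depends on the filesystem and kernel, so treat this as an illustration of the idea rather than a benchmark.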

The other is to use mmap, since you can't change the size of an mmap'd file. This is the only part that MongoDB cares about, since they never fsync.

There may be reasons to mmap a sparse file but I can't think of any.
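For anyone who wants to see the difference being discussed, this rough Python sketch creates one sparse file and one zero-filled file of the same length and compares how much space each occupies on disk. The helper names and the 1 MiB size are illustrative assumptions. It also assumes a Unix-like system where `st_blocks` is reported in 512-byte units, and the numbers will vary by filesystem (compression or lack of sparse-file support can change them).

```python
import os
import tempfile

SIZE = 1 << 20  # 1 MiB stand-in for a 2GB data file (assumption)

def make_sparse(path, size):
    """Set the length without writing data; blocks are allocated lazily."""
    with open(path, "wb") as f:
        f.truncate(size)

def make_zero_filled(path, size):
    """Write real zeros so the filesystem has to back the whole range."""
    with open(path, "wb") as f:
        f.write(b"\0" * size)
        f.flush()
        os.fsync(f.fileno())

def allocated_bytes(path):
    """Space actually occupied on disk, assuming 512-byte st_blocks units."""
    return os.stat(path).st_blocks * 512

workdir = tempfile.mkdtemp()
sparse_path = os.path.join(workdir, "sparse.dat")
dense_path = os.path.join(workdir, "dense.dat")
make_sparse(sparse_path, SIZE)
make_zero_filled(dense_path, SIZE)

for p in (sparse_path, dense_path):
    print(os.path.basename(p), os.path.getsize(p), allocated_bytes(p))
```

Both files report the same length, so either can be mmap'd at full size. With the sparse file, though, the block allocation is deferred until pages are first written.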

ynniv · on March 1, 2010

> There may be reasons to mmap a sparse file but I can't think of any.

It sounds like you understand the forces at work here, but I am confused by your statements. Do you think that "pre-allocating" a sparse file is a valid alternative to writing 2GB of zeros to disk in this case?