Notes from a production MongoDB deployment (boxedice.com)
57 points by dmytton on Feb 28, 2010 | 12 comments



> If you suffer a power loss and/or MongoDB exits uncleanly then you will need to repair your database... it involves MongoDB going through every single document and rebuilding it. When you have databases at the size of ours, this would take hours.

That seems really scary.


Scary yet familiar. I work with many databases that have large MyISAM tables, which is essentially the same situation.



Yes, and if you run a cluster long enough you WILL suffer unplanned power loss, usually due to human error of one sort or another. It's basically impossible to eliminate entirely.


I'm currently using MongoDB for a new system, and I'm glad to see them reporting such a good experience at that scale. My deployment won't be anywhere near as large, so this is very encouraging. It will probably keep running in a screen session, like they said theirs did initially :).


I wonder... they say "However, these files are allocated before MongoDB starts accepting connections. You therefore have to wait whilst the OS creates ~37 2GB files for the logs." So presumably the eager allocation amounts to something like:

    head -c 2146435072 /dev/zero > local.$i
Could it instead be lazily allocated by doing something like `open(); seek(2146435072); write("\0");`? Is a sparse file required to return "0" when reading the holes, or does that just happen to be the common behavior?
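For concreteness, the lazy version would be something like this (untested sketch, POSIX assumed; from my reading of the spec, a gap created by seeking past EOF is required to read back as zeros, so it isn't just common luck):

    /* Untested sketch: create a ~2GB sparse file without writing
       2GB of zeros. POSIX specifies that the hole left by seeking
       past EOF reads back as zero bytes. */
    #include <fcntl.h>
    #include <unistd.h>

    int main(void) {
        const off_t size = 2146435072;
        int fd = open("local.0", O_CREAT | O_WRONLY, 0644);
        if (fd < 0) return 1;

        /* Seek one byte short of the target size... */
        if (lseek(fd, size - 1, SEEK_SET) == (off_t)-1) return 1;
        /* ...and write a single zero byte to set the file length.
           Disk blocks for the hole are only allocated when those
           pages are actually written later. */
        if (write(fd, "", 1) != 1) return 1;

        close(fd);
        return 0;
    }

After that, `ls -lh` reports the full ~2GB logical size while `du -h` shows almost no blocks actually allocated.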


Several comments on the article suggest using sparse files. I suspect that this completely defeats the purpose of pre-allocating space: initialization would be fast, but the time savings would likely come back as a larger penalty, amortized over the runtime in non-obvious ways. The only way to really know would be to performance-test a similarly sized database restored under both conditions on fresh drives.


> I suspect that this completely defeats the purpose of pre-allocating space.

There are two reasons to preallocate. One is so that you can get by with fdatasync rather than a full fsync (that is, you don't have to sync file metadata as well, which usually costs an extra seek; file length is the most commonly changed piece of metadata).
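A rough sketch of what that buys you (POSIX assumed; `local.0` standing in for a data file already preallocated to full size):

    /* Rough sketch: with the file already at full size, an in-place
       overwrite never changes the length, so syncing the data alone
       is enough; there is no metadata update to force out. */
    #define _POSIX_C_SOURCE 200809L
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("local.0", O_WRONLY);  /* preallocated earlier */
        if (fd < 0) return 1;

        char page[4096];
        memset(page, 0xAB, sizeof page);

        /* Overwrite a page in place; file length stays the same. */
        if (pwrite(fd, page, sizeof page, 0) != (ssize_t)sizeof page) return 1;

        /* Sync the data alone; skips the inode write (and usually
           an extra seek) that a full fsync() would also require. */
        fdatasync(fd);

        close(fd);
        return 0;
    }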

The other is to use mmap, since you can't grow a mapping once the file is mmap'd. This is the only part MongoDB cares about, since they never fsync.
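And the mmap side, again just a sketch under the same assumptions:

    /* Rough sketch: the mapping length is fixed when mmap() is
       called, which is why the file has to exist at its full size
       before it can be mapped. */
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        const size_t size = 2146435072;   /* same ~2GB as the data files */
        int fd = open("local.0", O_RDWR); /* must already be `size` bytes */
        if (fd < 0) return 1;

        char *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (base == MAP_FAILED) return 1;

        /* Stores go to the page cache; the kernel writes them back
           whenever it likes unless you msync()/fsync() explicitly. */
        base[0] = 1;

        munmap(base, size);
        close(fd);
        return 0;
    }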

There may be reasons to mmap a sparse file but I can't think of any.


> There may be reasons to mmap a sparse file but I can't think of any.

It sounds like you understand the forces at work here, but I am confused by your statements. Do you think that "pre-allocating" a sparse file is a valid alternative to writing 2GB of zeros to disk in this case?


Very interesting that they have 17,810 collections (aka tables). I wonder if it is common w/ MongoDB to design a data model in this way? Anybody have more info on this tradeoff vs. a smaller number of larger collections?


I've commented on this here: http://blog.boxedice.com/2010/02/28/notes-from-a-production-...

Essentially, this decision came down to disk space requirements, because of the way MongoDB allocates files for each DB.

There are no performance tradeoffs for using lots of collections. See http://www.mongodb.org/display/DOCS/Using+a+Large+Number+of+...


I'm actually not advocating for a database per customer. I was more wondering why you have so many collections -- surely they're not all representing different logical data. My guess is that you create some number of collections per customer?

With the new built-in sharding support, couldn't you have a 'customer_id' field and collapse many of those collections into larger collections, then shard by 'customer_id'? I'm just trying to understand the tradeoff between these two types of schemas.



