
Have you noticed that the problem isn't MongoDB? Every case I've seen has been the data traffic exceeding the capacity of the system. There is no database that works well under those circumstances. Another way of putting it is that these businesses have been sufficiently successful that they have outgrown their own planning and deployment.

As for your second paragraph, MongoDB has had journalling for quite a while, so you can make your writes durable and limited only by the speed of your storage.




> Every case I've seen has been the data traffic exceeding the capacity of the system. There is no database that works well under those circumstances.

Is there a benchmark somewhere comparing the memory/disk consumption of MongoDB vs. other datastores?

If there's significant overhead (and my early tests suggested there was, though I didn't run a strict benchmark), then the problem would be very much related to MongoDB after all.

(Honest, real question; I'm a MongoDB user btw, as well as Redis, MySQL, PostgreSQL etc.)


The main storage overhead for MongoDB is that the keys (the "column names") are stored in every record, rather than just once as in a traditional SQL database and some of the other NoSQL solutions. That is why you'll often see developers using very short key names, and one use for an "ORM" is to translate between developer-friendly names and the short stored names.

Of course this could be solved fairly easily by the MongoDB developers by keeping a table that maps short tokens/numbers to the long names. This is the ticket:

https://jira.mongodb.org/browse/SERVER-863
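The translation layer the comment describes can be sketched in a few lines of Python; this is a hypothetical client-side version of the idea (field names are made up), not anything MongoDB ships:

```python
# Hypothetical field map: developer-friendly names on one side,
# short stored names on the other (what an "ORM" layer would maintain).
FIELD_MAP = {"customer_name": "cn", "purchase_timestamp": "ts"}
REVERSE_MAP = {short: name for name, short in FIELD_MAP.items()}

def to_stored(doc):
    """Rewrite keys to their short forms before inserting."""
    return {FIELD_MAP.get(k, k): v for k, v in doc.items()}

def from_stored(doc):
    """Restore the readable names on the way out."""
    return {REVERSE_MAP.get(k, k): v for k, v in doc.items()}

doc = {"customer_name": "Ada", "purchase_timestamp": 1335000000}
assert from_stored(to_stored(doc)) == doc
```

SERVER-863 proposes doing essentially this inside the server, so every driver would benefit without application-level mapping.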

This is someone's measurements with different key names:

http://christophermaier.name/blog/2011/05/22/MongoDB-key-nam...
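For a rough feel of the effect, here is a small sketch using JSON length as a stand-in for BSON size (BSON likewise repeats every key name in every document; the documents below are made up):

```python
import json

# The same record with descriptive vs. abbreviated key names.
long_doc = {"customer_name": "Ada", "purchase_timestamp": 1335000000,
            "total_amount_cents": 4200}
short_doc = {"cn": "Ada", "ts": 1335000000, "ta": 4200}

long_size = len(json.dumps(long_doc, separators=(",", ":")))
short_size = len(json.dumps(short_doc, separators=(",", ":")))

# The difference is paid once per document, so it scales with the
# collection's record count, not with its schema size.
print(long_size, short_size, long_size - short_size)
```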


Thanks for the links.

My question goes further, though: as someone who has worked with (and implemented) column-based stores, I'm curious to compare the respective disk/RAM consumption for the data itself, too.

I think I'll write such a benchmark one day :)


As I remember it:

The journaling is only fsynced every so often - it's not like it magically gives you anything like the D in ACID.

This leaves you relying on replication for durability.

And everyone has problems when they lose one node of the cluster.

Luckily it wasn't the lot: http://blog.empathybox.com/post/19574936361/getting-real-abo...

I would love to be corrected; we'd all sleep easier.


How about the actual facts:

http://www.mongodb.org/display/DOCS/Journaling

So yes, it is periodic by default (in the millisecond range). However, you can make any request wait until it has become durable.

As for replication, people seem to have some hate for it, but the reality is that a journalled system that has failed (any database or operating system) will take a long time to come back up while it replays/recovers its journals. Not that useful when you need the node back quickly.
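To make the trade-off concrete, here is a toy sketch of the group-commit pattern being described (pure Python, not MongoDB's actual code): writes return immediately, a background thread "fsyncs" on an interval, and a caller who needs durability blocks until the next flush, much like asking a request to wait for the journal:

```python
import threading
import time

class Journal:
    """Toy group-commit journal: acknowledge writes immediately,
    'fsync' them to durable storage on a fixed interval."""

    def __init__(self, interval=0.05):
        self._cond = threading.Condition()
        self._pending = []
        self._durable = []      # stand-in for data safely on disk
        self._flushes = 0
        self._interval = interval
        threading.Thread(target=self._flusher, daemon=True).start()

    def write(self, entry):
        # Returns before the entry is durable -- the default behaviour.
        with self._cond:
            self._pending.append(entry)

    def _flusher(self):
        while True:
            time.sleep(self._interval)
            with self._cond:
                self._durable.extend(self._pending)  # the "fsync"
                self._pending.clear()
                self._flushes += 1
                self._cond.notify_all()

    def wait_durable(self):
        # Block until at least one more flush has completed.
        with self._cond:
            target = self._flushes + 1
            while self._flushes < target:
                self._cond.wait()
            return list(self._durable)

j = Journal(interval=0.01)
j.write("doc-1")
assert "doc-1" in j.wait_durable()
```

The point of the sketch is the latency split: unacknowledged writes cost almost nothing, while a durability-waiting write pays at most one flush interval - which is why the periodic default plus an opt-in wait is a common design.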



