Retrospective from Postmark on outages (MongoDB) (postmarkapp.com)
46 points by cemerick on March 31, 2012 | 18 comments



I know I shouldn't care but I am always wary of doing business with any company using MongoDB.

10gen put out marketing claiming MongoDB was so much faster than SQL solutions, but it seemed obvious to me that turning off fsync would make the SQL solutions run at about the same speed. Plus, why would I want to run my database in a mode where it is easy to lose data? MongoDB may be a useful product, but their marketing is deceptive, which will lead to companies using it in inappropriate situations.


Have you noticed that the problem isn't MongoDB? Every case I've seen has been the data traffic exceeding the capacity of the system. There is no database that works well under those circumstances. Another way of putting it is that these businesses have been sufficiently successful that they have outgrown their own planning and deployment.

As for your second paragraph, MongoDB has had journalling for quite a while, so you can make your writes durable, limited only by the speed of your storage.


> Every case I've seen has been the data traffic exceeding the capacity of the system. There is no database that works well under those circumstances.

Is there a benchmark somewhere comparing the memory/disk consumption of MongoDB vs. other datastores?

If there's a significant overhead (and my early tests tended to show there was, though I didn't run a strict benchmark), then the problem would be very much related to MongoDB after all.

(Honest, genuine question; I'm a MongoDB user btw, as well as a user of Redis, MySQL, PostgreSQL, etc.)


The main overhead in MongoDB's storage is that the "column names" (keys) are stored in every record rather than just once, as with traditional SQL databases and some of the other NoSQL solutions. That is why you'll often see developers using very short key names, and one use for an "ORM" is to translate between developer-friendly names and the short stored names.

Of course this can be solved fairly easily by the MongoDB developers by having a table mapping between short tokens/numbers and the long names. This is the ticket:

https://jira.mongodb.org/browse/SERVER-863

This is someone's measurements with different key names:

http://christophermaier.name/blog/2011/05/22/MongoDB-key-nam...
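
To make the key-shortening idea concrete, here's a rough Python sketch (pymongo; the field names and collection are made up, nothing Postmark-specific) of the kind of mapping layer people end up writing by hand or getting from an ODM:

    # Hypothetical key-shortening layer; field names are illustrative only.
    from pymongo import MongoClient

    FIELD_MAP = {"sender_address": "s", "recipient_address": "r", "subject_line": "j"}
    REVERSE_MAP = {v: k for k, v in FIELD_MAP.items()}

    def shorten(doc):
        # store short keys on disk to cut per-document overhead
        return {FIELD_MAP.get(k, k): v for k, v in doc.items()}

    def expand(doc):
        # give the application back its developer-friendly names
        return {REVERSE_MAP.get(k, k): v for k, v in doc.items()}

    coll = MongoClient().maildb.messages
    coll.insert_one(shorten({"sender_address": "a@example.com",
                             "recipient_address": "b@example.com",
                             "subject_line": "hello"}))
    print(expand(coll.find_one()))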


Thanks for the links.

My question goes further though: as someone who has both worked with and implemented column-based stores, I'm curious to compare the respective disk/RAM consumption for the data itself, too.

I think I'll write such a benchmark one day :)
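
A very rough starting point might be something like this (pymongo; collection and field names are made up), which just compares the sizes reported by collStats for long vs. short key names:

    # Very rough size comparison: long vs. short key names, via collStats.
    from pymongo import MongoClient

    db = MongoClient().benchdb
    for coll_name, key in [("long_keys", "customer_email_address"),
                           ("short_keys", "e")]:
        db.drop_collection(coll_name)
        db[coll_name].insert_many(
            [{key: "user%d@example.com" % i} for i in range(100000)])
        stats = db.command("collstats", coll_name)
        print(coll_name, "data:", stats["size"], "on disk:", stats["storageSize"])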


As I remember it:

The journal is only fsynced every so often; it's not like it magically gives you the D in ACID.

This leaves you relying on replication for durability.

And everyone has problems when they lose one node of the cluster.

Luckily it wasn't the lot: http://blog.empathybox.com/post/19574936361/getting-real-abo...

I would love to be corrected; we'd all sleep easier.


How about the actual facts:

http://www.mongodb.org/display/DOCS/Journaling

So yes, it is periodic by default (in the millisecond range). However, you can make any request wait until it has become durable.

As for replication, people seem to have some hate for it, but the reality is that a journalled system that has failed (any database or operating system) will take a long time to come back up, replaying/recovering its journals, etc. Not that useful on its own.
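
For the record, "wait until durable" looks roughly like this in a current pymongo driver (database/collection names are placeholders; if I remember right, older drivers expressed the same thing via getLastError's j option):

    # Minimal sketch of a journal-acknowledged write (names are placeholders).
    from pymongo import MongoClient, WriteConcern

    coll = MongoClient().maildb.get_collection(
        "messages", write_concern=WriteConcern(j=True))
    # insert_one does not return until the write has been committed to the
    # on-disk journal, trading latency for durability on this request.
    coll.insert_one({"status": "queued"})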


Hi everyone. Chris here from Postmark.

I would say that MongoDB durability used to be an issue, but now with journaling and replica sets it's not as much of a concern.

There were two reasons why the secondary was less capable than the primary. First, the data had become very fragmented due to our frequent purging. And second, we were in the middle of an upgrade to our servers and that one had not been tackled yet. The primary failure came at a bad time. I could have clarified that better in the post.

Regarding capped collections, yes they are faster. The problem is that they can't be sharded. With our dataset that would not allow us to scale.


I get the impression that one of the biggest issues was missed. They did not test the standard load against the secondary server, they assigned a machine of lower specs to the task, and there's nothing in the planned future actions that indicates they'll change it... Even if they go for the new and shiny hardware, they can end up in the same situation when their master fails.

I hope they just overlooked that in the blog post, rather than actually not correcting this first.


That's my takeaway as well.

I have no opinion on MongoDB, but it really seems like this particular problem arose because they skimped on disaster recovery, i.e. their failover hardware was less powerful than their production hardware. The root cause of their downtime was inadequate planning.

That's like spending money on car insurance, only to realize after you get into an accident that it covers almost nothing; you've wasted the money you paid for it. They paid for the secondary failover hardware, but it was effectively useless since they were still down for two days. The only thing it may have mitigated was how long they were down, but its primary objective, i.e. keeping them up in case of a disaster, was a complete failure.

I've worked at a company that was completely down for a day worldwide due to a "disaster", even though we had spent millions on diesel fuel generators, etc. I blame the "checkbox" mentality where people only look to satisfy requirements, but no one actually has ownership over the process and the details. Unfortunately, in my case, no one got fired over this complete misstep, which is another problem... zero accountability.


Seems like Netflix's Chaos Monkey is not a bad idea, actually. I don't mean you have to kill your services randomly while there are users on them... but switching from your master to your secondary (why are you even making that distinction anyway?) should be a pretty standard operation.

Even normal upgrades (hardware fails - it's a question of when, not if) could be handled transparently just by making the "secondary" server a first-class citizen.
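
As a sketch of what such a routine drill could look like with a replica set (host names are placeholders; this just asks the current primary to step down so an election happens):

    # Failover drill sketch: ask the current primary to step down for 60s
    # so a secondary gets elected. Host names are placeholders.
    from pymongo import MongoClient
    from pymongo.errors import ConnectionFailure

    client = MongoClient("mongodb://db1.example.com,db2.example.com/?replicaSet=rs0")
    try:
        client.admin.command("replSetStepDown", 60)
    except ConnectionFailure:
        pass  # the stepping-down primary closes connections; this is expected
    print(client.primary)  # may be None briefly until the election settles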


It's also advisable to try and keep the core dataset (which you absolutely depend on) as light as possible.

Split out the heavy stuff on to other servers.

Then have emergency flags in the webapps so you can run them in a low-feature mode. If you bake this concept in when you're building the webapps, dealing with drama is much less stressful.
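
A minimal sketch of what such a flag can look like (the flag source and the feature names here are hypothetical):

    # Hypothetical emergency flag: flip an env var to run in low-feature mode.
    import os

    EMERGENCY_MODE = os.environ.get("EMERGENCY_MODE") == "1"

    def load_expensive_stats():
        # stands in for a heavy query against the struggling datastore
        return {"messages_sent_today": 12345}

    def dashboard():
        if EMERGENCY_MODE:
            return {"stats": None, "notice": "Stats temporarily unavailable"}
        return {"stats": load_expensive_stats(), "notice": None}

    print(dashboard())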


This seems like such an obvious issue that I can't understand how they overlooked it. If you are using failover as a strategy, your failover machine has to be up to the task.


We really tried to love Postmark for about 3 months. Constant, almost daily issues forced us to unfollow them on Twitter (it literally became an annoyance seeing issues every day). Then we switched to Mailgun last week, and we are very happy. +1 to Mailgun.


Are they using MongoDB for the wrong purpose? The task they have seems similar to logging, and if so, isn't there much better software for that? A logging server, perhaps?


MongoDB does have what it calls capped collections - the semantics are the same as a circular log. No idea why they don't use that storage format.
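
Creating one is close to a one-liner; a minimal pymongo sketch (name and size are placeholders):

    # Capped collection sketch: a fixed-size collection that overwrites the
    # oldest documents once it is full. Name and size are placeholders.
    from pymongo import MongoClient

    db = MongoClient().maildb
    if "activity_log" not in db.list_collection_names():
        db.create_collection("activity_log", capped=True, size=512 * 1024 * 1024)
    db.activity_log.insert_one({"event": "message_sent"})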

In any event this problem is one of success. That is the kind of problem I prefer having.


So the primary was much beefier than the secondary? If so, why let it fail over? I figured you should always have the same resources for a mongo primary and secondary.


Two is one. One is none.

This includes equality in failover systems. The common issue in all of these cases is not the DB engine, OS, stack, whatever, but the infrastructure.



