
We have been using MongoDB in a production environment for almost a year and a half now. It's great, but there are some serious issues you should know about.

1. Writing locks the database. By database I mean the ENTIRE mongodb instance.

2. A query can only use one index at a time. You can create compound indexes over multiple keys, though (see the shell sketch below).

3. Replication is more complicated than it seems. We still have issues with replication not working correctly when an instance fails.

4. There is a limited number of indexes per collection. I think the current limit is 64.
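A minimal shell sketch of the single-index limitation in #2 and the compound-index workaround (collection and field names here are hypothetical):

    // Two single-field indexes: a query filtering on both fields
    // can still use only ONE of them.
    db.users.ensureIndex({last_name: 1});
    db.users.ensureIndex({first_name: 1});
    db.users.find({last_name: "Smith", first_name: "Jan"}); // picks one index

    // A compound index over both keys serves the whole query by itself.
    db.users.ensureIndex({last_name: 1, first_name: 1});
    db.users.find({last_name: "Smith", first_name: "Jan"}); // uses the compound index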




I agree that the biggest issue is #1. Other problems I've run into are:

- Operations are not automatically aborted if the client closes the connection. Consider a case where the client sends many queries that get queued by the server (usually waiting for a lock to be released). The client times out, re-establishes a connection, retries, and so on. Mongo will still execute the queued operations even though no client is waiting for the response (see the currentOp/killOp sketch after this list).

- slaveOk() queries have to be sent to the slaves explicitly by the driver, and I haven't seen that work reliably. Ideally you'd send everything to mongos and let it route slaveOk() queries to the slaves automatically; all the drivers would be simpler and would just work. But that's not the case: you can't do it unless sharding is enabled, so each driver has to implement this functionality itself.
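For the first problem, the only recourse I know of is killing the orphaned operations by hand from the shell; a rough sketch (the opid is a placeholder you'd read out of currentOp):

    // List in-progress operations; queries from a disconnected client
    // will still show up here, happily burning through the queue.
    db.currentOp();

    // Kill one of them by the "opid" field reported above (12345 is made up).
    db.killOp(12345);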
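And for the second problem, the per-connection workaround from the shell looks something like this (hostname is a placeholder):

    // Connect directly to a slave: mongo slave1.example.com:27017
    // Reads against non-primaries are rejected by default.
    db.getMongo().setSlaveOk();         // opt this connection into slave reads
    db.users.find({status: "active"});  // now allowed, but may return stale data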

But I must say that overall my experience has been very positive. It's very developer-friendly: you can go from not knowing anything about it to having a working prototype in an hour or so, and the flexibility in querying the system is awesome.
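For instance, you can insert arbitrarily shaped documents and immediately query into arrays and nested fields, with no schema setup at all (a toy sketch; all names are made up):

    db.posts.insert({title: "hello", tags: ["mongo", "nosql"], stats: {views: 42}});

    // Match an array element and a nested field in a single query.
    db.posts.find({tags: "mongo", "stats.views": {$gt: 10}});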

You just need to understand the limitations of the system. Once you start pushing the limits, it gets significantly more complicated.


Writing does lock the database (or at least the shard), but if the page is in RAM, that means you're locked for the duration of a write to memory, which is inconsequential. The problem comes when you try to write to a page that's not resident. In that case, you can end up (worst case) having to write a dirty page to disk to free up a slot, load the page you want to write, and then write it.

This can be really time-consuming, so one "fix" is to retrieve a document before writing it (thus guaranteeing it will be resident in RAM). More recent versions of MongoDB (2.0 onward, IIRC) also try to yield the write lock before faulting on a write to avoid this problem (though it doesn't work 100% of the time).
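A minimal shell sketch of that "fix", assuming a hypothetical orders collection:

    // Hypothetical _id; in practice this comes from your application.
    var orderId = ObjectId("4f0000000000000000000000");

    // Touch the document first so its page is resident in RAM...
    db.orders.findOne({_id: orderId});

    // ...then the update holds the write lock only for an in-memory write
    // (not a hard guarantee: the page could be evicted in between).
    db.orders.update({_id: orderId}, {$set: {status: "shipped"}});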

Oh, and one other thing to be aware of: anything that uses the JavaScript engine in MongoDB goes through the SpiderMonkey global interpreter lock, so you probably want to avoid those features if performance is a concern ($where, .group(), .mapReduce(), etc.)
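Concretely: the first query below executes JavaScript per document (and so takes the JS lock), while the equivalent native operator avoids the JS engine entirely (collection and field are made up):

    // Runs JavaScript for every document: takes the JS lock, slow.
    db.users.find({$where: "this.age > 21"});

    // Equivalent native query operator: no JavaScript involved.
    db.users.find({age: {$gt: 21}});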


> This can be really time-consuming, so one "fix" is to retrieve a document before writing it (thus guaranteeing it will be resident in RAM).

On a high-traffic site, does this guarantee that it will be in RAM? Couldn't it get swapped out pretty easily if you have a lot of read traffic?


I guess on an extremely high-traffic site running with a relatively small amount of RAM you could cycle through the whole LRU cache in the VM between the two operations, but I'd expect the probability to be vanishingly small.


It's worth noting that the SpiderMonkey engine does not have a global interpreter lock, though it may be that MongoDB uses SpiderMonkey in a way that has a similar effect.


True, but since the current version of SM that we use (1.7) isn't thread-safe, all calls to it need to be inside a mutex to prevent trashing global state. With a switch to V8 or an upgrade to the latest SM (1.8.5?), it may be possible to execute JS without a GIL, but that will need extensive testing to make sure it works properly.

PS - It is important to note that the jslock is completely separate from the dblock, and it is rare to hold both simultaneously. This means that if you are running multiple map-reduces, one can be fetching or writing data to the DB while another is processing objects in JS.
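For reference, a map-reduce job of the kind being discussed looks something like this in the shell (collection, fields, and output name are all hypothetical); two of these running concurrently can interleave as described, one in its JS phase while the other touches the DB:

    var map = function() { emit(this.cust_id, this.amount); };
    var reduce = function(key, values) { return Array.sum(values); };

    // Reading the input and writing "order_totals" takes the dblock;
    // running the map/reduce functions takes the jslock.
    db.orders.mapReduce(map, reduce, {out: "order_totals"});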


In your best- and worst-case scenarios for the lock, you forgot that MongoDB also needs to write a journal entry by default (1.9.2+).


Writing to the journal is actually done outside of the write lock most of the time. It does hold a read lock; however, as of 2.0, most commits release the lock before doing any disk I/O.
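If you actually want a given write to wait on the journal, that is (if I recall the era correctly) opt-in per write via getLastError's j option; a sketch with a made-up document:

    db.events.insert({type: "signup", at: new Date()});

    // Block until this connection's last write is committed to the journal.
    db.runCommand({getLastError: 1, j: true});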


These all seem like serious issues. Why even use Mongo in the first place?


#1 is why they have asynchronous writes, even though that doesn't really solve the problem for some (most?) use cases.
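That is, the default write in the wire protocol is fire-and-forget; drivers only make it "safe" by tacking a getLastError onto it. A rough shell sketch (document contents are made up):

    // Fire-and-forget: returns immediately, write errors go unnoticed.
    db.log.insert({msg: "hit", at: new Date()});

    // "Safe" write: explicitly ask the server whether it succeeded.
    db.runCommand({getLastError: 1, w: 1});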


MongoDB is blazingly fast sometimes. You might try something a little different and find it's 50 times slower.
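explain() is the usual way to find out which case you're in; in this era its output tells you whether the query walked an index or the whole collection (names are made up):

    db.users.find({email: "a@example.com"}).explain();
    // "cursor" : "BasicCursor"          -> full collection scan (the 50x case)
    // "cursor" : "BtreeCursor email_1"  -> served from an index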


> 1. Writing locks the database. By database I mean the ENTIRE mongodb instance.

That's the showstopper for me. Frankly, I'm surprised so many people manage to find uses for Mongo with this restriction.



