We have been using MongoDB in a production environment for almost a year and a half now. It's great, but there are some serious issues you should know about.
1. Writing locks the database. And by database I mean the ENTIRE MongoDB instance.
2. A query can only use one index at a time. You can create compound (multi-key) indexes, though (see the sketch after this list).
3. Replication is more complicated than it seems. We still have issues with replication not working correctly when an instance fails.
4. There's a limit on the number of indexes per collection. I think the current limit is 64.
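To illustrate #2, here's what the compound-index workaround looks like in the mongo shell (the collection and field names are made up):

    // A compound index on two fields lets a single index serve queries
    // that filter on both fields (or on a prefix of the key list).
    db.events.ensureIndex({ userId: 1, createdAt: -1 });

    // This find + sort can be satisfied entirely by the index above:
    db.events.find({ userId: 42 }).sort({ createdAt: -1 });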
I agree that the biggest issue is #1. Other problems I've run into are:
- Operations are not automatically aborted if the client closes the connection. Suppose the client is sending many queries that get queued by the server (usually while waiting for a lock to be released). The client times out, re-establishes a connection, retries, and so on. Mongo will still execute the queued operations even though the client is no longer waiting for the response (see the first sketch below for the manual cleanup).
- slaveOk() queries need to be sent to the slaves explicitly by the driver, and I haven't seen that work reliably (second sketch below). Ideally you'd send them to mongos and let it route slaveOk() queries to the slaves automatically; all the drivers would be simpler and would just work. But that's not the case: you can't do it unless sharding is enabled, so each driver has to implement this functionality itself.
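On the first point, the only recourse I know of is to find and kill the orphaned operations by hand from the mongo shell. Something like this (the opid is a placeholder; you have to pick out your own operations from the list):

    // List in-progress operations so you can spot the orphaned ones
    // (filter by ns / op / client in the output).
    db.currentOp().inprog.forEach(function (op) {
        printjson({ opid: op.opid, op: op.op, ns: op.ns });
    });

    // Kill one by its opid (12345 is a placeholder from the list above).
    db.killOp(12345);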
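On the second point, for reference, this is what slaveOk looks like from the mongo shell when connected directly to a slave (the collection name is made up; the driver-level equivalent varies by driver):

    // Reads against a slave are refused by default:
    db.users.findOne();          // error: not master and slaveok=false

    // Mark this connection as OK with reading from a slave:
    db.getMongo().setSlaveOk();  // rs.slaveOk() is the shorthand
    db.users.findOne();          // now served by the slave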
But I must say that overall my experience has been very positive. It's very developer-friendly: you can go from not knowing anything about it to having a working prototype in an hour or so. The flexibility in querying the system is awesome.
You just need to understand the limitations of the system. Once you start pushing the limits it gets significantly more complicated.
Writing does lock the database (or at least the shard), but if the page is in RAM, you're only locked for the duration of a write to memory, which is inconsequential. The problem comes when you try to write to a page that's not resident. In that case you can end up (worst case) having to write a dirty page to disk to free up a slot, load the page you want to write, and then write it.
This can be really time-consuming, so one "fix" is to retrieve a document before writing it (thus guaranteeing it is resident in RAM); see the sketch below. More recent versions of MongoDB (2.0 on, IIRC) also try to yield the write lock before faulting on a write to avoid this problem (though it doesn't work 100% of the time).
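Here's a sketch of that pre-touch trick in the mongo shell (the collection, _id, and update are all made up):

    // Hypothetical "pre-heat" before an update: the findOne pulls the
    // document's page into RAM before the write lock is taken...
    db.accounts.findOne({ _id: 123 });

    // ...so this update's write lock covers only an in-memory change.
    db.accounts.update({ _id: 123 }, { $inc: { balance: -25 } });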
Oh, and one other thing to be aware of: anything that uses the JavaScript engine in MongoDB is going to take the SpiderMonkey global interpreter lock, so you probably want to avoid those features if performance is a concern ($where, .group(), .mapReduce(), etc.). See the sketch below for a common way around $where.
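As an illustration (the collection and fields are made up): a $where clause drags every candidate document through the JS engine under that lock, whereas precomputing the value at write time lets a plain, indexable query do the same job.

    // Slow: runs JavaScript per document, holding the JS lock.
    db.orders.find({ $where: "this.qty * this.price > 1000" });

    // Faster: store a precomputed 'total' when writing the order
    // (123 and 5 * 250 are placeholder values)...
    db.orders.update({ _id: 123 }, { $set: { total: 5 * 250 } });

    // ...then query it natively (this can use an index on 'total').
    db.orders.find({ total: { $gt: 1000 } });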
I guess on an extremely high-traffic site running with a relatively small amount of RAM you could cycle through the whole LRU cache in the VM between operations, but I'd expect the probability to be vanishingly small.
It's worth noting that the SpiderMonkey engine does not have a global interpreter lock, though it may be that MongoDB uses SpiderMonkey in a way that is similar in effect.
True, but since the current version of SM that we use (1.7) isn't thread-safe, all calls to it need to be inside a mutex to prevent trashing global state. With a switch to V8 or an upgrade to the latest SM (1.8.5?) it may be possible to execute JS without a GIL, but that is something that will need extensive testing to make sure it works properly.
PS - It is important to note that the jslock is completely separate from the dblock, and it is rare to hold both simultaneously. This means that if you are running multiple MapReduces, one can be fetching or writing data in the DB while the other is processing objects in JS.
Writing to the journal is actually done outside of the write lock most of the time. It does hold a read lock; however, as of 2.0 most commits release the lock before doing any disk I/O.
I've been using MongoDB with a custom ad server I built over the summer that is hitting 10,000 impressions per second at peak times. This is all on one box, and it hasn't broken a sweat yet. Because of that positive experience, we have decided to use MongoDB for all future projects instead of MySQL (we make multiplayer games).
I've been using MongoDB for about 5 months now and can say that it is an excellent tool. I've been tinkering with building a very fast search engine for my latest app by combining MongoDB with ElasticSearch and it's been a wonderfully pain-free experience. While I certainly believe in picking the right tool for the job, I don't see myself going back to SQL anytime soon.
It's really straightforward for simple usage. There's no official river for MongoDB, so for now, since I've got very small documents, I just save in MongoDB and then index an even smaller searchable document in ElasticSearch. My front-end talks to ElasticSearch through a simple Node/Express route, which was made far simpler by an NPM module.
I apologize for using the term "combining", as it's slightly misleading. I actually use ElasticSearch for search functionality only; I use MongoDB for the standard DB stuff (reads, updates, etc.). A rough sketch of the pattern is below.
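This is not my actual code, just a sketch in Node: the database, collection, index, and field names are all made up, and it talks to ElasticSearch's REST API over plain HTTP rather than through any particular client module.

    var http = require('http');
    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://localhost:27017/app', function (err, db) {
        if (err) throw err;
        var article = { title: 'Hello', body: '...long body...', tags: ['news'] };

        // 1. Save the full document in MongoDB (the system of record).
        db.collection('articles').insert(article, function (err) {
            if (err) throw err;

            // 2. Index a smaller, search-only projection in ElasticSearch
            //    via its REST API: PUT /<index>/<type>/<id>
            var searchable = JSON.stringify({ title: article.title, tags: article.tags });
            var req = http.request({
                host: 'localhost', port: 9200, method: 'PUT',
                path: '/app/articles/' + article._id   // insert() assigns _id
            }, function (res) { console.log('ES status:', res.statusCode); });
            req.end(searchable);
        });
    });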
I'd like to hear more about your usage of MongoDB and ElasticSearch. I am building something where it makes sense to use both on the Play! framework, which has modules for both.
When 10gen finally pushes out the lock-per-collection changes, it will improve MongoDB significantly. We have a couple of collections that are write-heavy and a couple of others that are read-heavy, and we can't run them on the same instance due to lock contention; we have to run two separate mongod processes, because the global lock kills performance (see the sketch below).
As long as you understand that issue, you can work around it, and there are future changes coming that will address it, which will be fantastic.
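For anyone curious, the two-process workaround is nothing fancy, just two mongods on separate ports with separate data directories (all paths and ports here are arbitrary):

    # One mongod for the write-heavy collections...
    mongod --port 27017 --dbpath /data/mongo-writes --logpath /var/log/mongo-writes.log --fork
    # ...and another for the read-heavy ones, so they don't share a global lock.
    mongod --port 27018 --dbpath /data/mongo-reads --logpath /var/log/mongo-reads.log --fork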
I REALLY want to try Mongo for a side project I am building, but the single-server durability is an issue. My data could easily fit in MySQL/Postgres; however, I am storing data that fits better in a document store. I am inclined to just use Postgres for now, because it's sort of silly to set up and pay for multiple servers for such a small project.
UPDATE: Apparently my information is out of date. Hopefully my confusion (and the answers below) helps someone else. Thanks.
MongoDB has had single-server durability since 1.8, via the journal. If you put the journal on an SSD, you can even get it almost for free performance-wise.
Yeah, I guess I should make the point that if you don't put the journal on an SSD (and you only need a few gigs of SSD to journal terabytes of spinning-disk storage), you will see a significant slowdown.
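The setup is filesystem-level: the journal lives in a journal/ directory under the dbpath, so you can point that at an SSD with a symlink before starting mongod (the paths here are made up):

    # Put the journal directory on a small SSD partition via a symlink.
    mkdir -p /ssd/mongo-journal
    ln -s /ssd/mongo-journal /data/db/journal

    # Start mongod with journaling enabled (on by default in later versions).
    mongod --dbpath /data/db --journal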
The only thing I don't like about MongoDB/Node is that it requires a VPS and can't be deployed in a shared hosting environment. Or is there a company that can host the MongoDB/Node combination at an affordable price today?
I know this isn't exactly answering your question, but if you deploy on an Amazon EC2 instance, you can use MongoHQ's shared hosting in conjunction with your pre-existing Node provider. MongoHQ runs on the AWS infrastructure, so the latency between your Node server and MongoHQ is insignificant (we use EngineYard with MongoHQ, and the latency for a simple small read is anywhere between 1 and 5 ms).
Have you tried Heroku? I've been using Heroku with MongoLab's plugin, all for free. It can get expensive once you have lots of data, but until then it costs nothing.
You can do it with Heroku if you don't mind using MongoLab or MongoHQ. There's also the option of spinning up a free 128MB Node.js SmartMachine at http://no.de.