MongoDB rocks my world (pythonisito.com)
77 points by meghan on Oct 31, 2011 | 34 comments



We have been using MongoDB in a production environment for almost a year and a half now. It's great, but there are some serious issues you should know about.

1. Writing locks the database. By database I mean the ENTIRE mongodb instance.

2. A query can only use one index at a time. You can create indexes on multiple keys (compound indexes), though; see the sketch below.

3. Replication is more complicated than it seems. We still have issues with replication not working correctly when an instance fails.

4. There is a limited number of indexes per collection. I think the current limit is 64.
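On point 2, the usual consequence is that you want one compound index matching your common query shape rather than several single-field indexes, since the planner won't combine them. A rough pymongo sketch (database, collection, and field names are made up):

    from pymongo import MongoClient, ASCENDING, DESCENDING

    db = MongoClient().mydb  # hypothetical database

    # One compound index serves queries on (status) and on (status, created);
    # two separate single-field indexes would not be combined for this query.
    db.events.create_index([("status", ASCENDING), ("created", DESCENDING)])

    db.events.find({"status": "open"}).sort("created", -1)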


I agree that the biggest issue is #1. Other problems I've run into are:

- Operations are not automatically aborted if the client closes the connection. Suppose the client sends many queries that get queued on the server (usually waiting for a lock to be released). The client times out, re-establishes a connection, retries, and so on. Mongo will still execute all the queued operations even though no client is waiting for the responses.
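There's no clean client-side fix for this that I know of; the best you can do is make the timeout explicit so you at least know when you're piling retries onto a busy server. A pymongo sketch (names made up; note the caveat in the comments):

    from pymongo import MongoClient
    from pymongo.errors import AutoReconnect

    # socketTimeoutMS only gives up on the client side; the server will
    # still run the queued operation to completion after we disconnect.
    client = MongoClient("mongodb://localhost:27017", socketTimeoutMS=2000)

    try:
        client.mydb.events.find_one({"status": "open"})
    except AutoReconnect:
        # Retrying here adds more work to a server that may already be
        # grinding through our abandoned queries.
        pass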

- slaveOk() queries need to be sent to the slaves explicitly by the driver, and I haven't seen that work reliably. Ideally you'd send everything to mongos and let it route slaveOk() queries to the slaves automatically; all the drivers would be simpler and would just work. But that's not the case: you can't do it unless sharding is enabled, so each driver has to implement this functionality itself.
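For comparison, this is what driver-side routing looks like (a sketch using a later pymongo API, not how the 2011-era drivers spell it; names are made up):

    from pymongo import MongoClient, ReadPreference

    client = MongoClient("mongodb://host1,host2,host3/?replicaSet=rs0")

    # The driver itself decides to route reads to secondaries; mongos will
    # not do it for you unless sharding is enabled.
    coll = client.mydb.events.with_options(
        read_preference=ReadPreference.SECONDARY_PREFERRED)

    doc = coll.find_one({"status": "open"})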

But I must say that overall my experience has been very positive. It's very developer-friendly: you can go from not knowing anything about it to having a working prototype in an hour or so. The flexibility in querying the system is awesome.

You just need to understand the limitations of the system. Once you start pushing the limits it gets significantly more complicated.


Writing does lock the database (or at least the shard), but if the page is in RAM, that means you're locked for the duration of a write to memory, which is inconsequential. The problem comes when you try to write to a page that's not resident. In that case, you can end up (worst case) having to write a dirty page to disk to free up a slot, load the page you want to write, and then write it.

This can be really time-consuming, so one "fix" is to retrieve a document before writing it (thus guaranteeing it will be resident in RAM). More recent versions of MongoDB (2.0 onward, IIRC) also try to yield the write lock before faulting on write to avoid this problem (though it doesn't work 100% of the time).
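The "touch before write" trick is literally just an extra read before the update; a pymongo sketch (collection and field names are hypothetical):

    from pymongo import MongoClient

    coll = MongoClient().mydb.counters  # hypothetical collection

    def touch_then_update(doc_id):
        # The read faults the page into RAM outside the write lock...
        coll.find_one({"_id": doc_id})
        # ...so the lock is (usually) held only for an in-memory update.
        coll.update_one({"_id": doc_id}, {"$inc": {"hits": 1}})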

Oh, and one other thing to be aware of: anything that uses the JavaScript engine in MongoDB goes through the SpiderMonkey global interpreter lock, so you probably want to avoid those features if performance is a concern ($where, .group(), .mapReduce(), etc.).
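In practice that means preferring plain query operators over server-side JS wherever possible. A sketch of the idea in pymongo (names invented):

    from pymongo import MongoClient

    coll = MongoClient().mydb.events  # hypothetical collection

    # $where runs on the server's JS engine, under the JS lock:
    slow = coll.find({"$where": "this.a > this.b"})

    # Precomputing the comparison at write time keeps reads out of JS:
    coll.insert_one({"a": 2, "b": 1, "a_gt_b": True})
    fast = coll.find({"a_gt_b": True})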


> This can be really time-consuming, so one "fix" is to retrieve a document before writing it (thus guaranteeing it will be resident in RAM).

On a high traffic site does this guarantee that it will be in RAM? Couldn't it get swapped out pretty easily if you have a lot of read traffic?


I guess on an extremely high traffic site running with a relatively small amount of RAM you could cycle through the whole LRU cache in the VM between operations, but I'd expect the probability to be vanishingly small.


It's worth noting that the SpiderMonkey engine does not itself have a global interpreter lock, though it may be true that MongoDB uses SpiderMonkey in a way that has a similar effect.


True, but since the current version of SM that we use (1.7) isn't thread-safe, all calls into it need to be wrapped in a mutex to prevent trashing global state. With a switch to V8, or an upgrade to the latest SM (1.8.5?), it may be possible to execute JS without a GIL, but that's something that will need extensive testing to make sure it works properly.

PS - It is important to note that the jslock is completely separate from the dblock, and it is rare to hold both simultaneously. This means that if you are running multiple map-reduces, one can be fetching or writing data to the DB while the other is processing objects in JS.
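For anyone who hasn't used it, a map-reduce call looks roughly like this in (pre-4.0) pymongo; the map and reduce bodies are the JS that runs under the jslock (names are made up):

    from bson.code import Code
    from pymongo import MongoClient

    coll = MongoClient().mydb.events  # hypothetical collection

    mapper = Code("function () { emit(this.status, 1); }")
    reducer = Code("function (key, values) { return Array.sum(values); }")

    # Two concurrent map-reduces serialize on the jslock only during their
    # JS phases; one can be reading/writing the DB while the other is in JS.
    result = coll.map_reduce(mapper, reducer, "status_counts")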


In your best- and worst-case scenarios for the lock, you forgot that MongoDB will also need to write a journal entry by default (1.9.2+).


Writing to the journal is actually done outside of the write lock most of the time. It does hold a read lock, but as of 2.0 most commits will release the lock before doing any disk I/O.
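If you need to know that a particular write reached the journal before the call returns, you can ask for that via the write concern; a pymongo sketch (names made up):

    from pymongo import MongoClient

    # j=True makes acknowledged writes wait for the journal flush,
    # trading latency for durability on that write.
    client = MongoClient("mongodb://localhost:27017", j=True)
    client.mydb.events.insert_one({"status": "open"})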


These all seem like serious issues. Why even use Mongo in the first place?


#1 is why they have asynchronous writes, even though that doesn't really solve the problem for some (most?) use cases.
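"Asynchronous" here means unacknowledged (fire-and-forget) writes; in later pymongo terms that's w=0 (a sketch, names made up):

    from pymongo import MongoClient

    # w=0: the driver doesn't wait for the server to acknowledge the write,
    # so the write lock doesn't stall the client -- but you also never
    # find out whether the write failed.
    client = MongoClient("mongodb://localhost:27017", w=0)
    client.mydb.events.insert_one({"hit": 1})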


MongoDB is blazingly fast sometimes. You might try something a little different and find it's 50 times slower.


> 1. Writing locks the database. By database I mean the ENTIRE mongodb instance.

That's the showstopper for me. Frankly I'm surprised so many people manage to find uses for Mongo with this restriction.


I've been using MongoDB with a custom ad server I built over the summer that is hitting 10,000 impressions per second at peak times. This is all on one box and it hasn't broken a sweat yet. Because of that positive experience, we have decided to use MongoDB for all future projects instead of MySQL (we make multiplayer games).


Nice data point. What is the rest of the stack?


Linux, Nginx, PHP


I've been using MongoDB for about 5 months now and can say that it is an excellent tool. I've been tinkering with building a very fast search engine for my latest app by combining MongoDB with ElasticSearch and it's been a wonderfully pain-free experience. While I certainly believe in picking the right tool for the job, I don't see myself going back to SQL anytime soon.


It's really straightforward for simple usage. There's no official river for MongoDB, so for now, since I've got very small documents, I just save in MongoDB and then index an even smaller searchable document in ElasticSearch. My front-end talks to ElasticSearch through a simple node/expressjs route, which was made far simpler by this npm module:

https://github.com/rgrove/node-elastical

I apologize for using the term "combining," as it's slightly misleading. I actually use ElasticSearch for search functionality only; I use MongoDB for the standard DB stuff (reads, updates, etc.).
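Roughly, the write path is just "store the full document, index the small one." A Python sketch of the same idea (my real code is node/expressjs with node-elastical; these names are invented):

    from pymongo import MongoClient
    from elasticsearch import Elasticsearch

    mongo = MongoClient().mydb
    es = Elasticsearch()

    def save(doc):
        # The full document goes to MongoDB...
        doc_id = mongo.items.insert_one(doc).inserted_id
        # ...and only the searchable fields go to ElasticSearch.
        es.index(index="items", id=str(doc_id),
                 body={"title": doc["title"], "tags": doc.get("tags", [])})
        return doc_id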


I'd like to hear more about your usage of MongoDB and ElasticSearch. I am building something where it makes sense to use both on the Play! framework, which has modules for both.


How did you combine MongoDB with elasticsearch? Is this open-sourced somewhere?


When 10gen finally pushes out the per-collection locking changes, it will improve MongoDB significantly. We have a couple of collections that are heavy-write and a couple of others that are heavy-read, and we can't run them on the same instance due to lock contention; we have to run two separate mongod processes, because the global lock kills performance.

As long as you understand that issue you can work around it, and there are future changes coming that will address it, which will be fantastic.
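Concretely, the workaround is just two connections to two mongod processes (a sketch; ports and collection names are made up):

    from pymongo import MongoClient

    # Heavy-write collections live on one mongod and heavy-read ones on
    # another, so one instance's global lock can't starve the other.
    write_db = MongoClient("mongodb://localhost:27017").appdata
    read_db = MongoClient("mongodb://localhost:27018").appdata

    write_db.clicks.insert_one({"ad": 123})
    read_db.profiles.find_one({"user": 42})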


I REALLY want to try Mongo for a side project I'm building, but single-server durability is an issue. My data could easily fit in MySQL/Postgres, but I'm storing data that fits better in a document store. I'm inclined to just use Postgres for now, because it's sort of silly to set up/pay for multiple servers for such a small project.

UPDATE: Apparently my information is out of date. Hopefully my confusion (and the answers below) helps someone else. Thanks.


MongoDB has had single-server durability since 1.8. It's now on 2.0, and journaling is enabled by default.


AFAIK single-server durability was a problem in the past. Since 1.8 they've had journaling, and it's now enabled by default in 2.0. More info here: http://blog.mongodb.org/post/381927266/what-about-durability and here: http://www.mongodb.org/display/DOCS/Durability+and+Repair


MongoDB has had single-server durability since 1.8, with the journal. If you put the journal on an SSD, you can even get it almost for free performance-wise.


Yeah, I guess I should make the point that if you don't put the journal on an SSD (and you only need a few gigs of SSD to journal terabytes of spinning-disk storage), you will see a significant slowdown.


You could say that about any database, though; there's nothing special about Mongo's journaling that makes it "for free" with an SSD.


The only thing I don't like about MongoDB/Node is that it requires a VPS and can't be deployed in a shared hosting environment. Or is there a company that can host the MongoDB/Node combination at an affordable price today?


dotCloud runs a fully replicated MongoDB along with your entire application - no need for a separate add-on provider.

http://docs.dotcloud.com/services/mongodb/


There are a number of MongoDB shared hosts listed at http://www.mongodb.org/display/DOCS/Hosting+Center that you might want to check out.


I know this isn't exactly answering your question, but if you deploy on an Amazon EC2 instance, you can use MongoHQ's shared hosting in conjunction with your pre-existing Node provider. MongoHQ runs on the AWS infrastructure, so the latency between your Node server and MongoHQ is insignificant (we use EngineYard with MongoHQ, and the latency for a simple small read is anywhere between 1 and 5 ms).

Make sure that the regions match up though! :D


Have you tried Heroku? I've been using Heroku and MongoLab's plugin, all for free. It can get expensive after lots of data, but until then, it is all free.


You can do it with Heroku if you don't mind using MongoLab or MongoHQ. There's also the option of spinning up a free 128 MB Node.js SmartMachine at http://no.de.


You can also try Cloud Foundry from VMware (http://cloudfoundry.com/). It's currently in beta and free.



