Things I wish I knew about MongoDB a year ago

dia80 · on Oct 17, 2012

Genuine question:

In what use cases does mongo kick mysql's ass?

I've used it a couple of times in hobby projects and enjoyed not maintaining a schema. I read so many of these 'gotcha' style articles and for example one commenter here wants to have a manual "recently dirty" flag to combat the master / slave lag mentioned in the article. I know it's faster (tm) but once you have to take in to account all this low level stuff you have to worry about yourself wouldn't it just be better to rent/buy another rack of mysql servers and not worry about it?

Look forward to learning something...

thibaut_barrere · on Oct 17, 2012

MongoDB kicks ass in the following situations (real projects I did as a freelancer):

- dealing with semi-structured input (forms with some variability) and storing as a document, all while being able to query across the data

- used as a store to provide very flexible ETL jobs (with ability to upsert, filter/query, geonear etc)

For those situations, I would definitely use MongoDB again. As a RDBMS replacement, I wouldn't use it today.

olalonde · on Oct 18, 2012

For those like me who have no ides what ETL stands for: https://en.wikipedia.org/wiki/Extract,_transform,_load

thibaut_barrere · on Oct 18, 2012

Thanks for pointing that out!

Here is a presentation (slides + video) I gave about a Ruby ETL, for instance. It illustrates the typical use cases I run into.

http://lanyrd.com/2012/rulu/swxtt/

jandy · on Oct 17, 2012

To elaborate on the semi-structured input point: Monogo and it's kin are great for EAV systems (http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%...), where your entities can have an arbitrary number of fields (often user defined). Trying to build this kind of system in a traditional RDBMs can be quite tricky.

thibaut_barrere · on Oct 18, 2012

Thanks for elaborating! This is much needed and spot on :-)

A long while back I built a somewhat complex survey app: I can confirm it's fairly more complicated to handle with a RDBMs, compared to a document store.

sgolestane · on Oct 17, 2012

Since we are a mysql shop, for this use-cases I serialize and store the form as xml in a CLOB column. For any field that needs to be searched on, I create an additional column. Disadvantage of this is you can't run mysql queries against the data stored in the CLOB column.

thibaut_barrere · on Oct 18, 2012

Adding column per searchable field is precisely what I wanted to avoid :-)

If you need to store multiple forms types, it gets hairy very fast.

MongoDB allows to query inside the data (which is not a blob in that case).

jeltz · on Oct 18, 2012

So does PostgreSQL, but I have no idea about MySQL. You can run xpath queries against XML documents, and you can also index specific xpath queries (there is no general indexing infrastructure for XML in PostgreSQL). For JSON no path lookup functions are included in the core yet, but I assume there are extensions which add them.

dude_abides · on Oct 17, 2012

To slightly rephrase the OP's question:

  In what use cases does mongo kick postgres's ass?

To the two points you mentioned:

- semi-structured input can be saved as hstore type or as json type;

- and for flexible jobs, you can use pretty much any popular language - PL/R, PL/Python, even PL/C if performance is really critical.

thibaut_barrere · on Oct 18, 2012

I would have replied something similar if that was the question :-) (I use PG a lot these days).

Agreed on the first point (but I'm not sure you get exactly the same type of flexibility in all my use cases - I'll have to make a closer comparison).

For the second point, well not having to handle the schema for ETL jobs is sometimes fairly useful and removes a lot of cruft, that was part of my point (those ETL are code-based, only relying on MongoDB as a flexible store).

d0ugal · on Oct 18, 2012

You can't query JSON easily and hstore is only one level deep. So, no, its not as flexible.

niels · on Oct 18, 2012

You can't query the json type as easily in postgres. I would guess that is an important use case.

jalada · on Oct 18, 2012

I don't know about JSON type but the annoying thing about hstore is everything is a string; there are no types.

taligent · on Oct 18, 2012

Just because you can write an app using assembler doesn't mean it is the best way to do it.

dude_abides · on Oct 18, 2012

Very tempted to ask "In what way does postgres compare to an assembler and mongodb to a high level language?" but I think I'll just assume that you're trolling.

Besides, SQL sounds more high-level than map-reduce to me.

joevandyk · on Oct 17, 2012

fwiw, postgresql 9.2 supports a json datatype. you can efficiently access/query json fields using plv8 (http://code.google.com/p/plv8js/).

so you could have:

  create table form_results (                                                                                
    id serial primary key,                                                                            
    data json                                                                                         
  );

http://pgeu-plv8.herokuapp.com/ has more information.

jvehent · on Oct 18, 2012

It's just a JSON syntax validator. You can't index on part of the json, postgres treats it as a string.

You can apply full text search on it, but that doesn't tell you if you're matching on a key or a value.

ismarc · on Oct 18, 2012

In postgres, you can index on a stored procedure, so you can use a simple stored procedure written in JavaScript to look up by key and return values, etc. Example available at http://people.planetpostgresql.org/andrew/index.php?/archive...

vidarh · on Oct 18, 2012

Very interesting presentation, though the title of "heralding the death of nosql" is either intentionally exaggerating, or indicates the author doesn't understand all the reasons why people go to nosql databases. In fact, the presentation demonstrates why: Postgres has tons of really fantastic, awesome features that next to nobody uses because they are hidden behind layers of SQL-type incantations and/or require various extensions.

gbog · on Oct 18, 2012

So you'd rather install a completely new storage engine and learn to use it than check a doc and install an extension to postgres?

thibaut_barrere · on Oct 18, 2012

The thing is that 10gen did a really, really good job at polishing the install process and documenting it to get people started.

No surprise to see the GIS part of MongoDB is built-in instead of an extension of some kind. I know a couple of people who used PG without even knowing there was a GIS extension.

jeltz · on Oct 18, 2012

But on the other hand PostGIS being independent from PostgreSQL has resulted in the best opensource GIS database. And with the recent addition of CREATE EXTENSION the PostgreSQL extension installation process has been heavily streamlined. Before CREATE EXTENSION it was a mess for larger extensions.

thibaut_barrere · on Oct 18, 2012

PostGIS is miles, miles ahead, for sure! Yet even "create extension" sounds a bit weird to most newcomers (me included at first!), especially against "built-in basic GIS".

ibotty · on Oct 18, 2012

is it a good thing to have everything built in?

amalag · on Oct 18, 2012

I don't quite understand how the transition happened from no-json support to json support in those slides. How did the plv8js do this searching on fields?

alexro · on Oct 17, 2012

IMO there is no rational explanation to this phenomena other than: people are different. Some get bored with stored procs and want same hassle but in another form.

morsch · on Oct 17, 2012

A cynical but insightful description of many kinds of progress and change. ;)

taligent · on Oct 18, 2012

Yes. There is no rational explanation for people using the right tool for the right job.

Let me guess. You would build a skyscraper using a trough and cement in a bucket ?

jamesaguilar · on Oct 18, 2012

Are you trying to give us an idea of what it's like to build an even moderate sized app without transactional consistency and with mapreduce? Because that is what it sounds like to me.

elchief · on Oct 17, 2012

When you have so many writes that sharding isn't enough.

When you make changes so fast you need a liquid schema.

When you want to make your boss learn map-reduce so he can query the data.

When the application can take care of integration and not the database itself.

leif · on Oct 18, 2012

This will be obvious, but I work at Tokutek.

> When you have so many writes that sharding isn't enough.

TokuDB does indexed insertions very fast. [1] We've even plugged ourselves in underneath MongoDB (just for fun) and we beat them too. [2]

> When you make changes so fast you need a liquid schema.

TokuDB supports lots of schema changes with zero downtime. [3]

> When you want to make your boss learn map-reduce so he can query the data.

Can't help you there, but I can make you not need to torture your boss that way.

> When the application can care of integration and not the database.

What if it didn't need to?

We also get fantastic compression [4], retain full transactional semantics, and lots of other fun stuff.

Email us if you're curious!

[1]: http://www.tokutek.com/resources/benchmark-results/benchmark...

[2]: http://www.tokutek.com/2012/08/10x-insertion-performance-inc...

[3]: http://www.tokutek.com/resources/benchmark-results/benchmark...

[4]: http://www.tokutek.com/resources/benchmark-results/benchmark...

DanWaterworth · on Oct 18, 2012

In my experience, it's not just throughput that's important, but also 99th percentile latency. If I understand fractal trees correctly, you sometimes need to rewrite all of your elements on disk. How do you do this without causing lag?

leif · on Oct 18, 2012

That's happily not how fractal trees work, at all. We have a few talks online describing how they work. Zardosht has one here http://vimeo.com/m/26471692 . I thought Bradley had a more detailed one at MIT but I can't find it right now. (EDIT: found it! http://video.mit.edu/watch/lecture-19-how-tokudb-fractal-tre...)

Basically, I think you're thinking of the COLA. What we implement does have a literal tree structure, with nodes and children and the whole thing, so at any point you're just writing out new copies of individual nodes, which are on the order of a megabyte. At no point do we have to rewrite a large portion of the tree, so there aren't any latency issues.

DanWaterworth · on Oct 18, 2012

Thank you very much for the links. Is my understanding correct?

Essentially, a fractal tree is a B-Tree (or perhaps a B+Tree?) with buffers on each branch (per child). Operations get added to these buffers and when one becomes full, the operations get passed to the corresponding child node. Operations are applied when they reach the node that is responsible for the data concerned.

leif · on Oct 18, 2012

That's about it! It's like a B+Tree in that the data resides in the leaves (well, and in the buffers), except that the fanout is a lot lower because we save room in the internal nodes for the buffers.

alex137 · on Oct 19, 2012

Hi. I thought the block size is much bigger in fractal trees (like 4MiB instead of 4KiB) than in B-trees hence the fanout would be about the same?

I'm trying to experiment with these ideas on my side, but can't quite grok how large the buffers must be at each level of the tree.

Let's take a concrete example like: 2^40 (1T) records with an 8-byte key and an 8-byte value. In a traditional B-tree with a 8KiB block size and assuming links take 8 bytes too and assume for a while that blocks are completely full. So, that's 1G leaf blocks, a fanout of about 1K and hence & full 4-level trees: 1 root block, 1K level-2 blocks, 1M level-3 blocks, 1G leaf blocks.

In this setting, I understand that a fractal tree will attach a buffer to each internal node. How large will they be at level 1 (root), level 2, and level 3 ?

leif · on Oct 19, 2012

The buffers have the same capacity at each level of the tree. It doesn't have to be that way, and you can find different strategies in the literature, but that's the way we do it.

The degree of each internal node is way lower, because we want to use the space in internal nodes for buffers more than for keys. In a 16TB tree, we would probably have a tree of height 6 or 7.

What we restrict is the size of a whole internal node, so that any one buffer in it can get much more full than the others if it needs to. We want to pick a node size that makes sense for your media. 4MB is a good compromise for spinning disks between their typical bandwidth and seek time, and it's large enough to give the garbage collector a rest on SSDs.

alex137 · on Oct 22, 2012

OK, I'm starting to get the concept. Here is how I'm understanding the whole thing:

Assume a node size that can hold 1M records in a buffer and 32 links (that's about 16 MiB more or less, more than what you suggest, but it simplifies the explanations).

Assume further that to operate on a node, you read it entirely in memory, modify it, sort it, then write it entirely on disk and that that takes about 1 second. This is pessimistic as there are surely more efficient ways to maintain sorted 1M nodes.

Now consider completely random insertions or 16-byte records and assume that the keys will spilt completely uniformly among the subtrees.

1. Every 1M insertions, you need to spill from level-1 (root buffer) to level-2, which generates about 32 IOs 2. Additionnally, every 32M insertions, you need to spill all level-2 buffers to level-3, which generates 3232 IOs n. every 32^(n-1) M.insertions, you need an additional 32^n IOs to spill from level-n to level-(n+1)

So, all in all, to insert 32^n M.records, you need a total of (n+1)32^(n+1) I/Os, which means 32(n+1) IO per 1M insertions.

For example, in a 16TiB data store, i.e. with n=8, you need 288 IOs per million insertions, or about 2400 random insertions/second at 1s/random IO.

In comparison, a B-Tree with a fanout 1024 and a block size 16KiB would generate about 4 random IOs per insertion. These IOs would be dominated by the seek time (~10ms), so you could get only 25 insertions per second.

On the other hand, a point query in this B-Tree would take 4 IOs instead of 8 random I/Os in the fractal tree. So, that's the tradeoff: much better insertion rate vs. slightly worse query time.

Note that a small fanout (about square root of the equivalent B-tree fanout) is important. If you take 1024 as fanout in the fractal tree, the tree has half the height, but the number of IOs per 1M insertions drops to 10245 instead of 32*9.

Have I got it right?

leif · on Oct 22, 2012

The argument I like goes like this:

Flushing one buffer one level down costs O(1) I/Os. It moves B elements, so flushing one element one level down costs O(1/B) amortized I/Os. The tree has fanout O(1), so it has height O(log N). Therefore, each element must be flushed O(log N) times before it reaches its target leaf. So the cost to get one element down to its leaf is O((log N)/B) amortized I/Os.

Compare this with a B-tree, which costs O(log_B N) = O((log N)/(log B)) per insertion.

I'm pretty sure your math is right, though it attacks it from the other direction and I've only just skimmed it. It seems sound though.

The point query tradeoff is there in the theory, you're right, but it assumes a cold cache. In practice, most people have enough RAM for all their internal nodes, whether they're using a B-tree or a fractal tree, so either way it's typically 1 I/O per random point query in either data structure.

Be careful with asterisks here. :)

alex137 · on Oct 23, 2012

Good.

Now, what is the recommanded layout of the node buffers (the 4MiB buffers). I've read about Packed Memory Arrays, but don't quite get the complete idea. Can we really do better than read, merge, write ?

(A pity there's no preview button on this forum...)

leif · on Oct 23, 2012

Unsorted is fine, we're just counting I/Os, and it takes O(1) I/Os to read or write a buffer regardless of the layout.

leif · on Oct 19, 2012

I should add that, to reduce the cost of a flush by a pretty decent constant number of I/Os, you keep one buffer per child in each node. Basically, you want to do the sorting work on arrival into the node, not during the flush.

DanWaterworth · on Oct 18, 2012

I'm a little confused. I thought fractal trees worked cache obliviously or am I mistaken?

leif · on Oct 18, 2012

In theory, it is cache oblivious, and the CO-DAM model informs our decisions about the implementation, but no, the implementation itself isn't actually cache oblivious. Shh, don't tell on us.

A "fractal tree" is defined by our marketing team as "whatever it is we actually implement." If you want to talk cache-oblivious data structures, we can talk about things like the COLA and cache-oblivious streaming B-trees which have rigorous definitions in the literature. At some point, if you want to achieve a certain level of detail, you have to pick one or the other in order to continue the conversation.

stinkypete · on Oct 21, 2012

Sounds like a Y-Tree to me: "A Novel Index Supporting High Volume Data Warehouse Insertions," C. Jermaine et. al, VLDB '99.

How are fractal trees different?

leif · on Oct 22, 2012

Is that the same as Willard's Y-fast tree? If so, then no they're not very similar. If not then I'm unfamiliar and I should read that paper.

stinkypete · on Oct 24, 2012

Nope, not the same. From the descriptions you give above it is very similar (a modified B-tree with heaps of delayed insertions in internal nodes), though the analysis in the paper does not use CO techniques.

This tweak to B-trees to support fast insertions works so well I am really surprised it has not become more common in practice. The only downside I can think of is that flushes down the tree when internal buffers get large may cause lower-level pages to become overfull, but if you allow more than one flush operations at each level you can avoid this problem. Of course, that's only important if you are paranoid about internal nodes actually fitting in a page; if you don't care about that, then the worst-case analysis picture looks much better since a flush operation causes at most one flush in the pages immediately below it in the tree structure.

leif · on Oct 24, 2012

I was surprised too, when I learned this technique, that people weren't already doing it. I think it's just an age thing. B-trees are old so they have all the kinks worked out, at least in mature implementations. Fractal trees are new enough that there are still a bunch of tradeoffs to play with and evaluate.

When you actually go and implement the system, you find all these behaviors that really aren't expressed in the theory. There are lots of things we're still experimenting with. Flushing isn't too hard, if a flush makes another node way too big, you can just flush that one too. We don't allow our cascading flushes to fan out, to prevent latency-style problems, but they can flush all the way down to a leaf. There are some other fun problems around query strategies and concurrency control too. It's definitely a fun structure to play with.

lwat · on Oct 17, 2012

> When you have so many writes that sharding isn't enough.

Does Mongo still have a global write lock?

elchief · on Oct 18, 2012

no longer (but only a few months)

mgummelt · on Oct 18, 2012

Though I believe it still has a per-collection lock.

Jare · on Oct 18, 2012

It has per-database locking. Per-collection is being worked on but no planned date yet. [1]

[1] http://www.mongodb.org/display/DOCS/How+does+concurrency+wor...

diminoten · on Oct 17, 2012

Mongo isn't so important to this question as ODB vs RDBMS. Here's some light reading:

http://en.wikipedia.org/wiki/Object-relational_impedance_mis...

MongoDB is just ODB, and MySQL is just RDB.

Besides, postgres is the real future!

dleblanc · on Oct 17, 2012

Not sure what Wu-Tang has to do with this, but....

Seriously though, is Mongo an ODB, or a document oriented database? ODBs/OODBs imply a much different use-case and functionality, and I think we ought not conflate the two.

ww520 · on Oct 18, 2012

Mongo is not an ODB in the traditional ODB sense. Mongo is just a document (or blob) database. ODB supports relationship, link traversal, inheritance, etc. Basically the whole shebang of the OO notions in a database.

diminoten · on Oct 19, 2012

You're right, I didn't know there was a difference.

taligent · on Oct 18, 2012

PostgreSQL is the PAST and it will forever remain there until it solves its biggest problems.

It is the hardest out of all databases I've used to cluster, replicate, shard and manage. And the database world is moving towards scaling horizontally rather than vertically. I can push a button in say CouchDB to replicate and shard. Try doing that in PostgreSQL.

jeffdavis · on Oct 18, 2012

"And the database world is moving towards scaling horizontally rather than vertically."

I think that's an oversimplification. Two counterpoints:

* Database systems will always need to make heavy use of locality. The speed of light means that synchronizing access over long distances (even medium distances -- light only goes about a foot per nanosecond) will always be a challenge.

* Multi-core means vertical scaling is back in (if by "vertical scaling" you mean "scaling on one box"), and probably for a while.

Postgres is doing an excellent job at both.

I do agree with the less-exaggerated point that postgres really needs to improve its multi-machine scaling. But it's far from a solved problem on any system under discussion.

andyzweb · on Oct 18, 2012

postgres does horrible with multicore unless you have many concurrent pooled connections. fork-and-forget

jeffdavis · on Oct 18, 2012

I think you're referring to postgres's inability to use more than one core per query, which is true (or mostly true... there are quite a few helper processes that take on some of the work).

For many smaller queries, postgres does great on multi-core, and pgbouncer is a good connection pooler.

true_religion · on Oct 18, 2012

People still use SQLlite. Why? Because its suits a specific set of circumstances.

Postgres does the same---its an incredibly powerful database with a large suit of features that make application development both easy and sane. To the vast majority of projects where you won't outgrow a single server, it makes sense to use it.

np422 · on Oct 18, 2012

You have obviously never used Oracle RAC .. :)

jeremyjh · on Oct 18, 2012

One issue with MySQL in large databases is that schema changes are extremely expensive, so much so that you'll be making design decisions around it (e.g. how do we implement this feature without executing our two-day alter table statement). Not all RDBMS have this problem to such a degree but none can escape it entirely.

A lot of the gotchas that he notes are related to design trade-offs with different default behavior than an RDBMS typically would have. For example as your system gets large enough in MySQL you may find you have to do asynchronous replication as well, and then you will have similar problems with dirty reads.

kermatt · on Oct 18, 2012

> e.g. how do we implement this feature without executing our two-day alter table statement?

With MySQL being so prevalent, IMO this is the principal reason why the concept that RDBMS' are hard to work with exists. None of the other major platforms have this issue, but if a developer's only exposure is to MySQL, then the idea of schema alters are always fearsome.

This is not to say that ALTER TABLE statements are always quick - if constraint checks or default values are included, there can be long runtimes in PostgreSQL, MSSQL, Oracle, etc. - but significant downtime to add an empty nullable column is just plain stupid for a platform that has been around as long as MySQL.

MySQL has forced a major population segment of developers to toss the advantages of DB side validation and rich query languages for the purpose of _avoiding_MySQL_. Sure the whole object-relational impedance mismatch exists, SQL is hard, yada-yada, but I have never seen these reasons cause as much angst as taking a service offline to add a column to a table.

10gen should have a picture of Monty in their CFO's office.

mrkurt · on Oct 17, 2012

Eventually consistant replication is not unique to MongoDB, most DBs have an async replication option. Using "not Mongo" won't really solve it.

krichman · on Oct 18, 2012

An option is not a default, though, so using Postgres or MySQL absolutely would solve the problem of returning with success before persisting the data.

The contentious area isn't really that Mongo does this, it's because for whatever reason the people trying it don't expect Mongo to do this.

untog · on Oct 17, 2012

I think there are legitimate reasons to use a "NoSQL" solution rather than MySQL. I'm more interested to know in what use cases Mongo kicks it's competitors asses? What are it's competitors, even? I'll admit that the NoSQL world is a slightly blurry mess to me, with different products seemingly optimised for different cases.

taligent · on Oct 18, 2012

MongoDB is a JSON document store. Almost all other NoSQL are key/value type stores.

If you have a use case that centres around storing document style data then MongoDB will be better suited.

trustfundbaby · on Oct 18, 2012

not to be persnickety but its a BSON document store. Theres a difference.

mattparlane · on Oct 17, 2012

My number one reason for choosing MongoDB is replication that just works out of the box and doesn't require either a read lock or shutting down the master to set up a new slave.

sausagefeet · on Oct 17, 2012

As far as I can tell, Riak blows mongo away for this particular criteria.

mattparlane · on Oct 18, 2012

That's my understanding too, but MongoDB hit the sweet-spot for us, Riak couldn't handle the raw queries-per-second we needed without requiring extra hardware ($$).

jaimebuelta · on Oct 18, 2012

But Riak is much limited on the queries you can run.

jeltz · on Oct 18, 2012

Serious question: What database requires read locks or shutting down the master for setting up a new slave? Is it MySQL?

Both sounds like really weird requirements for replication which defeats half the purpose of having it. PostgreSQL has never had any of these two problems. Still working on improving usability but with the addition and improvements of pg_basbackup I would say it is almost there. Hopefully 9.3 will get timeline switching to simplify failover.

mrinterweb · on Oct 17, 2012

Map reduce across sharded servers comes to mind as an advantage. For that matter, horizontal scalability in general is a big advantage that many of the NoSQL data stores have over RDBMS databases.

tomschlick · on Oct 17, 2012

I'm so glad this wasn't another case of someone just ranting about using mongo for the wrong purpose and being mad about it a year later.

trafficlight · on Oct 17, 2012

I also appreciate how he pointed out positive things that he just wasn't aware of initially.

nickzoic · on Oct 17, 2012

The count({condition}) one is a worry. I'm guessing it is slow in the case where it has to page the index in in order to count it. I wonder if it is still a problem where the index is used a lot anyway. A fix in MongoDB would seem a lot better solution than having everyone implement their own hacky count-caching solution.

EDIT: Actually, looking at the bug reports, sounds like maybe lock contention on the index?

The master/slave replication problem seems bad but I think it can be worked around (for my particular project) with a flag on the user session ... if they've performed a write in the last 30 seconds, set slaveOkay = false. Users who are just browsing may experience a slight delay in seeing new documents but users who are editing stuff will see their edits immediately.

lars512 · on Oct 17, 2012

The inconsistent reads in replica sets is something we've come across with MySQL read slaves as well. I think it's a gotcha of that whole model of replication, rather than a MongoDB-specific issue.

mgummelt · on Oct 17, 2012

I'm not aware of any database that solves this problem. Is there one? As far as I know, mysql reads must be distributed to the slaves at the application level, which has no knowledge of master/slave inconsistency. I suppose the time delta between master and slave can be queried, but that still doesn't protect from race conditions/inconsistent reads. This is actually why we chose to only utilize slaves for data redundancy rather than read throughput at my last company. Inconsistent reads weren't tolerable.

orthecreedence · on Oct 17, 2012

Riak does. You say, when writing, "please don't return until this data is replicated on 2 servers." And when reading, "please only return a successful read if this data is read from 2 servers."

So you have R = 2, W = 2, R+W = 4, and if your replication (N) val is 3, you're fine (you're always going to get consistency if R+W > N).

Riak is cool.

varworld · on Oct 17, 2012

Riak is cool and what you have described is correct however under failure conditions[1] you may not get this desired behavior.

[1]http://docs.basho.com/riak/latest/references/appendices/conc...

achompas · on Oct 17, 2012

I believe Cassandra does as well, not 100% sure though.

ash211 · on Oct 17, 2012

Cassandra does, you can write with a write consistency of W and read with a read consistency of R, and as long as their sum is greater than the replication factor (number of copies to store across the cluster) you have consistent reads. W + R > N.

http://wiki.apache.org/cassandra/ArchitectureOverview#line-1...

mgummelt · on Oct 17, 2012

Does Riak support distributed transactions? If not, I don't see how they handle the possibility of a read occuring during the replicated write.

sausagefeet · on Oct 17, 2012

No, Riak is nontransactional. But neither is mongo if it's important here. Riak is apparently getting some kind of strong consistency though. Calvin looks interesting for a nosql with distributed transactions.

mgummelt · on Oct 18, 2012

This is also something we desperately needed at my last company, but we couldn't find FOSS that supported it. mysql offers XA, but I've heard mixed reviews.

splix · on Oct 17, 2012

MongoDB has such feature (maybe its depends on driver, but at least JVM drivers have - http://api.mongodb.org/java/2.9.1/com/mongodb/WriteConcern.h...).

As for me, it's mostly the quesion of perfomance, and application architecture, most time you don't want to wait until it's replicated to slaves.

mgummelt · on Oct 17, 2012

As far as I can tell, WriteConcerns don't protect from inconsistent reads in all cases. It looks like the most conservative setting is Majority, but even then there is no assurance that reads won't occur during the replication, nor that they won't occur to one of the minority of non-replicated servers.

mrkurt · on Oct 17, 2012

You can set it to the total number of slaves you have and ensure data is on all of them. Normally that slows writes down enough that it's undesirable, though.

arielweisberg · on Oct 18, 2012

Shameless plug (hey Tokutek is doing it), in VoltDB replication is synchronous so it doesn't have this problem.

Latency in the current version is nothing to write home about, but in V3 latency with replication is 600-1000 microseconds. Group commit to disk is every 1-2 milliseconds.

V3 also allows reads to be load balanced across replicas and masters so you gain some additional read capacity from replication. V3 also routes transactions directly to the node with the data so you don't use capacity forwarding transactions inside the cluster.

You get to keep transactions to. Now go figure out what you don't get to keep ;-)

lmm · on Oct 18, 2012

>Now go figure out what you don't get to keep ;-)

Cross-datacenter replication becomes a Really Bad Idea?

arielweisberg · on Oct 18, 2012

You don't have to give up cross DC replication if you do it asynchronously, but you lose cross shard consistency when there is a dirty fail over. This effects distributed transactions and series of single part transactions that depend on each other across different shards.

What Volt supports right now is actually asynchronous replication that does preserve cross shard consistency, but that is not going to last.

You can do synchronous multi-DC replication, but then you have Spanner and the associated latency of multiple data-center quorums.

There is also Calvin http://bit.ly/RGW9RY

kermatt · on Oct 18, 2012

Any platform that supports synchronous replication?

http://www.postgresql.org/docs/9.1/static/warm-standby.html#...

nerfhammer · on Oct 17, 2012

This is called semi-synchronous replication

http://dev.mysql.com/doc/refman/5.5/en/replication-semisync....

jbert · on Oct 17, 2012

One way to resolve it is to mark that user or session (or even just request) "sticky to the master" for long enough to cover your normal replication delay.

When we saw it before, ensuring that a given request which issued a write also read from the master was sufficient. (sub-second replication delay).

mgummelt · on Oct 17, 2012

This may help in the majority of the cases, but many applications also can't tolerate inconsistent reads across users/sessions.

nevinera · on Oct 17, 2012

>Range queries are indexed differently

If I'm reading your description right, this is hardly mongo-specific. Try it in mysql, for example:

(index is [:last, :first])

  select first from names 
  where last in ('gordon','holmes','watson')
  order by first;

An index is an ordering by which a search may be performed - to illustrate, the index for my small table looks pretty much like this:

  gordon, jeff
  holmes, mycroft
  holmes, sherlock
  watson, john

Unless the first key is restricted to a single value, it can't order by the second key without performing at least a merge-sort. They aren't in that order in the index.

foobar2k · on Oct 18, 2012

He never said it was mongo specific

nevinera · on Oct 20, 2012

>Things I wish I knew about MongoDB a year ago

The post reads as a series of criticisms about mongo. I don't love mongo, but I'm not aware of any data store that can perform that type of query purely from an index.

Now, the description was vague enough that he could have been describing a real bug I'm not aware of - at one point I've seen MySQL decide to use an index for sorting instead of for filtering when that query plan was 500x slower. If mongo has a bug like that one, disregard my comment please. :-)

jameswyse · on Oct 18, 2012

One thing I love MongoDB for is it's geospatial indexing abilities: http://www.mongodb.org/display/DOCS/Geospatial+Indexing

Was a really nice surprise when I was building a location based web app.

jsemrau · on Oct 18, 2012

That was our use-case as well. And it works fine for this but just in the application layer. We are not using Mongo for data storage (at least we are not trusting it to hold it for long)

wakaflaka · on Oct 18, 2012

chris123 · on Oct 18, 2012

Is MongoDB more marketing hype than quality product? I've heard it before and this article seems to point in that direction as well.

kokey · on Oct 18, 2012

I think it's generally full of gotchas similar to that of SQL databases like MySQL and Oracle. In fact, most of the issues mentioned in this article, like delayed replication, indexed queries and using 'explain' are issues I've had to deal with in MySQL and Oracle. Most of these databases are fine out of the box for small scale use, but when you scale up you have to deal with these 'gotchas' like indexing, partitioning, bulk loading, and having to profile everything etc.

lmm · on Oct 18, 2012

Yes, but only because it has ~infinite marketing hype.

It isn't and shouldn't be a general replacement for a RDBMS; it makes some interesting sacrifices for performance that you have to understand before using it. But it is very much a quality product; it makes some easy things very easy and some very hard things possible.

bengaoir · on Oct 18, 2012

I wish I knew that it sucked.