MongoDB and DataStax, In the Rearview Mirror (couchbase.com)
45 points by skjhn on June 25, 2014 | 50 comments



My problems with Couchbase:

- Completely opaque administration. The company charges for support ($5k per production node per year!), so perhaps unsurprisingly the log output is unreadable (similar to a binary stacktrace). Forget about diagnosing cluster issues on your own.

- Bizarre failover behavior. In every test scenario I've tried, if any single node goes down my application becomes unresponsive (even if it can see other nodes that are up). This could be a problem with the .NET driver, but it's hard to tell.

- Weird performance. Usually Couchbase is fast, but sometimes my application will go into what looks like a busywait loop and consume 100% CPU while trying to contact the Couchbase cluster. Again, possibly (probably?) a driver issue, but it makes me hate dealing with Couchbase.

- Ten-bucket limit. Ostensibly for performance reasons, but it cannot be overridden, even if I'm running a non-production cluster where performance isn't critical.

At this point we're looking at switching to Redis. At least it's widely used and easier to administer.


Hi, thanks a lot for posting this! I have been thinking about using Couchbase Server for a mobile project because I like the mobile client component they have. I'm particularly concerned about the "opaque administration" issue - have you had to actually pay the company [1] for support because the log output is so bad, or did it just slow down your debugging process?

[1] I have nothing against paying for support, but this is a personal project


I don't have a problem with charging for support either, but Couchbase (for us) is just a memcache replacement, so that price is too high to justify.

Without support the "debugging process" consists of blindly trying random things to fix the problem before giving up and moving on to something else.


The 'community' edition is free; can you run that in production? Agreed on the crazy logs it generates.


We only run the community edition (everywhere). We could legally use the enterprise edition in QA, but it wouldn't make sense since production must be community.


(grain of salt: I work in the Cassandra Community team at DataStax)

If you are considering a scalable solution, you can use DataStax Enterprise for free in production if you are a startup (under $20M raised and under $2M in annual revs). Check here: http://www.datastax.com/what-we-offer/products-services/data...


Thanks for pointing this out. I'm really looking at the document store aspect of Couchbase and the automatic syncing between mobile client and server, so I don't think Cassandra would help there.


Given that Couchbase is very aware of my criticism of the last time Thumbtack benchmarked C* vs Couchbase, it's significant that no mention is made of efforts to have both systems do durable writes. It looks like they made the same mistake a second time: http://www.datastax.com/dev/blog/how-not-to-benchmark-cassan...

(Update: the Couchbase pdf says that they used "the fastest consistency and durability modes available for the particular database" as cover for benchmarking different things... but even that wasn't done right; no mention is made of setting durable_writes=false in Cassandra.)


Well, writes are not durable until fsync. That's true for MongoDB, Cassandra and Couchbase Server. That being said, Cassandra demonstrated great write latency. The issue was read latency.


Of course they are not durable until fsync; that is why having an upper bound on how long that will be is important.

Out of the box Cassandra defaults to an upper bound of 10s. Couchbase defaults to no upper bound at all -- you can lose arbitrary amounts of data on power loss. That's a huge difference.

Since Couchbase does not support a time bound on fsyncs, there are two ways to make a fair comparison: make both systems fsync before acknowledging any write (commitlog_sync: batch in Cassandra and persistTo: master in Couchbase) or give Cassandra an unlimited durability window like Couchbase (durable_writes=false at the keyspace level). Of the two, the former is a lot more reasonable in real world scenarios, but the latter is at least more defensible than apples-to-oranges.
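For concreteness, here is roughly what those two settings look like (option names per the Cassandra 2.x era; the keyspace name is hypothetical):

    # cassandra.yaml -- option 1: fsync before acknowledging any write
    commitlog_sync: batch
    commitlog_sync_batch_window_in_ms: 2

    -- CQL -- option 2: unlimited durability window, like Couchbase's default
    ALTER KEYSPACE myks WITH durable_writes = false;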


As can Cassandra. It depends on how much data is still in the page cache. I do agree both systems can be configured for immediate durability. However, we went with the default values as most people (and most databases) do not sync on every write. It is too much of a performance cost.


> It is too much of a performance cost.

It's "MongoDB is web-scale" all over again.


Cassandra cannot lose more than 10s of data because it will slow down writes as necessary to make sure it does not. That's why comparing to a system that allows arbitrarily high data loss is unfair.

People can and do run Cassandra with full durability. When people understand the tradeoff between performance and data loss on power failure, you'd be surprised how often they'll choose real durability. (And by batching concurrent writes into the same fsync, the penalty isn't nearly as high as it would be in a naive implementation.)
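For reference, the stock cassandra.yaml defaults behind that 10s bound:

    # cassandra.yaml defaults
    commitlog_sync: periodic
    commitlog_sync_period_in_ms: 10000  # fsync the commitlog at least every 10s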


Do you place a maximum amount of data in that 10s window? If not, it too is arbitrary. I do think batching concurrent writes is nice.


Of course "limit by time" and "limit by data" are interchangeable; in practice "limit by time" is easier for operators to reason about.


"To deliver the highest write performance, Couchbase Server writes to an integrated, in-memory cache first. Then, it synchronizes the cache with the storage device for durability."

Comparing the write speed of non-durable writes against the write speed of durable writes is the same bullshit tactic that MongoDB first utilized a few years ago. Why don't we just add /dev/null in there too?

We're also apparently using 2 replicas; however, is the write being confirmed prior to replication? From the Couchbase documentation: "When a client application writes data to a node, that data will be placed in a replication queue and then a copy will be sent to another node. The replicated data will be available in RAM on the second node and will be placed in a disk write queue to be stored on disk at the second node."

So a write in Couchbase only guarantees it has been written to RAM on the 1st node. There's no fsync nor replica write occurring at the time of write_success=true.


So by default, when the SDK returns 'true' for a write, yes, it's just been written to RAM on the first (master) node. However, you can, if you desire, 'observe' the write, which will return true when the write has hit disk and/or the RAM of the replica nodes (depending on what options you specify).

Ultimately it's a performance tradeoff - if you want sub-millisecond writes for your workload then something has to give - certainly for many workloads (session stores, ad tracking, messaging queues) the tradeoff that you may lose the last few writes if you lose a node and its replicas simultaneously is an acceptable one.
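To make that concrete, a minimal sketch with the 2014-era Couchbase Java client 1.x (cluster address, bucket, and keys are hypothetical; other SDKs expose similar options):

    import com.couchbase.client.CouchbaseClient;
    import net.spy.memcached.PersistTo;
    import net.spy.memcached.ReplicateTo;
    import net.spy.memcached.internal.OperationFuture;
    import java.net.URI;
    import java.util.Arrays;

    public class DurableWrite {
        public static void main(String[] args) throws Exception {
            CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://127.0.0.1:8091/pools")),
                "default", "");

            // Fast path: the future completes once the value is in RAM
            // on the master node -- nothing has hit disk or a replica yet
            client.set("user::123", 0, "{\"v\":1}").get();

            // Observed path: block until the value is persisted to disk on
            // the master AND is in the RAM of at least one replica
            OperationFuture<Boolean> f = client.set("user::123", 0,
                "{\"v\":2}", PersistTo.MASTER, ReplicateTo.ONE);
            System.out.println("durable: " + f.get());

            client.shutdown();
        }
    }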

(Full disclosure: I work for Couchbase).


I'm intrigued as to how one can possibly justify calling that a "write".


Because the data is written to a data structure before the write completes. You can "write" to a cache or to an in-memory database. Databases, relational or NoSQL, write to memory first, whether it is to an application-managed cache or an OS-managed cache (i.e. the page cache).


That's right, and the same is true for MongoDB and Cassandra. They do not fsync on writes and they replicate asynchronously.


Oh boy, this is rich. From the original source: "It is always part of our process to invite vendors to provide configuration suggestions prior to testing and to share our methodology and preliminary results with each of them before we write conclusions. We will add any updates here should there be any before the final report is released." http://blog.thumbtack.net/new-benchmark-preliminary-results/

It's pretty irresponsible of Couchbase to post this on their blog given that statement. The benchmark is EXTREMELY limited in scope. Crucially, it's mostly about raw speed in a fairly artificial set of use cases. They only used one size of record, for Christ's sake.

I'd say more, but I'll hold off till the final report.


When are we going to be finished with raw speed tests? I get it that faster is better. But I've always been a believer that good design will get you farther than choosing one software/hardware/infrastructure over another.

Also, I get the comparison of MongoDB and Couchbase. But why Cassandra from DataStax? It's a completely different technology with entirely different strengths and weaknesses. There are certainly overlapping use cases, but it seems like an odd comparison.

The better comparisons would seem to be comparing Couchbase to vanilla CouchDB and Cloudant's version of CouchDB. I guess you could throw in MongoDB, but MongoDB and CouchDB have different strengths and weaknesses. Raw speed alone doesn't exactly leave any technology in the rearview mirror.


Performance is a reflection of architecture. Better performance, better architecture. That, and you can never downplay performance.


Really? ... Really? I didn't know my local file server that I SCP things to is better architected than S3.


Good point. After all, most interactive applications read and write data from local files via SCP and S3.


Benchmarks are almost never able to provide a generally useful picture. Where you see the actual database performance difference is in your company, fighting for latency, in a given, specific use case, with a given write durability and safety requirement. Every developer who has really tried hard to optimize an application's latency or performance knows how you end up hitting the details, and very specific, database-dependent tradeoffs. TL;DR: pick databases after doing tests and simulations for your specific use case.


That's odd. I was under the impression that DataStax Java drivers had knowledge of the cluster and so weren't sending requests to random nodes.

http://www.datastax.com/doc-source/developer/java-apidocs/co...


You actually have to use it, though. TokenAware is one of many policies, most of which can be layered on top of one another. It just depends on how they had the driver configured.
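For illustration, a minimal sketch of layering policies with the DataStax Java driver 2.x (contact point, keyspace, and query are hypothetical):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class TokenAwareDemo {
        public static void main(String[] args) {
            // TokenAwarePolicy wraps a child policy so requests are routed
            // to a replica owning the row's token, not a random coordinator
            Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withLoadBalancingPolicy(
                    new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
                .build();
            Session session = cluster.connect("myks");
            session.execute("SELECT * FROM users WHERE id = 1");
            cluster.close();
        }
    }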


Token aware is the default now, but Thumbtack benchmarked with the two year old Thrift client instead.


I know the Astyanax driver from Netflix supports a "token aware" strategy. Here is a link to the docs:

https://github.com/Netflix/astyanax/wiki/Configuration

Depending on how the consistency level is set, it could potentially save a "route" to the other host.


Some drivers do. Some drivers don't. Before DataStax started providing their own driver, the default Cassandra driver didn't.


You know what would be a nice twist: if database vendors added Jepsen runs to their benchmarking mix.



That is pretty cool. I'm surprised that foundation doesn't get more press.


> ...while keeping most of the working set in RAM

So if the benchmark's about data being served from RAM, how come Redis isn't a part of it? #onlyasking


Most of the data is read from memory, not all of it. All of it is on disk too.


Does anyone know if Couchbase can be used as a TB sized data warehouse? Seems a bit difficult since it seems all keys must be stored in memory.


At my previous company we experimented with a 12 node cluster (128GB per node) used as a datastore. At one point we had over 1 billion keys. This was back in the 1.x version where they persisted data into sqlite files so each file would have 100+ million rows in it. Persisting data took forever. Rebalancing took forever.

When it worked, it was very fast. When you lost a node, things went bad. The Java client would lose its mind trying to cope with an outage. We ended up writing a connection pool where we could just recreate our connections when we detected a node went out.

That said, it was the best distributed NoSQL solution we tried and might have improved a lot since I last used it.


really helpful info. thanks!


It can and it is, but it might have to be scaled out.

http://www.couchbase.com/liveperson http://www.couchbase.com/paypal


thanks!


OH HERE YOU GO, I found that impartiality you dropped on the floor


A response from Jonathan Ellis, via jancona: http://www.datastax.com/dev/blog/how-not-to-benchmark-cassan...


(That was a response to a different benchmark, also done by Thumbtack, but it looks like it applies just as well to this one. See my comment at https://news.ycombinator.com/item?id=7944226.)


Just wanted to point out that this "we are writing to memory only" thing in Couchbase doesn't apply to CouchDB. CouchDB will fsync the data to the disk by default.

http://couchdb.readthedocs.org/en/latest/config/couchdb.html

You can then disable that if you don't care about it. So don't confuse Couchbase's policy with CouchDB's (Couchbase, it seems, has gone the way of Mongo here).

Also, CouchDB writes in append-only mode and will let you crash-restart your server without corrupting your data.
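The knob in the linked config docs is delayed_commits; a sketch of the ini setting:

    ; local.ini -- CouchDB commit behaviour
    [couchdb]
    delayed_commits = false  ; default: full commit (fsync) before acknowledging
    ; set to true to batch commits, trading durability for throughput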


I don't understand - Couchbase took 1ms per operation without durability configured, i.e. in memory. What took so long? Or is the network roundtrip included?

>However, MongoDB implements a single lock per database (link). MongoDB nodes are limited to executing a single write at a time per database.

That sounds like a nightmare. Are they for real?

Looking at the results - durable configs, MongoDB and DataStax, have 3ms latency with 25K and 75K ops/sec. Are they using HDDs or SSDs? As a reference point - 5 years ago I was hitting 3.5ms on 15K disks on Oracle, at 20-30K ops/second (durable).

>*Couchbase sponsored this study.

priceless.


Network roundtrip included.

Regarding MongoDB, they are. They say MongoDB 2.8 will have document level locking.

For the durable configs in this benchmark, all databases fsync'd after writes. The servers had SSDs.


Why no units on the plots?


Good catch. Latency is in milliseconds. I'll have to fix that.


Any mention of TokuMX?



