- Completely opaque administration. The company charges for support ($5k per production node per year!), so perhaps unsurprisingly the log output is unreadable (similar to a binary stacktrace). Forget about diagnosing cluster issues on your own.
- Bizarre failover behavior. In every test scenario I've tried, if any single node goes down my application becomes unresponsive (even if it can see other nodes that are up). This could be a problem with the .NET driver, but it's hard to tell.
- Weird performance. Usually Couchbase is fast, but sometimes my application will go into what looks like a busywait loop and consume 100% CPU while trying to contact the Couchbase cluster. Again, possibly (probably?) a driver issue, but it makes me hate dealing with Couchbase.
- Ten bucket limit. Ostensibly for performance reasons, but it cannot be overridden, even if I'm running a non-production cluster where performance isn't critical.
At this point we're looking at switching to Redis. At least it's widely used and easier to administer.
Hi, thanks a lot for posting this! I have been thinking about using Couchbase Server for a mobile project because I like the mobile client component they have. I'm particularly concerned about the "opaque administration" issue - have you had to actually pay the company [1] for support because the log output is so bad, or did it just slow down your debugging process?
[1] I have nothing against paying for support, but this is a personal project
We only run the community edition (everywhere). We could legally use the enterprise edition in QA, but it wouldn't make sense since production must be community.
Thanks for pointing this out, I'm really looking at the document store aspect of Couchbase and the automatic syncing between mobile client and server, so I don't think Cassandra would help there.
Given that Couchbase is very aware of my criticism of the last time Thumbtack benchmarked C* vs Couchbase, it's significant that no mention is made of efforts to have both systems do durable writes. It looks like they made the same mistake a second time: http://www.datastax.com/dev/blog/how-not-to-benchmark-cassan...
(Update: the Couchbase pdf says that they used "the fastest consistency and durability modes available for the particular database" as cover for benchmarking different things... but even that wasn't done right; no mention is made of setting durable_writes=false in Cassandra.)
Well, writes are not durable until fsync. That's true for MongoDB, Cassandra and Couchbase Server. That being said, Cassandra demonstrated great write latency. The issue was read latency.
Of course they are not durable until fsync; that is why having an upper bound on how long that will be is important.
Out of the box Cassandra defaults to an upper bound of 10s. Couchbase defaults to no upper bound at all -- you can lose arbitrary amounts of data on power loss. That's a huge difference.
Since Couchbase does not support a time bound on fsyncs, there are two ways to make a fair comparison: make both systems fsync before acknowledging any write (commitlog_sync: batch in Cassandra and persistTo: master in Couchbase) or give Cassandra an unlimited durability window like Couchbase (durable_writes=false at the keyspace level). Of the two, the former is a lot more reasonable in real world scenarios, but the latter is at least more defensible than apples-to-oranges.
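For concreteness, the Cassandra side of those two options looks roughly like this (a sketch assuming the 2.x-era DataStax Java driver; the contact point, keyspace name, and replication factor are placeholders, not anything from the benchmark):

    // (1) cassandra.yaml (node-level, can't be set from the driver):
    //       commitlog_sync: batch                  # fsync before acking writes
    //       commitlog_sync_batch_window_in_ms: 2
    //     The default is "periodic" with commitlog_sync_period_in_ms: 10000,
    //     i.e. the 10s upper bound mentioned above.
    // (2) durable_writes, set per keyspace via CQL:
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class DurabilityDemo {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")   // placeholder contact point
                    .build();
            Session session = cluster.connect();
            // durable_writes=false skips the commitlog entirely, giving the
            // keyspace an unbounded data-loss window like Couchbase's default.
            session.execute(
                "CREATE KEYSPACE IF NOT EXISTS bench " +
                "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2} " +
                "AND durable_writes = false");
            cluster.close();
        }
    }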
As can Cassandra. It depends on how much data is still in the page cache. I do agree both systems can be configured for immediate durability. However, we went with the default values as most people (and most databases) do not sync on every write. It is too much of a performance cost.
Cassandra cannot lose more than 10s of data because it will slow down writes as necessary to make sure it does not. That's why comparing to a system that allows arbitrarily high data loss is unfair.
People can and do run Cassandra with full durability. When people understand the tradeoff between performance and data loss on power failure, you'd be surprised how often they'll choose real durability. (And by batching concurrent writes into the same fsync, the penalty isn't nearly as high as it would be in a naive implementation.)
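(For the curious, "batching concurrent writes into the same fsync" is the classic group-commit pattern. A purely illustrative sketch of the idea, not Cassandra's actual commitlog code:)

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.*;

    // Many concurrent writers share a single fsync per batch.
    class GroupCommitLog {
        private final FileChannel channel;
        private final BlockingQueue<Pending> queue = new LinkedBlockingQueue<>();

        GroupCommitLog(String path) throws IOException {
            channel = FileChannel.open(Paths.get(path),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            Thread syncer = new Thread(this::syncLoop, "commitlog-syncer");
            syncer.setDaemon(true);
            syncer.start();
        }

        // Callers wait on the returned future until their write is on disk.
        public Future<Void> append(byte[] record) {
            Pending p = new Pending(record);
            queue.add(p);
            return p.done;
        }

        private void syncLoop() {
            List<Pending> batch = new ArrayList<>();
            while (true) {
                try {
                    batch.add(queue.take());   // wait for at least one write
                    queue.drainTo(batch);      // grab whatever else piled up
                    for (Pending p : batch) {
                        channel.write(ByteBuffer.wrap(p.record));
                    }
                    channel.force(false);      // ONE fsync covers the whole batch
                    for (Pending p : batch) {
                        p.done.complete(null);
                    }
                } catch (Exception e) {
                    for (Pending p : batch) p.done.completeExceptionally(e);
                } finally {
                    batch.clear();
                }
            }
        }

        private static class Pending {
            final byte[] record;
            final CompletableFuture<Void> done = new CompletableFuture<>();
            Pending(byte[] record) { this.record = record; }
        }
    }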
"To deliver the highest write performance, Couchbase Server writes to an integrated, in-memory cache first. Then, it synchronizes the cache with the storage device for durability."
Comparing the write speed of non-durable writes against the write speed of durable writes is the same bullshit tactic that MongoDB first utilized a few years ago. Why don't we just add /dev/null in there too?
We're also apparently using 2 replicas, but is the write being confirmed before it reaches the replica? From the Couchbase documentation: "When a client application writes data to a node, that data will be placed in a replication queue and then a copy will be sent to another node. The replicated data will be available in RAM on the second node and will be placed in a disk write queue to be stored on disk at the second node."
So a write in Couchbase only guarantees it has been written to RAM on the 1st node. Neither an fsync nor the replica writes have happened by the time write_success=true is returned.
So by default, when the SDK returns 'true' for a write, yes, it's just been written to RAM on the first (master) node. However, if you want, you can 'observe' the write, which will return true when the write has hit disk and/or the RAM of the replica nodes (depending on what options you specify).
Ultimately it's a performance tradeoff - if you want sub-millisecond writes for your workload then something has to give - certainly for many workloads (session stores, ad tracking, messaging queues) the tradeoff that you may lose the last few writes if you lose a node and its replicas simultaneously is an acceptable one.
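For reference, the observe-style write mentioned above looks roughly like this with the 1.x Java SDK (the URI, bucket, key, and value here are placeholders; newer SDK versions spell the API differently):

    import com.couchbase.client.CouchbaseClient;
    import net.spy.memcached.PersistTo;
    import net.spy.memcached.ReplicateTo;
    import java.net.URI;
    import java.util.Arrays;

    public class ObserveDemo {
        public static void main(String[] args) throws Exception {
            CouchbaseClient client = new CouchbaseClient(
                    Arrays.asList(URI.create("http://127.0.0.1:8091/pools")),
                    "default", "");   // placeholder bucket / password

            // Default behaviour discussed above: 'true' just means the write
            // reached RAM on the active node for that key.
            boolean inRam = client.set("user:123", 0, "{\"n\":1}").get();

            // Observe-style write: success only once the value is persisted to
            // disk on the master AND replicated to one other node's RAM.
            boolean durable = client.set("user:123", 0, "{\"n\":1}",
                    PersistTo.MASTER, ReplicateTo.ONE).get();

            System.out.println("in RAM: " + inRam + ", durable: " + durable);
            client.shutdown();
        }
    }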
Because the data is written to a data structure before the write completes. You can "write" to a cache or to an in-memory database. Databases, relational or NoSQL, write to memory first, whether it's to an application-managed cache or an OS-managed cache (i.e. the page cache).
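The same distinction shows up with a plain file write in Java: write() only hands the bytes to the OS page cache, and it's the fsync (force()) that makes them durable. A minimal sketch:

    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class FsyncDemo {
        public static void main(String[] args) throws Exception {
            try (FileChannel ch = FileChannel.open(Paths.get("demo.log"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                // Returns as soon as the bytes are in the OS page cache;
                // a power loss here can still lose the "completed" write.
                ch.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));

                // Only after the fsync is the write actually durable.
                ch.force(true);
            }
        }
    }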
Oh boy, this is rich. From the original source: "It is always part of our process to invite vendors to provide configuration suggestions prior to testing and to share our methodology and preliminary results with each of them before we write conclusions. We will add any updates here should there be any before the final report is released." http://blog.thumbtack.net/new-benchmark-preliminary-results/
It's pretty irresponsible of Couchbase to post this on their blog given that statement. The benchmark is EXTREMELY limited in scope. Crucially, it's mostly about raw speed in a fairly artificial set of use cases. They only used one record size, for Christ's sake.
I'd say more, but I'll hold off till the final report.
When are we going to be finished with raw speed tests? I get that faster is better, but I've always been a believer that good design will get you farther than choosing one piece of software/hardware/infrastructure over another.
Also, I get the comparison of MongoDB and Couchbase. But why Cassandra from DataStax? It's a completely different technology with entirely different strengths and weaknesses. There are certainly overlapping use cases, but it seems like an odd comparison.
The better comparisons would seem to be comparing Couchbase to vanilla CouchDB and Cloudant's version of CouchDB. I guess you could throw in MongoDB, but MongoDB and CouchDB have different strengths and weaknesses. Raw speed alone doesn't exactly leave any technology in the rearview mirror.
Benchmarks are almost never able to provide a generally useful picture. Where you actually see the database performance difference is in your own company, fighting for latency, in a given, specific use case, with given write durability and safety requirements. Every developer who has really tried hard to optimize an application's latency or performance knows how you end up hitting the details, and very specific, database-dependent tradeoffs. TL;DR: pick databases after doing tests and simulations for your specific use case.
You actually have to use it, though. TokenAware is one of many policies, most of which can be layered on top of one another. It just depends on how they had the driver configured.
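Something along these lines, assuming the 2.x DataStax Java driver (the contact point is a placeholder):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class TokenAwareDemo {
        public static void main(String[] args) {
            // Token awareness is opt-in: you wrap it around a child policy when
            // building the Cluster; otherwise requests can land on a node that
            // doesn't own the key and need an extra network hop.
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")   // placeholder
                    .withLoadBalancingPolicy(
                            new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
                    .build();
            // ... cluster.connect() and run queries as usual ...
            cluster.close();
        }
    }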
At my previous company we experimented with a 12-node cluster (128 GB per node) used as a datastore. At one point we had over 1 billion keys. This was back in the 1.x versions, where they persisted data into SQLite files, so each file would have 100+ million rows in it. Persisting data took forever. Rebalancing took forever.
When it worked, it was very fast. When you lost a node, things went bad. The Java client would lose its mind trying to cope with an outage. We ended up writing a connection pool where we could just recreate our connections when we detected a node had gone down.
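Roughly, the idea was something like the sketch below (illustrative only - the threshold, timeouts, and rebuild logic are made up for the example, not our actual code):

    import com.couchbase.client.CouchbaseClient;
    import java.net.URI;
    import java.util.Arrays;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;

    // Rebuild the client after repeated failures so the application can
    // recover when a node drops out from under the driver.
    class ReconnectingClient {
        private static final int FAILURE_THRESHOLD = 5;
        private final AtomicInteger consecutiveFailures = new AtomicInteger();
        private volatile CouchbaseClient client;

        ReconnectingClient() throws Exception {
            client = newClient();
        }

        private CouchbaseClient newClient() throws Exception {
            return new CouchbaseClient(
                    Arrays.asList(URI.create("http://127.0.0.1:8091/pools")),
                    "default", "");   // placeholder bucket / password
        }

        public boolean set(String key, String value) {
            try {
                boolean ok = client.set(key, 0, value).get(2, TimeUnit.SECONDS);
                consecutiveFailures.set(0);
                return ok;
            } catch (Exception e) {
                if (consecutiveFailures.incrementAndGet() >= FAILURE_THRESHOLD) {
                    reconnect();
                }
                return false;
            }
        }

        private synchronized void reconnect() {
            try {
                client.shutdown(10, TimeUnit.SECONDS);
                client = newClient();
                consecutiveFailures.set(0);
            } catch (Exception e) {
                // keep the old client and try again on the next failure
            }
        }
    }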
That said, it was the best distributed NoSQL solution we tried and might have improved a lot since I last used it.
(That was a response to a different benchmark, also done by Thumbtack, but it looks like it applies just as well to this one. See my comment at https://news.ycombinator.com/item?id=7944226.)
Just wanted to point out that this "we are writing to memory only" thing in Couchbase doesn't apply to CouchDB. CouchDB will fsync the data to the disk by default.
You can then disable that if you don't care about it. So don't confuse Couchbase's policy with CouchDB's (Couchbase, it seems, has gone the way of Mongo here).
Also, CouchDB writes in append-only mode and will let you crash and restart your server without corrupting your data.
I don't understand - Couchbase took 1ms per operation without durability configured, i.e. in memory. What took so long? Or is the network round trip included?
>However, MongoDB implements a single lock per database (link). MongoDB nodes are limited to executing a single write at a time per database.
That sounds like a nightmare. Are they for real?
Looking at the results, the durable configs, MongoDB and DataStax, show 3ms latency at 25K and 75K ops/sec. Are they using HDDs or SSDs? As a reference point, 5 years ago I was hitting 3.5ms on 15K disks on Oracle, at 20-30K ops/second (durable).