How to choose an in-memory NoSQL solution: Performance measuring (highscalability.com)
63 points by dsr12 on Dec 30, 2015 | 31 comments



> Tarantool is an open-source NoSQL database management system and Lua application server developed in my.com. The first version of Tarantool was released in 2008 and the newest version is 1.6.7.

and later

> Through all tests we executed, Tarantool showed the best result for the count requests per second and for many of tests latency values on any type of examined workloads. Therefore, we can decide that for most of typical projects Tarantool suits them more that popular solutions such as Redis, CouchBase or Memcached. This is the basis of our decision to use Tarantool for our projects here at my.com.

So yeah. Obviously.


tarantool.org is an open-source project supported by my.com and many other commercial customers. my.com is an independent U.S. company that chooses products on a competitive basis.

Anyway, the benchmark is fully open. If in doubt, you can always download the disk images and re-run the benchmark yourself.

// Disclaimer: http://tarantool.org/ developer


Why is memcached under-performing? Is it maybe their driver? Or their benchmark? Shouldn't it be similar to Redis, if not a bit faster?

It's a bit fishy that the leading one is by the guys who wrote the article, see https://news.ycombinator.com/item?id=10814318


Those tests don't look reliable.

For example, I'm quite surprised that no one pointed this out, but if you look, memcached's performance grows nearly linearly with the number of threads. They stopped at 256 threads, just when it was about to overtake Redis.

Now if you look at workloads A and B: workload A is supposed to be a 50/50 read/write mix, while workload B is 95/5 read/write. You see that memcached performed terribly there. You might think the access pattern is simply different, but the rest of the databases perform close to their workload A numbers. And memcached, which was doing 15,000 req/s, drops to 10,000 req/s on a workload that is supposed to be less work.

This suggests to me that the performance issue is most likely not in memcached itself but in their test application, but we can't prove that because they did not release the code they used for testing.
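For the curious, here is a rough sketch of the kind of cross-check I mean, using the third-party pymemcache client against a local memcached on the default port. This is not the article's YCSB harness, and with Python as the load generator the client itself becomes the bottleneck, so only the shape of the thread-scaling curve is interesting, not the absolute numbers:

    # Rough cross-check of memcached throughput vs. thread count and read/write
    # mix. Assumes a local memcached on 127.0.0.1:11211 and the third-party
    # pymemcache package; this is NOT the article's YCSB harness.
    import random
    import threading
    import time
    from pymemcache.client.base import Client

    HOST = ("127.0.0.1", 11211)
    KEYS = ["user%d" % i for i in range(10000)]
    DURATION = 10        # seconds per run
    READ_RATIO = 0.95    # 0.50 ~ workload A, 0.95 ~ workload B

    def seed():
        client = Client(HOST)
        for key in KEYS:
            client.set(key, b"x" * 100)

    def worker(results, stop):
        client = Client(HOST)            # one connection per thread
        ops = 0
        while not stop.is_set():
            key = random.choice(KEYS)
            if random.random() < READ_RATIO:
                client.get(key)
            else:
                client.set(key, b"x" * 100)
            ops += 1
        results.append(ops)

    def run(threads):
        results, stop = [], threading.Event()
        pool = [threading.Thread(target=worker, args=(results, stop))
                for _ in range(threads)]
        for t in pool:
            t.start()
        time.sleep(DURATION)
        stop.set()
        for t in pool:
            t.join()
        return sum(results) / DURATION

    if __name__ == "__main__":
        seed()
        for n in (8, 32, 128, 256, 512):   # keep going past 256 threads
            print("%4d threads: %10.0f ops/s" % (n, run(n)))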


The underperformance might be caused by my YCSB memcached driver; I'll revise it. Memcached utilized all CPUs during the benchmark.

P.S. All VM images are open; you can repeat the benchmark if you wish.


Postgres is also a good in-memory NoSQL solution. Actually, you can use almost anything as an in-memory NoSQL solution: ArangoDB, MySQL, MongoDB, RethinkDB, or whatever.


I haven't tried PostgreSQL as an in-memory solution, but in another benchmark of mine it reached good performance compared with other on-disk solutions. Thank you! I've scheduled a benchmark of PostgreSQL in RAM.


Do you mean using Postgres tablespaces backed by memory, or do you mean using an FDW?

If you do mean a ramdisk tablespace, the Postgres docs recommend against doing that. However, if you really want to, make sure you put pg_xlog (the write-ahead log) on that ramdisk as well, otherwise every transaction will still hit the disk.


Not the poster upthread, but if you have enough memory (shared_buffers) and use unlogged tables, the effect is that both reads and writes pretty much only hit memory, not disk, without going through the trouble of moving tablespaces or anything.
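Roughly, and only as a sketch: this assumes psycopg2, a local Postgres with shared_buffers raised in postgresql.conf to cover the working set, and a made-up DSN and table name.

    # Sketch: key/value access on an UNLOGGED table with a large shared_buffers,
    # so reads and writes stay in memory. UNLOGGED skips the WAL entirely, which
    # means the table is emptied after a crash -- cache semantics, like memcached.
    # The DSN and table name below are illustrative only.
    import psycopg2
    from psycopg2.extras import Json

    conn = psycopg2.connect("dbname=bench user=bench")   # hypothetical DSN
    cur = conn.cursor()

    cur.execute("""
        CREATE UNLOGGED TABLE IF NOT EXISTS kv (
            key   text PRIMARY KEY,
            value jsonb
        )
    """)
    conn.commit()

    # Write + read back; with shared_buffers sized to the working set (set in
    # postgresql.conf), these touch memory only.
    cur.execute("DELETE FROM kv WHERE key = %s", ("user:42",))
    cur.execute("INSERT INTO kv (key, value) VALUES (%s, %s)",
                ("user:42", Json({"name": "alice"})))
    conn.commit()

    cur.execute("SELECT value FROM kv WHERE key = %s", ("user:42",))
    print(cur.fetchone()[0])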


Putting tables in RAM and setting the config as you suggest works just fine.


Sqlite is always a good option if you want in-memory and SQL. Kyotocabinet supports a number of in-memory database types, as does its networked counterpart kyototycoon. BerkeleyDB can be effectively run in memory if you size your cache right, and as a bonus you can build bdb with a sqlite frontend.
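For the SQLite route the Python standard library is already enough; a tiny sketch (nothing here is tied to the article's benchmark):

    # In-memory SQLite: the whole database lives inside this one connection
    # and vanishes when it is closed. Standard library only.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
    conn.execute("INSERT INTO kv VALUES (?, ?)", ("user:42", "alice"))
    print(conn.execute("SELECT value FROM kv WHERE key = ?",
                       ("user:42",)).fetchone())
    conn.close()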

Anyway, cool post. For more lua, there's openresty (nginx and lua), and kyototycoon supports lua scripts as well.


Apache Geode is a distributed, in-memory database (http://geode.incubator.apache.org/).

See https://news.ycombinator.com/item?id=10596859


I will try it. Thank you!


The hardware setup is not clear. Are you running the YCSB client and the database on the same single A3 instance?


No, I ran two instances close to each other (in one DC).

For example, Tarantool and CouchBase were installed on the nosql-1 instance and the YCSB client on nosql-2 (there are links to the *.vhd files in the article).


Sometimes it is depressing how little effort people put into running a level playing field when doing numbers like this. The Couchbase SDK used is from mid-2013; with a little effort the numbers could have been better.


I used my own simple memcached module, not the Couchbase SDK.


A note on usability: having the chart legend (i.e., the color-to-database mapping) defined only in one place, above the graphs, makes the charts in the article really difficult to compare and interpret.


I'm quite surprised to see that there's no Aerospike in this comparison.



Similarly, I was hoping to find VoltDB in the mix.


I'll try to benchmark it soon. Thank you!


Why downvote? How is VoltDB not relevant here?


No idea (I haven't voted here at all), but maybe someone didn't want this to turn into a long thread naming the myriad NoSQL databases out there?


This is only a first try. I'm going to do another review covering a larger number of NoSQL solutions soon; Aerospike will be included there.


Agreed. It's pretty trivial to overwhelm a Redis instance. For high-traffic ad-serving scenarios, Aerospike is really where it's at.


It's unlikely you are going to overwhelm Redis in a non-deterministic fashion. What do you even mean by "overwhelm"?

The post only concerns itself with in-memory workloads; I don't think Aerospike is competition in this space, since their advantage is with workloads running against SSD-backed datasets. "After Google published a blog post “Cassandra Hits One Million Writes Per Second on Google Compute Engine” – using 300 nodes, we followed the same steps and documented how Aerospike Hits One Million Writes Per Second With Just 50 Nodes On Google Compute Engine. [...] Aerospike on SSD is very comparable to that of RAM. At 100% write, the SSDs are able to sustain 226,000 transactions per second compared to 239,000 for RAM." [0] Redis would have no problem hitting 250K IOPS on just a single core provided by GCE (or AWS EC2 or similar).

[0] http://www.aerospike.com/resource/aerospike-soars-in-the-goo...
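If you want to sanity-check the single-instance claim yourself, here's a rough probe assuming a local Redis and the redis-py client; a Python load generator is itself the bottleneck, so redis-benchmark would give more representative numbers, but the order of magnitude comes through:

    # Rough single-instance write-throughput probe against a local Redis,
    # using the redis-py client. Pipelining batches commands so the Python
    # client doesn't dominate the measurement.
    import time
    import redis

    r = redis.Redis(host="127.0.0.1", port=6379)
    BATCH, ROUNDS = 1000, 200

    start = time.time()
    for i in range(ROUNDS):
        pipe = r.pipeline(transaction=False)
        for j in range(BATCH):
            pipe.set("key:%d:%d" % (i, j), "x" * 100)
        pipe.execute()
    elapsed = time.time() - start
    print("%.0f writes/s against one Redis process" % (BATCH * ROUNDS / elapsed))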


Redis needs only two nodes on GCE to deliver 1M writes per second: https://redislabs.com/blog/nosql-bar-datastax-aerospike-couc...


A single Redis instance uses a single CPU core, so yeah, hitting its limits (no matter how high) is always a possibility. OTOH, there's always the possibility of sharding and clustering a Redis database to scale it up and out, thus making "overwhelming" it less trivial.


Hi folks, was this a test on a single machine? Also, does anyone know why the test used an unofficial version of Couchbase (4047)?


Yes, it was. When I was running the test, CouchBase 4.x hadn't been released yet.



