Totally... this is why pipelining makes Redis 10x faster: fewer syscalls.
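To make the syscall saving concrete, here is a minimal sketch: three commands in Redis's RESP wire protocol batched into a single write(), with all replies drained by a single read(). It assumes a Redis server listening on 127.0.0.1:6379 and trims error handling for brevity.

    /* Pipelining sketch: one write() for three commands instead of three
     * write()/read() round trips. Assumes Redis on 127.0.0.1:6379. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(6379);
        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
            perror("connect");
            return 1;
        }

        /* Three commands in RESP, batched into one buffer: the kernel sees
         * a single write(), and replies come back in the same order. */
        const char *batch =
            "*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n"
            "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
            "*1\r\n$4\r\nPING\r\n";
        write(fd, batch, strlen(batch));

        /* All three replies typically arrive in a single read(). */
        char buf[512];
        ssize_t n = read(fd, buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("replies:\n%s", buf);
        }
        close(fd);
        return 0;
    }

Batching N commands this way turns roughly 2N syscalls into 2, which is where most of the speedup comes from.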
Basically, to make Redis much faster we need to work on three different related things:
1) Less kernel friction.
2) Threaded I/O: this is the part worth threading, with a global lock to execute queries so you don't go crazy with concurrency and complex data structures. Memcached did it right (see the sketch after this list).
3) Pipelining: better client support for pipelining so that it's easy to tell the client in what order I need my replies, and unrelated replies can be glued together easily.
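To make point 2 concrete, here is a minimal sketch of that memcached-style split, with hypothetical names (handle_client, a counter standing in for the real data structures): many threads read, parse, and reply in parallel, while execution against shared data is serialized behind one global lock.

    /* Threaded I/O sketch: reading/parsing runs in parallel across threads;
     * only command execution holds the global lock. All names here are
     * hypothetical stand-ins. */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t store_lock = PTHREAD_MUTEX_INITIALIZER;
    static long store_counter; /* stand-in for the shared data structure */

    static void *handle_client(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000; i++) {
            /* Request reading and protocol parsing would go here, lock-free
             * and in parallel: this is the part worth threading. */
            pthread_mutex_lock(&store_lock);
            store_counter++; /* query execution, serialized */
            pthread_mutex_unlock(&store_lock);
            /* Writing the reply back would also be lock-free. */
        }
        return NULL;
    }

    int main(void) {
        pthread_t threads[4];
        for (int i = 0; i < 4; i++)
            pthread_create(&threads[i], NULL, handle_client, NULL);
        for (int i = 0; i < 4; i++)
            pthread_join(threads[i], NULL);
        printf("executed %ld commands\n", store_counter);
        return 0;
    }

The global lock keeps the data structures single-threaded-simple while still parallelizing the syscall-heavy read/parse/write work.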
(Author here) Continuing our thread from Twitter: one of the things I wanted to say here is that things like pipelining and threading are good, but there is still improvement to be made at the kernel I/O layer. If we can reduce that latency, there could be very compelling cost incentives, especially at the datacenter level, and it might simplify application logic quite a bit.
This is one of the big reasons I like the paper I mention in this post.
Point "2" better applies when the storage substrate is memory and operations are O(1) or logarithmic, because in that case, the time to serve the query is comparable small compared to the time needed to process the reply and send back the response. With an on-disk storage, I would go for a classic on-disk database setup where different queries are served by different threads.
LMDB beats all other "classic" on-disk databases for read performance. It also happens to beat all other in-memory systems for reads, since its read path requires no locks.
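For anyone who hasn't used it, a minimal sketch of the LMDB read path in C, assuming an existing environment directory ./testdb that already contains the key "foo"; the read-only transaction is what makes the read lock-free.

    /* LMDB read sketch: a read-only transaction reads an MVCC snapshot,
     * so the read path takes no locks on the data. Assumes ./testdb
     * already exists and holds the key "foo". */
    #include <stdio.h>
    #include <string.h>
    #include <lmdb.h>

    int main(void) {
        MDB_env *env;
        MDB_txn *txn;
        MDB_dbi dbi;
        MDB_val key, data;

        mdb_env_create(&env);
        if (mdb_env_open(env, "./testdb", MDB_RDONLY, 0664) != 0) {
            fprintf(stderr, "cannot open ./testdb\n");
            return 1;
        }

        mdb_txn_begin(env, NULL, MDB_RDONLY, &txn); /* no write lock taken */
        mdb_dbi_open(txn, NULL, 0, &dbi);

        key.mv_size = strlen("foo");
        key.mv_data = "foo";
        if (mdb_get(txn, dbi, &key, &data) == 0)
            printf("foo = %.*s\n", (int)data.mv_size, (char *)data.mv_data);

        mdb_txn_abort(txn); /* read txns are simply discarded */
        mdb_env_close(env);
        return 0;
    }

Compile with -llmdb.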