This looks like an interesting finding. Unfortunately, the trade-off for this type of efficient small-data storage is real:
> Note that LSM-trie uses hash functions to organize its data and accordingly does not support range search.
Range search, while not directly applicable to all data sets, is an important feature of the LSM-based stores it is compared against (LevelDB & RocksDB).
The authors acknowledge this and say:
> There are techniques available to support the command by maintaining an index above these hash-based stores
So don't plan on using an LSM-trie as a drop-in replacement for LevelDB or other LSM-tree-based projects that rely on range searches without first weighing the additional complexity of building and maintaining an index to perform those searches.
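To make that concrete, here's a minimal Python sketch (mine, not the paper's) of the index-above-the-store idea: a hash-organized store paired with a sorted secondary index over the keys. All the names here are hypothetical, and a real index would itself need to be persisted and kept in sync with compaction, which is exactly the extra complexity to weigh.

```python
import bisect

class RangeIndexedStore:
    """Hypothetical sketch: a hash-based KV store augmented with a
    sorted key index so range scans remain possible."""

    def __init__(self):
        self._store = {}   # stand-in for the hash-organized store
        self._keys = []    # sorted secondary index over the keys

    def put(self, key, value):
        if key not in self._store:
            bisect.insort(self._keys, key)  # keep the index ordered
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)

    def range(self, lo, hi):
        # Binary-search the sorted index, then point-look-up each key.
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        for key in self._keys[i:j]:
            yield key, self._store[key]

s = RangeIndexedStore()
for k in ["cherry", "apple", "banana"]:
    s.put(k, len(k))
print(list(s.range("apple", "banana")))  # [('apple', 5), ('banana', 6)]
```

The point of the sketch is the cost model: every put now pays for an ordered-index update, and the index's memory footprint grows with the key count, undoing part of what the LSM-trie saves.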
For the use-case that the paper presents, I don't think that range search is that important. Most use-cases for LSM-trees so far have been keeping in-memory metadata costs down (e.g. replacing storage engines such as Bitcask) and converting random writes into sequential writes, so as to make the most of disk throughput and not wear out SSDs. LSM-trees have had shortcomings in the past with write amplification, and I think this paper does well in addressing that (besides pushing the envelope in reducing in-memory metadata costs).
Every time I come across a new KV store, I go looking for a userland implementation from the Acunu guys: http://arxiv.org/abs/1103.4282. So far, I haven't found one.
I'd like to see Stratified B Trees kick LSMs' butts. Or the other way around. I can't code this yet. But I can hope that somebody's already on to this.
I haven't read the whole paper, but "LSM" is "log-structured merge"; I guess they merge runs over time to keep write amplification down? As eyan points out, Twigg's (Acunu's) "stratified B-trees" may have obsoleted the whole copy-on-write-B-tree family of data structures.
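For anyone who hasn't seen the merge step before, here's a toy Python sketch of that idea (my illustration, not the paper's algorithm): compaction merges several sorted runs into one, keeping only the newest value for each duplicated key, which is what bounds how much data gets rewritten.

```python
import heapq

def _tag(run, age):
    # Helper so each generator captures its own `age` (newer = smaller).
    return ((key, age, value) for key, value in run)

def merge_runs(*runs):
    """Merge sorted runs of (key, value) pairs into one sorted run.

    Runs are passed newest-first; on duplicate keys the newest value
    wins. A toy sketch of LSM compaction, not the paper's algorithm.
    """
    merged = []
    tagged = heapq.merge(*(_tag(run, age) for age, run in enumerate(runs)))
    for key, _age, value in tagged:
        if not merged or merged[-1][0] != key:
            merged.append((key, value))  # first occurrence = newest
    return merged

# Example: the newer run's value for "b" survives the merge.
newer = [("a", 1), ("b", 2)]
older = [("b", 9), ("c", 3)]
print(merge_runs(newer, older))  # [('a', 1), ('b', 2), ('c', 3)]
```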
This is a form of LSM-tree that sacrifices range queries in exchange for write amplification of at most 5x across its 5 levels. That translates into a significant increase in throughput compared to LevelDB and RocksDB.
This paper targets datasets of 1-10 billion keys, keeping write amplification down to at most 1x per level (i.e. 5x over 5 levels), and in-memory indexes to a minimum.
This is very different from LevelDB. But LevelDB's claims are actually valid, provided your dataset is not massive and you understand how your workload interacts with its compaction and write amplification. This paper presents a form of LSM-tree that addresses both of those issues (massive datasets, low write amplification).
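As a back-of-the-envelope comparison (my numbers: the ~10x-per-level figure is the usual rule of thumb for leveled compaction with a fan-out of 10, not a measurement from this paper):

```python
# Rough model: total write amplification ~= levels * per-level rewrites.
def write_amplification(levels: int, per_level: float) -> float:
    return levels * per_level

# Leveled compaction typically rewrites each item ~fanout (~10x) per level.
leveldb_like = write_amplification(levels=5, per_level=10)  # -> 50
# LSM-trie's claim is at most ~1x per level.
lsm_trie = write_amplification(levels=5, per_level=1)       # -> 5

print(leveldb_like, lsm_trie)  # 50 5
```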
Now that we know that "number of atoms in the universe" is not a good example of an extremely large number [1], maybe we can use "number of key-value stores" as the new reference?