Modern B-Tree Techniques (2011)

makmanalp · on Oct 5, 2017

For those who don't know, Goetz Graefe is sort of the patron saint of btrees. He recently won the Sigmod Codd innovations award for his contributions:

https://sigmod.org/sigmod-awards/people/goetz-graefe-2017-si...

He also has a lot to say about locking in B-trees:

http://15721.courses.cs.cmu.edu/spring2016/papers/a16-graefe...

cryptonector · on Oct 5, 2017

> Probably the strongest arguments for B-trees over hash indexes pertain to multi-field indexes and to nonuniform distributions of key values. A hash index on multiple fields requires search keys for all those fields such that a hash value can be calculated. A B-tree index, on the other hand, can efficiently support exact-match queries for a prefix of the index key, i.e., any number of leading index fields

Yes, well, if none of the columns/fields in the index are sufficiently selective, then the ability to search a B-tree by prefix may not do you much good. Don't get me wrong though: I fully agree with all of the given reasons for B-tree superiority to hash tables.

Although the newest storage technologies (particularly very fast byte-addressable NVMs) -- too new for TFA -- might well change things. When you have fast byte-addressable storage it will pay to not read pages, and hash tables may yet involve fewer accesses than B-trees in that case, though I suspect B-trees will successfully adapt anyways.

kristianp · on Oct 5, 2017

On the NVM topic, a quick google shows that it's an active research area:

Persistent B+-Trees in Non-Volatile Main Memory http://www.vldb.org/pvldb/vol8/p786-chen.pdf

How to Build a Non-Volatile Memory Database Management System https://www.cs.cmu.edu/~jarulraj/papers/2017.nvm.sigmod.pdf

makmanalp · on Oct 5, 2017

One thing that doesn't make the two interchangeable is total ordering and range queries, which are only supported by B-trees and not by hash indexes. So the question is whether you're willing to trade that away for faster insertion and lookup.

> When you have fast byte-addressable storage it will pay to not read pages

It always pays to not read pages :-) An alternative way to look at it is that faster storage is more forgiving of more accesses, meaning the benefits of hash indexes might be less prominent.

narag · on Oct 6, 2017

So the question is whether you're willing to trade that away for faster insertion and lookup.

I wouldn't characterize the decision as a trade off. They're different tools for different situations.

nojvek · on Oct 6, 2017

When you have fast byte addressable storage you can just go to plain old binary trees. But b-trees still have some advantage in the amount of storage needed.

hyc_symas · on Oct 6, 2017

A lot of folks talk about byte addressability as if it's going to be some massive advantage. It's not, simply for the reason that it's inefficient to store byte-level addresses for everything.

(And of course, you're really only getting cache-line atomicity at the hardware layer. Ignoring this fact, and just using arbitrary byte boundaries for accesses, will kill a lot of perf gain of NVMs.)

hyc_symas · on Oct 6, 2017

Well, it's an interesting survey. I'm surprised at how much space it devoted to illustrating B-trees in the context of RDBMSs as a lot of the concepts there really had no specific connection to B-trees.

So much space devoted to logging, locking and latches. All heavily used in the industry up till the last decade. All obsolete now. Not a single mention of immutable/persistent B-trees. Were there really no academic papers on the topic written in a time covered by this survey?

mistercow · on Oct 5, 2017

This is from 2011, in case you missed that. I was confused by sentences like:

>Flash memory, flash devices, and other solid state storage technology are about to change the memory hierarchy in computer systems in general and in data management in particular.

until I looked up at the publication date.

sctb · on Oct 5, 2017

Thanks, we've updated the title!

bootcat · on Oct 5, 2017

Really good read ! Must read for people who like algorithms and competitive programming !