> Probably the strongest arguments for B-trees over hash indexes pertain to multi-field indexes and to nonuniform distributions of key values. A hash index on multiple fields requires search keys for all those fields such that a hash value can be calculated. A B-tree index, on the other hand, can efficiently support exact-match queries for a prefix of the index key, i.e., any number of leading index fields
Yes, well, if none of the columns/fields in the index are sufficiently selective, then the ability to search a B-tree by prefix may not do you much good. Don't get me wrong though: I fully agree with all of the given reasons for B-tree superiority to hash tables.
Although the newest storage technologies (particularly very fast byte-addressable NVMs) -- too new for TFA -- might well change things. When you have fast byte-addressable storage it will pay to not read pages, and hash tables may yet involve fewer accesses than B-trees in that case, though I suspect B-trees will successfully adapt anyways.
One thing that doesn't make the two interchangeable is total ordering and range queries, which are only supported by B-trees and not by hash indexes. So the question is whether you're willing to trade that away for faster insertion and lookup.
> When you have fast byte-addressable storage it will pay to not read pages
It always pays to not read pages :-) An alternative way to look at it is that faster storage is more forgiving of more accesses, meaning the benefits of hash indexes might be less prominent.
When you have fast byte addressable storage you can just go to plain old binary trees. But b-trees still have some advantage in the amount of storage needed.
A lot of folks talk about byte addressability as if it's going to be some massive advantage. It's not, simply for the reason that it's inefficient to store byte-level addresses for everything.
(And of course, you're really only getting cache-line atomicity at the hardware layer. Ignoring this fact, and just using arbitrary byte boundaries for accesses, will kill a lot of perf gain of NVMs.)
Well, it's an interesting survey. I'm surprised at how much space it devoted to illustrating B-trees in the context of RDBMSs as a lot of the concepts there really had no specific connection to B-trees.
So much space devoted to logging, locking and latches. All heavily used in the industry up till the last decade. All obsolete now. Not a single mention of immutable/persistent B-trees. Were there really no academic papers on the topic written in a time covered by this survey?
This is from 2011, in case you missed that. I was confused by sentences like:
>Flash memory, flash devices, and other solid state storage technology are about to change the memory hierarchy in computer systems in general and in data management in particular.
https://sigmod.org/sigmod-awards/people/goetz-graefe-2017-si...
He also has a lot to say about locking in B-trees:
http://15721.courses.cs.cmu.edu/spring2016/papers/a16-graefe...