Hacker News new | past | comments | ask | show | jobs | submit login
Modern B-Tree Techniques (2011) (psu.edu)
314 points by pointy_hat on Oct 5, 2017 | hide | past | favorite | 11 comments



For those who don't know, Goetz Graefe is sort of the patron saint of btrees. He recently won the Sigmod Codd innovations award for his contributions:

https://sigmod.org/sigmod-awards/people/goetz-graefe-2017-si...

He also has a lot to say about locking in B-trees:

http://15721.courses.cs.cmu.edu/spring2016/papers/a16-graefe...


> Probably the strongest arguments for B-trees over hash indexes pertain to multi-field indexes and to nonuniform distributions of key values. A hash index on multiple fields requires search keys for all those fields such that a hash value can be calculated. A B-tree index, on the other hand, can efficiently support exact-match queries for a prefix of the index key, i.e., any number of leading index fields

Yes, well, if none of the columns/fields in the index are sufficiently selective, then the ability to search a B-tree by prefix may not do you much good. Don't get me wrong though: I fully agree with all of the given reasons for B-tree superiority to hash tables.

Although the newest storage technologies (particularly very fast byte-addressable NVMs) -- too new for TFA -- might well change things. When you have fast byte-addressable storage it will pay to not read pages, and hash tables may yet involve fewer accesses than B-trees in that case, though I suspect B-trees will successfully adapt anyways.


On the NVM topic, a quick google shows that it's an active research area:

Persistent B+-Trees in Non-Volatile Main Memory http://www.vldb.org/pvldb/vol8/p786-chen.pdf

How to Build a Non-Volatile Memory Database Management System https://www.cs.cmu.edu/~jarulraj/papers/2017.nvm.sigmod.pdf


One thing that doesn't make the two interchangeable is total ordering and range queries, which are only supported by B-trees and not by hash indexes. So the question is whether you're willing to trade that away for faster insertion and lookup.

> When you have fast byte-addressable storage it will pay to not read pages

It always pays to not read pages :-) An alternative way to look at it is that faster storage is more forgiving of more accesses, meaning the benefits of hash indexes might be less prominent.


So the question is whether you're willing to trade that away for faster insertion and lookup.

I wouldn't characterize the decision as a trade off. They're different tools for different situations.


When you have fast byte addressable storage you can just go to plain old binary trees. But b-trees still have some advantage in the amount of storage needed.


A lot of folks talk about byte addressability as if it's going to be some massive advantage. It's not, simply for the reason that it's inefficient to store byte-level addresses for everything.

(And of course, you're really only getting cache-line atomicity at the hardware layer. Ignoring this fact, and just using arbitrary byte boundaries for accesses, will kill a lot of perf gain of NVMs.)


Well, it's an interesting survey. I'm surprised at how much space it devoted to illustrating B-trees in the context of RDBMSs as a lot of the concepts there really had no specific connection to B-trees.

So much space devoted to logging, locking and latches. All heavily used in the industry up till the last decade. All obsolete now. Not a single mention of immutable/persistent B-trees. Were there really no academic papers on the topic written in a time covered by this survey?


This is from 2011, in case you missed that. I was confused by sentences like:

>Flash memory, flash devices, and other solid state storage technology are about to change the memory hierarchy in computer systems in general and in data management in particular.

until I looked up at the publication date.


Thanks, we've updated the title!


Really good read ! Must read for people who like algorithms and competitive programming !




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: