Hacker News new | past | comments | ask | show | jobs | submit login
General Balanced Trees (1999) [pdf] (uu.se)
55 points by brudgers on Aug 18, 2019 | hide | past | favorite | 11 comments



Nice paper.

It is interesting to note how practical data structures have evolved over the last 20 years, since this paper was written. Balanced trees were pervasive back then and the primary workhorse in all database systems I designed or worked with. Today they are rarely used in new system designs, since the practical tradeoffs started making less and less sense over a decade ago. Balanced trees have a very high space overhead that becomes prohibitively costly as data sizes grow.

Succinct implicit tree structures are unbalanced, so in theory have higher latency variance, but they also can have a space overhead 2-3 orders of magnitude smaller. As a practical consequence, the cost of traversing a level in the tree is dramatically lower than for balanced trees. The number of levels traversed tends to be used in literature as a shorthand metric for "optimality" but it smuggles in the assumption that traversal costs are similar across algorithms in practice. Searching a nicely balanced O(logn) gigabyte metadata structure has a hard time competing with an equivalent unbalanced tree that fits the metadata entirely in L2 cache, even when the tree is very unbalanced (and the average case is much better).

I am not entirely sure when I stopped using balanced trees. I went from always using balanced trees to never using balanced trees, gradually over many years.


Can you provide some references to open source implementations of Succinct implicit trees ? How different are they from a simple unbalanced BST ?


For anybody interested in these, Erlang has an implementation of both sets and trees apparently in its standard library. The documentation claims that they perform better than AVL trees.

trees: http://erlang.org/doc/man/gb_trees.html

sets: http://erlang.org/doc/man/gb_sets.html

source for trees: https://github.com/erlang/otp/blob/master/lib/stdlib/src/gb_...


I came across the paper in the Erlang documentation.


Erlang's ETS also has ordered_set tables which is a C based set, most likely better for performance, although certainly not as transparent.


These trees nearly identical to Scapegoat Trees, which were a rediscovery of the idea by different authors. https://cglab.ca/~morin/teaching/5408/refs/a99.pdf has the history:

"A preliminary version of this article is published in Proceedings of Workshop on Algorithms and Data Structures, WADS ’89, Ottawa. An independent article has also been presented by Galperin and Rivest at the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA."


I found it difficult to understand this paper. I understand the idea of balanced binary trees, but the paper gives a pile of inequalities and no clear actionable advice about how to construct efficient algorithms.

For self-balancing trees, my technique of choice is AVL trees. The implementation is short and the mathematical proof is straightforward. It is only a small constant factor worse than a completely balanced binary tree.

I searched for even simpler structures and found AA trees. I showed that their code is only a tiny bit shorter than AVL trees, but proving their correctness is noticeably more difficult. https://www.nayuki.io/page/aa-tree-set (see especially "Compared to AVL trees", BasicAaTreeSet.java, BasicAvlTreeSet.java).

In the real software applications though, B-trees are preferred because they play well with the cache/memory/disk hierarchy - the fact that sequential access is fast and random seeks are slow.


Log-height is a sufficient but not necessary condition. There are a few other ways generalize balanced trees, notably logit-weight.


But logit-weight trees have logarithmic height, yes?


Weight is a weaker condition, since you can construct a polynomial weight sequence that results in linear height.

In general, height is the easiest thing to restrict, but doing so restricts dynamic performance optimizations - you can't use splay trees, for instance


> Weight is a weaker condition, since you can construct a polynomial weight sequence that results in linear height.

I'd like to hear more. The varieties of weight-balanced trees I'm aware of all have logarithmic height.

> In general, height is the easiest thing to restrict, but doing so restricts dynamic performance optimizations - you can't use splay trees, for instance

Good point.

BTW, you might be interested in Bose et al.'s "De-amortizing Binary Search Trees" <https://arxiv.org/pdf/1111.1665.pdf>, shows how to keep height logarithmic with "essentially any Binary Search Tree" (their phrase).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: