Hacker News
Exploring the design space of binary search trees (rtheunissen.github.io)
122 points by rtheunissen on Aug 15, 2023 | 28 comments
I started this project many years ago when I was coming up with ideas for immutable data structures for the PHP data structures extension [1]. I wanted to support access by position in a sorted set, which led to the idea of using binary search trees for both lists and sets. However, I did not expect the scope of this project to increase as much as it did.

The more I read about binary search trees, the more I thought about them, and so down the rabbit hole I went. When you read everything you can find about a topic for several years, eventually you develop a deep enough understanding to compose new ideas from different sources.

This project is the result of many rejected ideas and countless experiments. I tried my best to distill everything I consider important enough to keep, and will continue to develop other ideas as they come along. I had no intention to implement anything other than weight-balanced strategies to support positional access, but when I read about rank-balanced trees I knew I had to take on the challenge to implement them.

I've been in contact with various authors along the way, specifically Bob Tarjan and Salvador Roura, but have otherwise not received any feedback yet. Implementing all the algorithms was incredibly hard work, by far the hardest work I've ever done, so I hope that others may find value in their presentation.

There is still so much work that could be done, but there comes a time when working on something alone begins to yield diminishing returns. I hope to continue this project as an open-source collaborative effort, so please feel free to ask questions or suggest changes, however small.

[1] https://github.com/php-ds/ext-ds




This looks like a thorough treatment of the subject. But my intuition is that B-trees are better than BSTs in almost every context. A good B-tree implementation is more cache-friendly, so it is faster to insert into, delete from, and traverse. In my previous job I used to replace STL map uses with a B-tree implementation. It was always faster.


That is true, but in a persistent setting they likely also copy more data, and iterator invalidation might be a concern in some cases when moving values around within a B-tree node. The motivation was not really to come up with the best tree structure or to even consider memory hierarchy at all.

Instead, within just the scope of binary search trees specifically, what cool things might we uncover by taking a closer look? In a hypothetical future, our hardware might be so different and caches so large and advanced that the best practices of today might not apply all the same.

Binary search trees today are probably only viable for absolutely massive collections, and at that scale there is almost definitely a better, bespoke solution available. Exploring without necessarily thinking about the hardware, in the abstract, has been a fun sandbox to play in. The fact that we still teach them suggests that they provide value beyond practical application.

Thank you for your feedback, you are absolutely right. In most practical use-cases today, a B-tree would likely achieve better results.


> caches so large and advanced that the best practices of today might not apply all the same.

All trends are going in the wrong direction for binary trees to ever become useful again, and some of those limits are physics, not design.

- Branch predictors hate binary trees.

- CPU pipelines hate pointer chasing.

- Larger caches work less efficiently with binary trees than alternatives, such as "cache-oblivious" algorithms.

- Operations wasted per cache miss due to memory latency have been going up, not down.

Etc...

> even consider memory hierarchy at all.

There's an often overlooked physics aspect here: physical memory requires physical space. As you add "N" bytes of memory, the circuit layout has to expand into space. If laid out in a flat plane, as is typical, this is a shape with radius roughly proportional to √N. For a space-filling volumetric design, it is at best ∛N.

Access to a piece of data requires a round-trip that takes time proportional to the distance taken through the computer. This is a simple consequence of the speed of light.

This means that unpredictable or random memory access time scales as ∛N to √N, not the constant "1" as typically assumed in big O analysis!

This is inescapable for computers made from physical components instead of pencil & paper abstractions.

If you add in this additional term, a lot of algorithms traditionally considered to be efficient suddenly look much worse. Conversely, algorithms preferred by developers with a knack for performance tuning don't get such a big penalty.
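Stated as a rough model (my own formulation of the argument above, not a rigorous bound): if N bytes are laid out in d dimensions, the farthest byte sits at distance proportional to N^(1/d), and each unpredictable access pays that round trip:

```latex
% distance to the farthest of N bytes in a d-dimensional layout
r(N) = \Theta\bigl(N^{1/d}\bigr), \qquad d = 2 \text{ (plane)}, \quad d = 3 \text{ (volume)}

% log_2 N pointer hops, each to an unpredictable address, then cost
T_{\text{search}}(N) = \Theta\bigl(\log N \cdot N^{1/d}\bigr)
\quad \text{instead of} \quad \Theta(\log N)
```

Under the same assumptions, a layout-aware search whose i-th step stays within a region of size N/2^i pays a geometric sum Σ c·(N/2^i)^(1/d) = Θ(N^(1/d)), which is one way to see why cache-friendly layouts fare better in this model.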


What if some future technology or material breakthrough provides a sort of self-adjusting liquid memory that provides true constant time access to any address? I'm not being entirely serious of course, as I dream about sequences across nodes on planets through other solar systems.

Focusing on fundamental algorithms in the abstract provides a fun playground to explore and learn and teach, before you learn about the memory hierarchy and all your hopes and dreams of the ideal data structure fade away.

I don't think there is any time wasted exploring the fundamentals. Who knows what technology might see a renaissance in the future as hardware continues to change. Analog computers, binary search trees, who knows.

It's fun to dream and take a break from reality sometimes, digging deep into a simple concept with a rich design space and complex analysis.


I would not disregard that direction. Architectures that do well on pointer chasing have been attempted in the past (e.g. Cray XMT) and are cropping up in prototypes (PIUMA https://arxiv.org/abs/2010.06277) and startups. The caveat is that latency is hidden with massive amounts of parallelism.

edit: To be clear, I agree with the sentiment that the pencil-and-paper abstract machine models are inadequate in most practical cases. There are workloads where you're dealing with asymptotically large datasets, such as genome assembly and analysis. Even there, the RAM model abstracts away the bottlenecks of modern computer systems.


Interesting idea. I'm not confident it holds: access time to memory is often described as some count of cycles, e.g. 3 cycles to L1, some number of hundreds to somewhere else in the hierarchy.

Memory is also positioned at some distance from the CPU (or whatever silicon is doing the arithmetic), and copying from one place to another involves copying into and then back out of the CPU.

More memory is slower, but within a given level of the cache hierarchy, I'd guess access time to any memory to be constant. How much variation is there in latency as a function of physical address, within say system level ddr4?


The variation is not smooth in real systems, but just like you’ve noticed: it’s right there in the L1->L2->L3->RAM->Disk hierarchy.

Each one is physically bigger, further away, and higher latency.

We might one day have 1 PB memory systems with 1 TB of on-chip cache… but the larger memory will still need more space and be further away…


A lot of cache-oblivious structures start with a BST laid out in a flat array, which is actually more cache-friendly than a naive B-tree. But not many people do that in practice, and it also requires keeping the tree balanced, so B-trees are fine.

BSTs are also useful as an analytical tool because they're just B-trees with B = 2, so a lot of the insight into their structure and algorithms can be extrapolated to N-ary search trees, aka B-trees.


by 'bst laid out in a flat array' i am guessing you mean a layout like a traditional binheap, so, for the keys 1 2 3 4 5 6 7

    4 2 6 1 3 5 7
is that right, and do you mean to make this arbitrarily large

i feel like this is less cache friendly than a naive b-tree, even without rebalancing; if your cache line size is 128 bytes, your keys are 4 bytes, and your b-tree nodes are 15 keys and 16 (4-byte!) pointers, you can reach any of 1048576 keys in 5 cache-line fills, of which probably 2 were already in your cache so it's really 3. by contrast, the flat-array binary search tree above is 20 levels deep. the first 5 levels are in one cache line (because you don't waste any space on pointers) but every key in the following 15 levels of the tree is in its own cache line so you have probably 14 cache-line fills

14 is like a lot more than 3 in my mind maybe

conceivably you have done this in practice and can tell me what i'm overlooking


I believe yes, as illustrated in [1]. This idea only works in a static sense, because inserting a value suffers from the same linear movement of memory as dynamic arrays. B-trees are somewhere in between because they support logarithmic insert/split/join while making better use of cache than BSTs, which is well known.

I tried to focus specifically on BSTs in the context of online persistence and concurrency, where they are particularly effective at very large sizes. There is a section in the paper that mentions B-trees, but I did not want to include a direct comparison as part of the scope. Take the best candidate from this set and compare it against a good B-tree implementation in whatever situation you might require them.

[1] https://algorithmica.org/en/eytzinger
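For reference, the static layout described in [1] can be sketched in a few lines of Go (helper names are my own; the in-order fill reproduces the `4 2 6 1 3 5 7` example from the comment above):

```go
package main

import "fmt"

// eytzinger fills out[1..n] with the sorted keys in BFS (heap) order:
// the children of out[k] live at out[2k] and out[2k+1], so the top
// levels of the tree are packed together and share cache lines.
func eytzinger(sorted, out []int, k int, idx *int) {
	if k < len(out) {
		eytzinger(sorted, out, 2*k, idx)
		out[k] = sorted[*idx]
		*idx++
		eytzinger(sorted, out, 2*k+1, idx)
	}
}

// contains descends the implicit tree without any child pointers.
func contains(t []int, x int) bool {
	k := 1
	for k < len(t) {
		switch {
		case t[k] == x:
			return true
		case t[k] < x:
			k = 2*k + 1
		default:
			k = 2 * k
		}
	}
	return false
}

func main() {
	sorted := []int{1, 2, 3, 4, 5, 6, 7}
	out := make([]int, len(sorted)+1) // 1-indexed; out[0] unused
	idx := 0
	eytzinger(sorted, out, 1, &idx)
	fmt.Println(out[1:]) // [4 2 6 1 3 5 7]
	fmt.Println(contains(out, 5), contains(out, 8)) // true false
}
```

As noted, this sketch is static only: inserting a key would mean rebuilding or shifting the whole array.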


No one else has mentioned it, so I'll be the first. I love the typography and layout. Looks like a proper book. I'll definitely be coming back to this for future reference for style and layout


I was inspired by Stick & Rudder, which is a very easy book to recommend to most people in this community. I'm sure I'll keep coming back to make adjustments, but the end result is what I hoped to achieve.

The Charter font was a particularly good find as part of the "transitional" system font stack. I'll definitely use it for other projects in the future.


Really one of the most beautiful pages on the web. A prime example of what content on the web could look like. No bloat, fast, beautiful typography, SVG graphics instead of unnecessary JPEGs/PNGs.

You have my respect for making this and setting an example which I will definitely strive towards moving forward for my own stuff.


If you like this article then you might also be interested in: https://github.com/c-blake/bst

It's ANSI C with a bespoke macro generics system, not Go; it does not cover as many balancing rules, and it is not written up as nicely. It does do something only weakly represented in your Go impls, AFAICT: "binary symmetry", the reflection about left-right intrinsic to most ideas in this area. (Mehlhorn has a book that does this, but that is the only other source I know.) It also has API considerations like seek-edit separation and how to handle in-tree duplicate keys (FIFO/LIFO/etc., as opposed to values which are also collections).

Also, re: B-trees among several comments here: edit-heavy B-trees with very large nodes (driven by IO bandwidth-delay products) need some "mini-scalable" structure for their nodes, since shifting (on average) half the entries can be costly. That mini-scale could be another B-tree, or it could be a binary search tree, perhaps adjusted to have 2-byte pointers into the little arena that is a node, if 64K per node is enough. I once heard Sybase (now Microsoft SQL Server?) used skip lists. Anyway, this may be a remaining use case for binary trees even in the presence of a B-tree.


Thank you for sharing this resource, I was not aware of it. I am happy to see the inclusion of LBSTs there too.

Re: binary symmetry, if I'm understanding correctly, another author that makes use of the symmetry is Ben Pfaff in libavl [1]. At the top of [2], which seems a bit misplaced now, I wrote:

>A choice was made to not unify the symmetric cases using the direction-based technique of Ben Pfaff and others because it makes the logic more difficult to follow even though there would be less code overall.

The choice of Go was to provide implementations that are both reliable to benchmark (though not as robust as C or Rust, for example) and easy to read. I would like to further reduce abstraction by decomposing common parts such that all the strategies are "one-file" references. This is effectively the opposite of what the macro-based implementation achieves. Both have value, of course.

[1] https://adtinfo.org/libavl.html/BST-Node-Structure.html

[2] https://github.com/rtheunissen/bst/blob/main/trees/avl_botto...
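For readers unfamiliar with the direction-based technique being discussed, here is a minimal sketch (my own illustration, not code from either project): children are stored in an array indexed by a direction, so the two mirror-image cases of each operation collapse into one code path.

```go
package main

import "fmt"

// node stores its children in an array so "left" and "right" are just
// indices 0 and 1; symmetric cases share a single implementation.
type node struct {
	child [2]*node
	key   int
}

// rotate lifts p.child[d] above p. With d == 0 this is the classic
// right rotation, with d == 1 the left rotation -- one function covers
// both mirror cases instead of two hand-maintained copies.
func rotate(p *node, d int) *node {
	q := p.child[d]
	p.child[d] = q.child[1-d]
	q.child[1-d] = p
	return q
}

func main() {
	// p(2) with left child q(1); a right rotation makes q the root.
	q := &node{key: 1}
	p := &node{key: 2, child: [2]*node{q, nil}}
	root := rotate(p, 0)
	fmt.Println(root.key, root.child[1].key) // 1 2
}
```

The symmetry is enforced algebraically by the index arithmetic, as the parent comment notes, at some cost in immediate readability compared to explicit left/right code.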


LBSTs are pretty rare, too. :) Re: symmetry reasoning troubles, another benefit is that it enforces the symmetry algebraically rather than relying on code edits maintaining it (& multiplying it). As to DRY vs. reducing abstraction aka explicitness, I favor the former but, yeah, people are indeed passionate in both directions. E.g., see Java. :-)

Along that abstraction line, it perhaps bears emphasizing that it is not all subjective. E.g., being able to have 1-byte or 2-byte pointers can really shrink the per node space overhead from 16B to 2..4B, possibly total node size from 20B to 8B for a 4B payload (like a float32) or 2.5x overall space saving, an objective and interesting amount that might well keep a working set resident in faster memory. Indeed, even allowing any number of bits like 20 bits*2 is thinkable. Of course, that limits the number of nodes to the address space of the numbers, but that can be ok (such as inside 1 B-tree node or when populations are elsewise easily bounded). But then you pretty much must abstract "allocation/dereferencing" as done in that generic C package or analogously. (Others elsethread were discussing memory overheads vs. B-trees, but not at this API level and more related to tree/indirection depth in uncached worst cases.)
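As a rough illustration of that space argument (a hypothetical layout, not taken from either project): with 2-byte arena indices in place of 8-byte machine pointers, a node carrying a float32 payload fits in 8 bytes, at the cost of capping the structure at 65535 nodes and abstracting allocation/dereference behind the arena.

```go
package main

import "fmt"

// node16 refers to its children by uint16 index into a flat arena
// instead of by pointer; index 0 is the nil sentinel. Total size:
// 2*2 bytes of links + 4 bytes of payload = 8 bytes per node.
type node16 struct {
	child [2]uint16
	key   float32
}

type arena struct{ nodes []node16 } // nodes[0] is the nil sentinel

func newArena() *arena { return &arena{nodes: make([]node16, 1)} }

func (a *arena) alloc(key float32) uint16 {
	a.nodes = append(a.nodes, node16{key: key})
	return uint16(len(a.nodes) - 1)
}

// insert is a plain unbalanced BST insert, just enough to show the
// index-based indirection; a real tree would rebalance here.
func (a *arena) insert(root uint16, key float32) uint16 {
	if root == 0 {
		return a.alloc(key)
	}
	d := 0
	if key > a.nodes[root].key {
		d = 1
	}
	a.nodes[root].child[d] = a.insert(a.nodes[root].child[d], key)
	return root
}

func (a *arena) contains(root uint16, key float32) bool {
	for root != 0 {
		if a.nodes[root].key == key {
			return true
		}
		d := 0
		if key > a.nodes[root].key {
			d = 1
		}
		root = a.nodes[root].child[d]
	}
	return false
}

func main() {
	a := newArena()
	root := a.insert(0, 2)
	root = a.insert(root, 1)
	root = a.insert(root, 3)
	fmt.Println(a.contains(root, 3), a.contains(root, 4)) // true false
}
```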

Anyway, I just thought that package might provide color along some other less-traveled roads in this space; I was not trying to direct or redirect your own effort. It's a nice write-up of what you have so far.


That was how I received your feedback. :)

My inclination towards lower abstraction in this project is entirely for the sake of reading and reference, to minimize the need for the reader to re-compose from various components split across files.

During development, abstraction helps because it makes prototyping faster and more consistent, but once everything is mostly "done", it can help the reader/student to maximize local reasoning.

Another comment mentioned they found it difficult to find the LBST implementation - this is exactly the sort of experience I hope to avoid.


Some feedback on readability: the article uses the term "logarithmic weight-balance" 6 times without explaining what it is (and several more times after). On the 7th mention, it links to a paper: "Frias[40] published top-down logarithmic weight-balance trees in 2005, under advice from Roura." But the referenced paper doesn't mention logarithmic weight-balance even once; I think [40] calls them "logarithmic binary search trees", and links to "Roura S., 2001, A new method for balancing binary search trees".

After reading the last paragraph:

"Any current red-black tree implementation in practice can be replaced by a logarithmic weight-balance tree to achieve better balance, better performance, and get positional access as a bonus, all with a simpler algorithm."

... it is clear the article is advocating for these, and the title should hence be something like "Advantages of logarithmic weight-balanced trees over other balanced trees", immediately followed by an explanation of what these are and what other people call them.

The repository linked at the top includes a lot of stuff and makes it hard to find a logarithmic weight-balance tree implementation, lost among all the other kinds of trees in there. It would be helpful to have a separate logarithmic weight-balanced tree Go module for people looking to study its source code.

40: L. Frias, Extending STL maps using LBSTs, 2005.


Part 2 defines that "a node is logarithmically weight-balanced if the binary log of the weights of its subtrees differ by no more than 1" and references Roura directly there. Roura uses the acronym LBST, which corresponds to the file trees/lbst.go in the repository. I hoped that this would be intuitive enough to follow.
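As paraphrased above, the balance predicate itself is tiny. A sketch in Go using math/bits (my own illustration of the rule as quoted; Roura's paper states the actual restructuring conditions, which are more involved):

```go
package main

import (
	"fmt"
	"math/bits"
)

// logBalanced reports whether two sibling subtree weights (sizes that
// count external nodes, so always >= 1) satisfy the logarithmic rule:
// their binary logarithms differ by at most 1.
func logBalanced(wl, wr uint) bool {
	// bits.Len(w) == floor(log2(w)) + 1 for w > 0, so comparing
	// lengths compares floored binary logs.
	a, b := bits.Len(wl), bits.Len(wr)
	if a > b {
		a, b = b, a
	}
	return b-a <= 1
}

func main() {
	fmt.Println(logBalanced(4, 8))  // logs 2 and 3: true
	fmt.Println(logBalanced(2, 16)) // logs 1 and 4: false
}
```

Because the check reduces to comparing bit lengths, it needs no floating point and no stored balance factor beyond the subtree weights already kept for positional access.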

They are definitely part of the class of weight-balanced trees, using the exact same algorithms as BB[α] trees, which are the classic weight-balanced trees.

When I started this project, it was unclear that the conclusion would advocate for them. In fact, I did not even know about them when I started. My intention was to focus on the exploration more than the conclusion. I'm in the process of writing another conclusion around relaxed balance as a concept, which could just as well be the title.

I appreciate this feedback very much. Perhaps the paper could mention specifically that Roura and Frias call them logarithmic binary search trees, abbreviated as LBST. The repository's README could also provide an index to each tree in the source.

For reference: the weight-balanced algorithms can be found in [1], [2] and [3]. The logarithmic weight-balance rules are defined in [4].

[1] https://github.com/rtheunissen/bst/blob/main/trees/wbst_bott...

[2] https://github.com/rtheunissen/bst/blob/main/trees/wbst_topd...

[3] https://github.com/rtheunissen/bst/blob/main/trees/wbst_rela...

[4] https://github.com/rtheunissen/bst/blob/main/trees/lbst.go


I'm glad it is helpful! Why not just call them LBSTs as the papers do? Catchier than LWBT I think... (gotta think of the marketing side too) :-p

Thanks for the extra pointers!


Haha true, that's a good point. There's also WAVL and RAVL for weak AVL and relaxed AVL, but no equivalent acronyms for the red-black variants. RRB is the same as Relaxed Radix-Balanced! In my notes I found it easier to avoid acronyms, and then "logarithmic binary search tree" became ambiguous because "aren't they all logarithmic (in height and complexity)?" I found that grouping them as part of the weight-balanced class of trees is the most intuitive, and I wish the original paper did the same, but that's okay.

What I'll do for now is mention specifically that the literature refers to them as LBSTs, and I'll add an index to each tree in the repository to make them easier to navigate. Thanks again.


Empty page with noscript. And with JavaScript enabled, just text!


This is tragic! Supporting noscript was a primary design goal, but I forgot to only enable the Knuth/Plass justification in print media. I'm moving the math expressions to build time now. So sorry.


This has now been fixed, thank you for reminding me.


If you'd like to get more attention from academics, you could try submitting a journal paper of the work. The ACM Computing Surveys journal may be a good venue: https://dl.acm.org/journal/csur


Skimming through, I didn't see any discussion of AA trees which I have seen discussed as similar to red black trees but simpler to implement and have been interested in exploring for learning purposes. Any reason you didn't look specifically at AA trees, or were they covered somewhere that I missed?


You are correct, they have not been covered yet. I've added a note in the "work in progress" section.

There are also LLRB trees that would be interesting to see compared within this framework. All the red-black trees are implemented as rank-balanced trees, so there is no concept of "color" exactly, but the authors do mention the left-leaning 2-3 rule and the left-leaning red-black rule -- I just haven't implemented those yet.

See Pg. 5 of https://citeseerx.ist.psu.edu/document?type=pdf&doi=52330eed...


A brilliant resource, thank you! Will definitely be referencing this in our DS/algorithms courses.



