Finger Trees: A Simple General-Purpose Data Structure (2006) (city.ac.uk)
267 points by tosh on Dec 27, 2016 | 75 comments



A few years ago, our research group published a data structure, called Chunked Sequence, that is similar to the Finger Tree. In short, Chunked Sequence features the same asymptotic profile as the Finger Tree (neglecting persistence) and, in addition, offers strong guarantees with respect to constant factors. Very roughly speaking, Chunked Sequence is to the Finger Tree what a B-tree is to a red-black tree.

We've implemented Chunked Sequences as a package of C++ header files and provided the source code on GitHub. To the client, a Chunked Sequence looks like an STL deque, but with additional methods for splitting at a specified position and concatenating, both in logarithmic time. The operations that push and pop on the ends of the Chunked Sequence are approximately as fast as those of the STL deque, and in certain usage patterns much faster.

http://deepsea.inria.fr/chunkedseq/


Cool.

But please provide clear licensing information (preferably a permissive free software license). Right now I can't find anything at all about what terms you are releasing this source code under.

Therefore most people will be unable to use it, which is a shame.


Thanks for the comment! We are using the MIT license. I've updated the source code in the git repository accordingly.



Which MIT license?


Thanks again! We're using the Expat license.


Finger trees are an example of a truly elegant functional data structure that can be used for just about everything. The problem is that they don't match the memory model of real hardware, so for all of their elegance and theoretical performance, in reality they're too slow to be useful.


It's true that they have a high constant factor, but I think it's too harsh to say they are not useful. One could also say that linked-lists don't match the memory model of real hardware.


> One could also say that linked-lists don't match the memory model of real hardware.

They don't, and they almost always perform (much) worse.


True. But they're also very simple.

It's okay to lean on the hardware a bit, provided you're careful: most programmers don't have to squeeze maximal performance out of the hardware, just good enough performance. The extra code you need to make arrays work may not be worth the speed boost.

Also, if you're working in an HLL, investigate your implementation details. In Common Lisp, for example, a lot of work has been put into making LLs as fast as possible: If you create a list with QUOTE (such lists are immutable), then thanks to an optimization called CDR-coding, you'll get a data layout very similar to an array under the hood.

OTOH, it is important to understand why LL perf isn't actually what the textbooks claim, and to consider using a different structure in contexts where performance matters. Just don't go too far the other way, either: LLs aren't dead just yet.


I believe (intrusive, as they usually are) LLs are so prevalent in C applications mainly because they are the simplest-ever list structure, where insertion and deletion of items is a no-brainer to implement in a couple lines of C macros, as sketched below. They are also inherently self-describing (i.e. you don't need to pass around anything except one pointer).

And this is totally fine in many places. However, when the list is used frequently (i.e. performance is relevant), then it's likely that a LL is not the right choice. And usually one notices that in these cases no LL is used. For example, one might use a LL for a list of drivers, while e.g. directory entries will be kept in a hash table, tree, or trie.
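
To make the "couple lines of C macros" point concrete, here is a minimal sketch of an intrusive doubly-linked list (illustrative names, not from any particular codebase): the list node lives inside the payload struct, so an object can sit on several lists at once, and the containing object is recovered with offsetof.

    #include <stddef.h>

    struct list_node { struct list_node *prev, *next; };

    /* A self-referencing head represents the empty list. */
    #define LIST_INIT(head) ((head)->prev = (head)->next = (head))

    /* Splice a node in after 'pos'. */
    #define LIST_INSERT(pos, node) do {              \
            (node)->next = (pos)->next;              \
            (node)->prev = (pos);                    \
            (pos)->next->prev = (node);              \
            (pos)->next = (node);                    \
        } while (0)

    /* Unlink a node; the node itself is left untouched. */
    #define LIST_REMOVE(node) do {                   \
            (node)->prev->next = (node)->next;       \
            (node)->next->prev = (node)->prev;       \
        } while (0)

    /* Recover the containing object from a pointer to its embedded node. */
    #define CONTAINER_OF(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    struct task {
        int id;
        struct list_node run_queue;   /* membership in one list */
        struct list_node wait_queue;  /* and, independently, in another */
    };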


...sounds about right.


The context here is of course persistent data structures. Linked-lists permit very efficient structural sharing.


So do real arrays on real hardware (pagetable trie).

Small arrays are contiguous so memcpy is fast.

For large arrays, if you create a dummy file (or use a memfd) you can mmap a region, and then copy it by mmapping the same region with the appropriate flags. You can resize (grow) it almost as easily.

They're so much faster than linked lists that there's no (performance) reason to use linked lists on real hardware.
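
As a rough sketch of what that looks like on Linux (assuming a glibc recent enough to expose memfd_create; error handling and the copy-on-write subtleties omitted):

    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstddef>

    int main() {
        const size_t len = 1u << 30;   // a 1 GiB "array"

        // Anonymous in-memory file backs the array.
        int fd = memfd_create("array", 0);
        ftruncate(fd, (off_t)len);

        // The working view of the array.
        char *a = (char *)mmap(nullptr, len, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);

        // A second view of the same pages: no bytes are copied, only
        // page-table entries are set up. Mapping with MAP_PRIVATE instead
        // would give a copy-on-write view.
        char *b = (char *)mmap(nullptr, len, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);

        // Growing is ftruncate() plus a larger mapping; no element-wise copy.
        (void)a; (void)b;
        return 0;
    }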


The Linux kernel uses intrusive linked lists extensively. Container-based linked lists contain a copy of the data item. Intrusive linked lists force the data structures in the list to contain list pointers inside the actual data structure, and the list operations manipulate those list-specific pointers in the data structure.

I am not sure if anyone has evaluated using alternatives, but my understanding of why has generally been memory efficiency. If you allocate a Foo, you also allocate everything you need for Foo to be on all of the lists it may appear on. I couldn't find any confirmation (with actual numbers) for this intuition, but this is what I could find:

A 2005 explanation of how intrusive lists work in the Linux kernel: "Linux Kernel Linked List Explained", https://isis.poly.edu/kulesh/stuff/src/klist/

HN submission on intrusive lists in Doom 3: https://news.ycombinator.com/item?id=8795745

And that points to a technical note on the optimizations to Doom 3's BFG Edition to get it to perform well on PS3, Xbox 360, and PC: http://fabiensanglard.net/doom3_documentation/DOOM-3-BFG-Tec... One of the optimizations was moving away from intrusive lists.

I'm quite aware of how inappropriate linked lists are for most applications because of their poor cache locality. I thought the Linux kernel may be a case where linked lists are actually better, but I can't find any numbers or even arguments why they would be.


Linux uses linked lists because it is simple to code. That linked lists are slow doesn't matter very much because there are lots of slow parts in Linux that are a better use of attention.

Doom moved to vectors (arrays) because linked lists are slow, and because there wasn't enough other slow parts that needed attention.


Linked lists often get used in low-level code and embedded systems in situations where you might need multiple statically allocated entries to be turned into a list of unknown length. There certainly are times when this comes in useful, but it is also a memory-constrained space with an emphasis on determinism, where malloc and new can be bad ideas.

There is a time and a place for them, but if you need speed I agree... there are better solutions


An OS kernel has no business maintaining, and even less business accessing in bulk, large data structures that are performance-critical enough to worry about coherent memory access. It is also an infrastructure-deprived environment in which swapping around pointers to implement a linked list with correct locking is relatively easy to get right but allocating vectors is out of the question.


> An OS kernel has no business maintaining, and even less business accessing in bulk, large data structures that are performance-critical enough to worry about coherent memory access

File systems are usually part of the kernel.


File systems tend not to have large data structures. They have relatively modest data structures which manage bulk access to large blocks of data. A subtle distinction, but important in this context.


Really? Even huge volumes with loads of tiny files? Mail and Usenet servers? Build servers? What is your definition of large and modest here?


The data structures under discussion here are those used to track individual files. Yes, there can be squillions of them, but as data structures, they are, in fact, rather modest.


Is this your understanding from intuition, or are you aware of kernel developers who have made this same argument?


Not sure exactly what you're asking.

We can evaluate the Linux kernel developers' priorities by benchmarking[1] and assuming they aren't stupid, because if they are stupid, then their opinion doesn't matter, and if they're not and they're not making things faster, then it's because they have other priorities.

That being said, there are a few[2] notable[3] moves away from linked lists that were ostensibly for performance reasons.

[1]: Even crap benchmarks: http://www.phoronix.com/scan.php?page=article&item=linux-44-...

[2]: https://lkml.org/lkml/2016/8/1/164

[3]: https://lkml.org/lkml/2008/4/1/458


You provided a reason for why the kernel does a certain thing (easier to code; performance in those places doesn't matter). I was asking if this was your understanding based on inference (your understanding of the performance trade offs in general combined with the fact things are done a certain way) or from fact (claims made directly by kernel developers, or experiments).


You can't necessarily replace intrusive linked lists with arrays. They have more operations. You can traverse so far in one list then switch to traversing in another.


No, but you also don't have to do that.

One list is traversed when committing blocks, so a linked list was never necessary - just a commit-list (vector of pointers).

Another list is traversed when dequeueing the next lock, so again: a linked list isn't necessary, just a dequeue (which might not have to be serialised).

Another list is traversed when finding the next blocked reader, but again serialisation wasn't required here.

And so on.


I don't know how the kernel is using intrusive lists, but that kind of traversing and switching is occasionally useful.


Yes. It is occasionally useful.

However when you have the right data structure, you'll find you won't need it.

Benchmarking is important because searching for the right data structure is time consuming (expensive for the programmer) and it's usually not necessary.


When you need that kind of behaviour, you can't replace it with something flat (without doing binary searches or similar at each crossing). Sometimes intrusive lists are the right thing.

I built something once that used intrusive skiplists because it needed to expire elements using one ordering and search them by another. It would have been much less efficient if I'd broken it up into multiple flat representations.

(Actually, it was flat, but explaining that aspect of it is quite difficult).


> memory efficiency. If you allocate a Foo, you also allocate everything you need for Foo to be on all of the lists it may appear on.

I can't speak for the kernel developers, but along the same line of reasoning, the lack of error paths may be even more significant.

When you allocate an 'object', further changes of state (e.g. addition to or removal from other lists) during its lifetime can be done without the possibility of running out of memory or address space. This can be a huge benefit to overall program structure in certain types of application.

In contrast, adding to or removing from a vector or flat list can ultimately require reallocation, which can fail.


My example would probably be the work of Phil Bagwell. IMHO the biggest issue with linked lists is that they do not support parallelism; they are inherently sequential.


Give up sequential! :)

If you want queues, and you can give up sequential, then you can get huge gains[1] in performance. Specific use cases will have better specific solutions.

[1]: e.g. https://github.com/cameron314/concurrentqueue


You can reduce some linked list operations (including multi-element ones) to a single CAS, which is often good enough.
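
For example (a sketch, not the parent's code): prepending a node, or splicing a whole pre-built chain, onto a shared singly-linked list needs only one CAS on the head pointer.

    #include <atomic>

    struct Node {
        int value;
        Node *next;
    };

    std::atomic<Node *> head{nullptr};

    // Lock-free prepend: build the node privately, publish it with one CAS.
    void push(Node *n) {
        Node *old = head.load(std::memory_order_relaxed);
        do {
            n->next = old;                       // link to the current head
        } while (!head.compare_exchange_weak(old, n,
                                             std::memory_order_release,
                                             std::memory_order_relaxed));
    }

    // Splicing a pre-built chain first..last in front is also a single CAS.
    void push_chain(Node *first, Node *last) {
        Node *old = head.load(std::memory_order_relaxed);
        do {
            last->next = old;
        } while (!head.compare_exchange_weak(old, first,
                                             std::memory_order_release,
                                             std::memory_order_relaxed));
    }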


Where can I find info about the pagetable trie?


The page table on x86/amd64 is a trie. OSDEV[1] has a pretty good page on it.

All you need to remember is that this is what is inside the operating system and hardware layers, so you're already paying for it. What I propose is taking advantage of it. If you can arrange things correctly (see my other comment describing this more fully[2]) you might find that things that seem expensive (when dealing only with von Neumann memory) are suddenly very cheap when you accept you're programming a piece of real hardware.

[1]: http://wiki.osdev.org/Page_Tables

[2]: https://news.ycombinator.com/item?id=13269288
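
To make "the page table is a trie" concrete, here's a sketch of how a 48-bit x86-64 virtual address splits into indices into the four-level radix tree (9 bits per level, 512 entries per table, plus a 12-bit offset into the 4 KiB page):

    #include <cstdint>
    #include <cstdio>

    void decompose(uint64_t vaddr) {
        unsigned pml4 = (vaddr >> 39) & 0x1FF;  // level-4 table index
        unsigned pdpt = (vaddr >> 30) & 0x1FF;  // level-3 index (1 GiB pages stop here)
        unsigned pd   = (vaddr >> 21) & 0x1FF;  // level-2 index (2 MiB pages stop here)
        unsigned pt   = (vaddr >> 12) & 0x1FF;  // level-1 index
        unsigned off  = (unsigned)(vaddr & 0xFFF); // byte offset within the page
        std::printf("%u %u %u %u %u\n", pml4, pdpt, pd, pt, off);
    }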


I'm interested too. I've been thinking about the idea, and it seems like you'd have to be sure to re-use the hell out of your file descriptor on /dev/zero (otherwise, you could easily run out of file descriptors). It also seems like you're trading cache misses for system calls if you have to re-map pages a lot. Maybe it's a clear win, but I'd like to understand it better.


No need for /dev/zero: Linux has memfd[1] and OSX has vm_remap[2]. You only need one file descriptor per heap because Linux lets you poke holes with fallocate[3].

I'll define objects with a header that looks roughly like this:

    struct header {
      off_t phys;
      size_t len;
      int localrefs;
      short flags;
      char mstrategy, type;
    };
phys is the offset in the heap file descriptor. len is the number of units, and type indexes into an array of unit sizes.

mstrategy is used to select the bucket size (allocated range is a power of two, so 1<<(mstrategy&31)) and the heap number.

localrefs is an optimisation which I'll get to.

If I want to allocate an array of type t, size n, I can use a BSR[4] to identify the bucket that it needs, and see if there's anything on the free list. If there isn't, I can see if I can split a larger bucket into two parts (this is effectively Knuth's buddy allocator).

I know that anything with (mstrategy&31) < BSR(page size) needs to be moved just like in a classical allocator when appending or prepending, but when (mstrategy&31) >= BSR(page size) I can take a virtual address of a bigger region, then either mmap the memfd (using phys) or vm_remap the region into the bigger region. Instead of copying the contents of the pages, the operating system will simply copy the page table (which is 3 levels deep[5], hence the log log log, although using 1GB pages means log log with a lower coefficient). This is a tremendous win for problems that need to deal with several large arrays.

Now localrefs gives me a further optimisation: in-process, I can track the reference count of objects, and if my functions always consume their arguments, I know inside the grow/append/prepend routine whether this is the only holder of the object. If it is, I can potentially reuse this virtual address immediately, saving 3-4 syscalls.

When it's time to deallocate, I can put small objects on my free list, and garbage collect any big objects by calling fallocate() on the physical address to poke a hole (freeing system memory). OSX doesn't need fallocate() because mach has vm_unmap.

[1]: https://dvdhrm.wordpress.com/2014/06/10/memfd_create2/

[2]: http://web.mit.edu/darwin/src/modules/xnu/osfmk/man/vm_remap... because the osx manual page is pants

[3]: http://man7.org/linux/man-pages/man2/fallocate.2.html

[4]: http://x86.renejeschke.de/html/file_module_x86_id_20.html

[5]: http://wiki.osdev.org/Page_Tables#Long_mode_.2864-bit.29_pag...

> It also seems like you're trading cache misses for system calls if you have to re-map pages a lot

Cache misses aren't the dominant force here.

The real trade off is illustrated with benchmarking: memcpy one page, versus 8 bytes (page table entry). How many pages do you need to copy before it is faster to pay the fixed (<100ns) cost of the system call and just copy the page table entries?

Memory streams at a rate of around 10GB/sec, but the TLB flush is ~100ns and the memory latency is only around 10ns, so it's easy to see how quick the gains add up when you're using 1GB pages.


Except when you want fast prepending.


There are many strategies to deal with that efficiently, e.g. negative indices (having memory before index 0), or append-and-reverse.

You can also map memory before an existing mapping, but it's a bit system-and-platform dependent. (But an entirely feasible thing to do).

Naively you can also grow-and-shift or copy ... the first one is actually incredibly fast due to perfect locality, and also very simple to implement. It's still, technically, O(n^2) when building a list, so it's usually a better idea to go for append-and-reverse (which works outside the actual list implementation). Normally that's not a problem at all, and often no reversal has to be materialized.
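
A sketch of the "memory before index 0" idea with a plain growable buffer (hypothetical, just to illustrate): keep spare capacity at the front so a prepend is normally a single index decrement.

    #include <vector>
    #include <algorithm>

    struct FrontBuffer {
        std::vector<int> storage;   // element 0 of the logical array lives at 'begin'
        size_t begin = 0;           // headroom available for prepending

        void prepend(int x) {
            if (begin == 0) {                               // out of headroom: regrow
                size_t grow = std::max<size_t>(storage.size(), 8);
                std::vector<int> bigger(grow + storage.size());
                std::copy(storage.begin(), storage.end(), bigger.begin() + grow);
                storage = std::move(bigger);
                begin = grow;
            }
            storage[--begin] = x;                           // the common, cheap case
        }

        int &operator[](size_t i) { return storage[begin + i]; }
        size_t size() const { return storage.size() - begin; }
    };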


If you know the format you want, and read infrequently, just implement it as an append and read it backwards.

If you have frequent reads, you can also consider writing your data to a file (or a memfd), and then using mmap to construct the memory layout you want. You will then see that prepend is worst-case log^3 (depth of the pagetable, assuming you have to keep moving the whole thing) time, which is much better than finger trees, and if you prepend often you can make it constant time (by choosing a high start virtual address: remember, there's 64 bits of address but only 48 bits of memory, so you can have 64k big objects without much work), which makes it faster than linked lists.

Another idea: use VMX extensions and collaborate between each layer to trade a lower constant mean and coefficient to parameterise max to log^k. Gains here might require very big data sets though.


Deques support fast prepending. You can implement them on top of shared memory, although as the canonical implementation is block-based and uses indirection for access, the dynamic resizing capability is less useful.


>they almost always perform (much) worse.

'Much worse' relative to what exactly?


Than real arrays, which are backed by a pagetable trie.


Even ignoring real hardware, finger trees are only efficient in an amortized sense where more specialized purely functional data structures offer better real-time guarantees.

(0) Asymptotically optimal heaps: Worst-case O(1) insert, merge and findMin. Worst-case O(log n) delete.

(1) Asymptotically optimal search trees: Worst-case O(log n) lookup, insert, update, delete.

(2) Asymptotically optimal deques: Worst-case O(1) cons, head, tail, snoc, init, last.

etc.

Another downside is that they require somewhat fancy language features (namely, polymorphic recursion) to implement in a type-safe way.


Firstly, let me say you're most likely correct. I do find finger trees useful in functional languages, and for creating immutable structures.

I want to add some general comments about anyone thinking about using a finger tree or anything else.

Give any potential data structure a try with real data, not contrived stuff, and benchmark on the actual target hardware and in a real code path if possible. Further, try it with real-world access patterns like adding, removing, ordering, etc. at different times and at different sizes - the results can be interesting. Even if something does well for one use-case, it may perform awfully for the other use-cases you need it for, so sometimes it's a compromise. Data structure performance is quite often not the performance of a single operation, but a function of several operations. This makes it even more important to test it with real data when possible and to access it in different ways.

Moreover, too many times I see positive or negative benchmarks where someone is filling a data structure with exactly the same integer, or the same instance of an object, or something like that. In practice, the underlying data can introduce other considerations based on size, pointer chasing, memory allocations (especially if the data structure can dynamically resize), and so on, in just about every language. Factor in hardware, and the results can change even further. Also ensure your benchmarking itself isn't flawed - we've been there and discussed it ad nauseam, and yet I continue to see bad benchmarking in the wild (particularly on the JVM or CLR).

As a related example, I've found that even the famed linked list can suck in the real world. I've always been a "choose the right data structure for the job" kind of guy, and yet sometimes it's more deceptive than it seems. You might say, oh, my access patterns are exactly a linked list, and yet a simple array with lots of helper cruft code will smoke it in practice. Likewise, I've seen shocking results replacing something with a skip list or other data structure that seems like it should perform worse, but depending on the data, size, hardware, and implementation, I get interesting results. I've seen plenty of stupid things like someone who picked a hashtable for fast lookups, only to iterate through it in some other code path for some other use-case accessing the same data, completely murdering overall application performance due to selecting based on a single criterion. If you're using a language where you can create your own allocators, it can become even more interesting than you think - i.e. there are gains or losses to be had.

The moral is simple: "just try it." This comes up a lot in game programming, for example, and I've seen it have huge effects in embedded programming as well. I can't even begin to explain how much faster I've made things just by changing a data structure because of real-world conditions vs. lab conditions. Don't just assume that picking what the textbook says in the Big O table is what will actually dictate performance, positive or negative. There's a reason most good data structure books also contain explanations regarding behavior and performance, though even books don't and can't explain the real-world nuances of working with non-contrived data.


Sure, if you need to do hundreds of millions of insertions per second. In the vast majority of real-world use cases, most of your time is going to be spent elsewhere. Spending a microsecond waiting on memory lookups for a cold finger tree isn't going to do much.

Drop to using less convenient array-backed structures in the few places you need really fast operations, and use slightly slower (constant-wise) data structures where it's more convenient.


> Spending a microsecond waiting on memory

that's an eternity!


Not if you're writing UI code, networking code, most IO-related code, etc. A microsecond is very small compared to the time penalty you are likely to incur doing any sort of syscall. For the vast majority of code, it's negligible.

For tight, fast, repetitive code it is far too slow.


Is this true? How is this more true for finger trees than for red-black trees, etc?


As a rule, pointer-based data structures tend to be slower than array-based data structures (though, as with any general rule, there are exceptions). This is a simple consequence of how caching and prefetching works in modern CPUs. And this performance difference can easily amount to several (!) orders of magnitude.

However, there are a few caveats:

1. A garbage collector with a bump allocator can reduce these overheads significantly. So can a custom implementation (e.g. emulating pointers as offsets within an array; see the sketch after this list).

2. The overhead often does not matter for performance due to the 90/10 law or because the application is not performance-critical to begin with.

3. Pointer-based data structures may support a richer set of operations than array-based data structures (or have significantly more efficient implementations for some operations).

4. Pointer-based data structures can have better worst-case or per-operation costs where array-based data structures rely on amortized or average performance to pull ahead. This can matter in soft real-time situations or when dealing with user-supplied data.

5. Certain types of data structures do not have good array-based implementations to begin with and some algorithms demand pointer-based implementations. (Think of spatial queries, for example.)

6. Performance cost is not always dominated by operations on the container, but can also be dominated by cost of operations on the items within the container. For example, there are algorithms where the most expensive operation (by far) is comparison between elements (e.g. for a priority queue implementing an expensive heuristic); in this case, you want to minimize the number of comparisons more than the cache behavior of the data structure itself (note that this may still result in you picking the array-based implementation).

7. Pointer-based data structures allow for the sharing of substructures, potentially reducing the memory footprint of an application. Binary decision diagrams are a popular example.

8. Some array-based data structures require a much larger working set than their corresponding pointer-based implementations and take a performance nosedive when the size of the working set exceeds the size of the cache. Especially when you have several such data structures in use at once. (This is where relying on microbenchmarks for performance comparison can be dangerous.)
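
As an illustration of point 1 (emulating pointers as offsets within an array), here is a hypothetical sketch: nodes are allocated out of one contiguous pool and linked by 32-bit indices instead of raw pointers, which keeps them packed, shrinks the links, and keeps them valid when the pool relocates.

    #include <cstdint>
    #include <vector>

    struct IndexList {
        static constexpr uint32_t NIL = UINT32_MAX;

        struct Node { int value; uint32_t next; };
        std::vector<Node> pool;   // every node lives here, contiguously
        uint32_t head = NIL;

        void push_front(int v) {
            pool.push_back({v, head});   // pool may relocate; indices stay valid
            head = (uint32_t)(pool.size() - 1);
        }

        int sum() const {                // traversal chases indices, not pointers
            int total = 0;
            for (uint32_t i = head; i != NIL; i = pool[i].next)
                total += pool[i].value;
            return total;
        }
    };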


Red-black trees are pretty much terrible for performance as well. I spend most of my work days trying to eradicate std::map and std::set for performance reasons.

Academics are always concerned about bounds, but practitioners are always concerned about cache misses.
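
A common replacement, sketched here (not the parent's actual code): a "flat map", i.e. a sorted vector of key/value pairs plus binary search, which keeps lookups O(log n) but over contiguous memory, at the cost of O(n) insertion.

    #include <algorithm>
    #include <utility>
    #include <vector>

    struct FlatMap {
        std::vector<std::pair<int, int>> entries;   // kept sorted by key

        void insert(int key, int value) {
            auto it = std::lower_bound(entries.begin(), entries.end(), key,
                [](const std::pair<int, int> &e, int k) { return e.first < k; });
            if (it != entries.end() && it->first == key)
                it->second = value;                  // overwrite existing key
            else
                entries.insert(it, {key, value});    // O(n) shift; fine when reads dominate
        }

        const int *find(int key) const {
            auto it = std::lower_bound(entries.begin(), entries.end(), key,
                [](const std::pair<int, int> &e, int k) { return e.first < k; });
            return (it != entries.end() && it->first == key) ? &it->second : nullptr;
        }
    };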


I generally agree with this, and most of the time it's hard to justify using a red-black tree if you care about performance. You can use one if you have other concerns, or if performance isn't one of them.

I also concur that cache misses are vital and sometimes don't get enough attention, since people hand-wave them away as micro-benchmarking, which often isn't true for critical code paths. I'll add that, again, this is why testing on your target hardware is important. You may think you know, but often you don't. I've seen too much stupid stuff where someone wrote code that performs great on their development desktop and awful on a server or a game console because of different CPU architecture and cache design or sizes.

Maybe someone can dig it up, but I've seen some of the sub-variants of red-black trees have dramatically better performance than I thought possible (or worse, e.g. left-leaning), and the results can vary across runtimes. In addition to cache misses, there can be other considerations for wanting to use a red-black tree, like concurrent or parallel programming. In these cases, some of the cousin data structures like AVL trees can swing you for or against a red-black tree or other backing structure. I had this come up recently when selecting an interval tree that needed to be accessed between threads, for instance.

Selecting a data structure is harder than most people make it out to be. There are just a lot of considerations at work, so you need to balance use-cases, performance (against those cases and raw), access patterns, allocations, and so on to come to a decision. Sometimes if you add up the total cost of doing something like replacing a red-black tree with other methods that are more cache-friendly or have some other characteristic that is better, the overall performance cost can end up being worse in aggregate. So the answer is the same as always, "it depends", and the follow-up, "try it."

Parent is definitely right that the academic concerns and the practical concerns of those of us in the trenches are different. That's why arguing with pure textbook evidence is stupid and you should challenge anyone who does it. Textbooks are a starting point for making and justifying a conclusion, not a shortcut.


When you don't need persistence, a search tree data structure with fatter nodes (e.g., B-trees) is almost guaranteed to perform better on real hardware. But when you do need persistence, red-black trees become competitive again.


I would expect B-trees to outperform binary trees even in a persistent data structure. The necessity of binary trees comes when you need stable iterators/references.


Theory academics are primarily concerned with bounds, but systems academics are very much concerned with cache misses. I've done work with colleagues where we were very concerned with both.


Given the number of comments about the inefficiency of finger trees: yes they usually have a high constant factor (for their otherwise reasonable asymptotic complexity) due to cache misses. However, they are immutable and persistent, which means they have efficient sharing, which in turn makes them good candidates for

   * use in multiple threads at once
   * code that needs to be proven correct (I believe the Haskell Data.Sequence implementation is translated from Coq)


I looked at finger trees as a possible basis for the string representation in xi-editor, but ended up going with a simpler b-tree based approach. The better asymptotic bounds for doing manipulations at the ends are appealing in theory, but I never saw an actual problem with the O(log n) cost in practice, and it is possible to optimize the common append-only case a lot (I have a "builder" API but the current implementation is not as heavily optimized as it might be).

I _believe_ that the polymorphic recursive type, easily expressible in Haskell, cannot be expressed in Rust. You'd fake it by just using trees and having the shape as an invariant maintained by the library (just as the min and max child count constraint is maintained in a B-tree). I personally think that's fine; Rust wouldn't be a better language if its type system were made even richer, but it's interesting to have examples so you know where the edges are. (There's also the possibility someone will find a way to encode it anyway.)


This is a fun data structure to implement in Haskell, and I've always been curious about how one would do it in C++, largely due to the fact that

    data FingerTree a = Empty
                      | Single a
                      | Deep (Digit a) (FingerTree (Node a)) (Digit a)
has polymorphic recursion in the last case. What would be the C++ approach for dealing with this sort of thing? Pass in a compile-time integer to represent the level of nesting?


I've written this several times in several ways using C++ (and I'll probably write it at least once more to make it better). In an early version I used an integer level as you came to, but it made the compiler very unhappy as it recursively tried to expand nested types at compile time (it couldn't figure out the recursive types would terminate in practice). Max template recursion of 256 if I remember correctly, even though you'd never instantiate past 45 or so on any machine in the world.

In a later version, I implemented the specializations as inheritance on the abstract FingerTree base class (verbose, but it works), and I added Leafs and Nodes. Leafs are FingerTrees that hold your data, and Nodes are FingerTrees that point to other FingerTree instances. This dodges the recursive types problem. I don't know much Haskell, but I think it would be the C++ equivalent of:

    data FingerTree a = EmptyLeaf
                      | SingleLeaf a
                      | DoubleLeaf a a
                      | TripleLeaf a a a
                      | EmptyNode
                      | SingleNode (FingerTree a)
                      | DoubleNode (FingerTree a) (FingerTree a)
                      | TripleNode (FingerTree a) (FingerTree a) (FingerTree a)
                      | FingerSpine (FingerTree a) (FingerTree a) (FingerTree a)
Virtual methods on each specialization took the place of pattern matching. Not super elegant.
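
For illustration only (a bare-bones sketch, not the poster's actual code), the shape of that encoding in C++: the nesting level is erased behind one abstract base, so "a FingerTree of Nodes" is just another FingerTree pointer, and virtual dispatch stands in for pattern matching.

    template <typename T>
    struct FingerTree {                   // abstract base; any nesting level
        virtual ~FingerTree() = default;
    };

    template <typename T>
    struct Leaf : FingerTree<T> {         // holds 1..3 of the user's elements
        T item[3];
        int count = 0;
    };

    template <typename T>
    struct Node : FingerTree<T> {         // holds 1..3 deeper FingerTrees
        FingerTree<T> *child[3] = {};
        int count = 0;
    };

    template <typename T>
    struct Spine : FingerTree<T> {        // Deep: left digit, deeper spine, right digit
        FingerTree<T> *left = nullptr;    // a Leaf or Node acting as a digit
        FingerTree<T> *middle = nullptr;  // one level deeper; the nesting invariant is
        FingerTree<T> *right = nullptr;   // maintained by the code, not the type system
    };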


When traversing the tree, I think you'll need at least one branch/virtual method dispatch to determine the type of the child node.

Given that, you could create a templated subclass for each type in the grammar, then instantiate it with the number of children. Eg:

template <int N> class Node : TODO { char type = 'N'; int const count = N; Node a[N]; };

Allocating Node<2> is a single malloc. You can add static asserts to get things like Node<4> to fail at compile time, change the C-style array to std::array<Node, N> to get bounds checks, and so on.

At that point, you can get arbitrarily fancy with custom allocators, inlining the type to be stored in the tree, etc.

[edit: Note that my solution kinda sucks, because you need to pay for a virtual method dispatch to run code that has N compiled in, and I also rolled my own reflection with instance variables, so I'm bloating each node by two words instead of one, and also breaking constant propagation in the caller code.

Removing the implicit vtable or the extra instance variables is an exercise left to the reader.]


I think you'd have to use a pointer and allocate on the heap. Under the hood I think that's essentially what the compiled Haskell code would do.


The polymorphic recursion allows for compile-time checking of the invariant that "the left and right wings at depth n are 2-3 trees of depth n". In C++, I think you would forgo a compile-time check of that invariant and just write code that maintains it instead.


Could someone give an example where this data structure would be useful?


They're used in a distributed hash table implementation called Chord (https://en.wikipedia.org/wiki/Chord_(peer-to-peer)).


They're excellent general purpose structures for data sharing of immutable data (e.g., a slightly-changed copy shares much of the same data as the original). Clojure uses them extensively, since immutable data has the nice property of being thread-safe by default.
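
As a tiny illustration of that kind of sharing (a generic sketch, not Clojure's implementation): two versions of an immutable list can share their entire common tail, so the "slightly-changed copy" costs only the new cells.

    #include <memory>

    // Immutable cons cell: never mutated after construction, so it can be
    // shared between versions (and threads) without copying or locking.
    struct Cons {
        int head;
        std::shared_ptr<const Cons> tail;
    };

    std::shared_ptr<const Cons> cons(int head, std::shared_ptr<const Cons> tail) {
        return std::make_shared<const Cons>(Cons{head, std::move(tail)});
    }

    int main() {
        auto xs = cons(2, cons(3, cons(4, nullptr)));  // [2, 3, 4]
        auto ys = cons(1, xs);                         // [1, 2, 3, 4] shares all of xs
        auto zs = cons(9, xs);                         // [9, 2, 3, 4] also shares xs
        (void)ys; (void)zs;
        return 0;
    }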


You can create a persistent ordered container that has both prepend and append in amortised constant time, and concatenation in log time.


Caching isn't great though.

Pagetables are a trie, so it's max log log log time to prepend, append, or concatenate, and usually a small constant for two of them (but admittedly: you usually have to pick if you don't know how many arrays you want).


There are several use-case examples in the Readme for this Clojure finger tree implementation:

https://github.com/clojure/data.finger-tree


Just want to add that the data structures behind all of Clojure's default data types are finger trees.


They can be used to implement an immutable deque.


The problem with using lazy data structures for storing things is that deletes don't necessarily free up storage. Finger trees are fine when you need to store a collection in order to implement some algorithm (and being persistent makes them useful if the algorithm has to backtrack), but they aren't so good for backing a collection that changes over time, like the set of currently connected sockets in a network service.


Looking at Haskell's Data.Sequence, it looks like it's strict in its elements. So a delete does free up memory as long as the deleted object isn't being used somewhere else.


It needs to be spine-strict for that to work, but the amortization for finger trees doesn't work without laziness.


[dead]


This account has been posting a lot of unsubstantive, off-topic comments. Please stop and read the guidelines:

https://news.ycombinator.com/newswelcome.html

https://news.ycombinator.com/newsguidelines.html



