Depending on implementation and usage, radix trees can get extremely segmented, slowing down significantly over time.



Would you mind elaborating? I'm not quite sure what you mean by segmented here.


I'm not thecompilr, but if I understand them correctly:

In the worst case, the cache behaviour of a radix tree could be awful, if each node lives in a different cache-line, no?

I imagine this problem can be ameliorated with a custom allocator, though.

Been a while since I've studied hash-maps in depth, but with a good hash function, that wouldn't be a big worry, would it?


Gotcha. Yes, while there are implementation techniques that can help (e.g. arena allocation, prefix compression, adaptive node sizes), the data structure is fundamentally a tree, and so one can expect a higher number of random memory accesses (when compared with suitably optimized hash tables, like those discussed in the article).
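For concreteness, by arena allocation I mean roughly this kind of thing (a toy C++ sketch with invented names, leaving out real-world refinements like prefix compression and adaptive node sizes):

    #include <cstddef>
    #include <memory>
    #include <vector>

    // Toy radix-tree node: 256-way fan-out, no prefix compression.
    struct Node {
        Node* child[256] = {};
        bool  is_terminal = false;
    };

    // Bump/arena allocator: nodes are handed out from large contiguous
    // blocks, so nodes allocated close together in time also sit close
    // together in memory (better cache and TLB behaviour).  The arena's
    // lifetime is the tree's lifetime.
    class NodeArena {
    public:
        Node* alloc() {
            if (used_ == kBlockSize) {
                blocks_.push_back(std::make_unique<Node[]>(kBlockSize));
                used_ = 0;
            }
            return &blocks_.back()[used_++];
        }
    private:
        static constexpr std::size_t kBlockSize = 1024;
        std::vector<std::unique_ptr<Node[]>> blocks_;
        std::size_t used_ = kBlockSize;  // forces a block on the first alloc
    };

    int main() {
        NodeArena arena;
        Node* root = arena.alloc();
        root->child['a'] = arena.alloc();  // consecutive allocs are adjacent
    }

Consecutive allocations land next to each other, so a freshly built tree has decent locality; the trouble starts once inserts and deletes interleave.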

In some scenarios, radix trees can overcome that with better space efficiency (i.e. implicit key storage), but I think it's fair to say that there are many scenarios (if not most!) where hash tables outperform radix trees.

Of course, that all hinges on how you measure performance. As Salvatore pointed out, if you're looking to minimize variance, hash tables might not be suitable due to their growth characteristics. Radix trees also give you efficient sorting and range scans out of the box, and are a viable immutable persistent data structure as well. If none of those are meaningful to your use cases, then hash tables could certainly be a better fit.


Arena-allocation? As in (roughly speaking) a Stack<Node> data structure whose lifetime is bound to the lifetime of the tree, right?

How could we support deletion of nodes?


This is getting deep into some wizardry, but if you used a custom pool-like allocator you could group contents close together in cache, and, if necessary/appropriate, even rearrange the elements in the container to be more cache-friendly for your purposes.
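To make that concrete with the toy Node/NodeArena from the sketch upthread (still just a hand-wavy illustration): a "rearrange" pass could copy the live nodes into a fresh arena in depth-first order, so a lookup walks mostly adjacent memory.

    // Hypothetical compaction pass: rebuild the tree into a fresh arena
    // in depth-first order, so nodes are laid out roughly in the order
    // a lookup visits them.
    Node* compact(const Node* old_node, NodeArena& fresh) {
        Node* n = fresh.alloc();
        n->is_terminal = old_node->is_terminal;
        for (int c = 0; c < 256; ++c)
            if (old_node->child[c])
                n->child[c] = compact(old_node->child[c], fresh);
        return n;
    }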


You could, but the premise of using a tree was to avoid unpredictable rehashing latency; if you start compacting the tree every now and then, you basically pay the same price.


That approach tastes rather like a copying+compacting garbage-collector.


Right. When you ask "how could we support deletion of nodes", am I right to assume you mean "how could we reclaim memory of deleted nodes"?

Conventionally, a free list would be used, though admittedly that sacrifices locality, which was one of the original motivating factors. If deletion is a significant component of the workload, then yeah, this doesn't help much.
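Something like this, say: an intrusive free list grafted onto the toy arena sketched upthread (illustrative only, reusing that Node type).

    // Node recycling via an intrusive free list: unlinked nodes are
    // chained through their first child pointer and handed back out
    // before the arena bump-allocates fresh ones.  A recycled node can
    // land far from its new neighbours, which is the locality cost
    // mentioned above.
    struct FreeList {
        Node* head = nullptr;
        void  push(Node* n) { n->child[0] = head; head = n; }  // n already unlinked
        Node* pop() {                                           // nullptr if empty
            Node* n = head;
            if (n) { head = n->child[0]; *n = Node{}; }         // reset before reuse
            return n;
        }
    };

    Node* alloc_node(NodeArena& arena, FreeList& fl) {
        if (Node* n = fl.pop()) return n;  // prefer recycled nodes
        return arena.alloc();              // otherwise bump-allocate
    }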


Yes, exactly. It would of course be trivial to unlink a node without reclaiming it, but then you're leaking.

Radix trees never rotate, right? Can that be leveraged to help with locality?


Thank you. Exactly. Even worse, each node could live in a different page. A hash map can span multiple pages too, but with a good hash function a lookup should succeed in one or two probes.
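By "one or two probes" I mean something like the probe loop in an open-addressing table (toy sketch with linear probing; no resizing or deletion handled):

    #include <cstddef>
    #include <functional>
    #include <optional>
    #include <string>
    #include <vector>

    // Toy open-addressing hash table.  With a decent hash function and
    // a sane load factor, the probe loops below usually stop at the
    // first or second slot, so a lookup touches only one or two cache
    // lines even if the backing array spans many pages.
    struct ToyMap {
        struct Slot { std::string key; int value = 0; bool used = false; };
        std::vector<Slot> slots{1024};

        void insert(const std::string& key, int value) {
            std::size_t i = std::hash<std::string>{}(key) % slots.size();
            while (slots[i].used && slots[i].key != key)
                i = (i + 1) % slots.size();          // linear probing
            slots[i] = {key, value, true};
        }

        std::optional<int> find(const std::string& key) const {
            std::size_t i = std::hash<std::string>{}(key) % slots.size();
            while (slots[i].used) {                  // probe until an empty slot
                if (slots[i].key == key) return slots[i].value;
                i = (i + 1) % slots.size();
            }
            return std::nullopt;
        }
    };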



