Depending on implementation and usage, radix trees can get extremely segmented, slowing down significantly over time.



Would you mind elaborating? I'm not quite sure what you mean by segmented here.


I'm not thecompilr, but if I understand them correctly:

In the worst case, the cache behaviour of a radix tree could be awful, if each node lives in a different cache-line, no?

I imagine this problem can be ameliorated with a custom allocator, though.

Been a while since I've studied hash-maps in depth, but with a good hash function, that wouldn't be a big worry, would it?


Gotcha. Yes, while there are implementation techniques that can help (e.g. arena allocation, prefix compression, adaptive node sizes), the data structure is fundamentally a tree, and so one can expect a higher number of random memory accesses (when compared with suitably optimized hash tables, like those discussed in the article).
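For concreteness, by arena allocation I mean roughly this kind of thing (a toy C++ sketch with invented names, leaving out real-world refinements like prefix compression and adaptive node sizes):

    #include <cstddef>
    #include <memory>
    #include <vector>

    // Toy radix-tree node: 256-way fan-out, no prefix compression.
    struct Node {
        Node* child[256] = {};
        bool  is_terminal = false;
    };

    // Bump/arena allocator: nodes are handed out from large contiguous
    // blocks, so nodes allocated close together in time also sit close
    // together in memory (better cache and TLB behaviour).  The arena's
    // lifetime is the tree's lifetime.
    class NodeArena {
    public:
        Node* alloc() {
            if (used_ == kBlockSize) {
                blocks_.push_back(std::make_unique<Node[]>(kBlockSize));
                used_ = 0;
            }
            return &blocks_.back()[used_++];
        }
    private:
        static constexpr std::size_t kBlockSize = 1024;
        std::vector<std::unique_ptr<Node[]>> blocks_;
        std::size_t used_ = kBlockSize;  // forces a block on the first alloc
    };

    int main() {
        NodeArena arena;
        Node* root = arena.alloc();
        root->child['a'] = arena.alloc();  // consecutive allocs are adjacent
    }

Consecutive allocations land next to each other, so a freshly built tree has decent locality; the trouble starts once inserts and deletes interleave.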

In some scenarios, radix trees can overcome that with better space efficiency (i.e. implicit key storage), but I think it's fair to say that there are many scenarios (if not most!) where hash tables outperform radix trees.

Of course, that all hinges on how you measure performance. As Salvatore pointed out, if you're looking to minimize variance, hash tables might not be suitable due to their growth characteristics. Radix trees also give you efficient sorting and range scans out of the box, and are a viable immutable persistent data structure as well. If none of those are meaningful to your use cases, then hash tables could certainly be a better fit.


Arena-allocation? As in (roughly speaking) a Stack<Node> data structure whose lifetime is bound to the lifetime of the tree, right?

How could we support deletion of nodes?


This is getting deep into some wizardry, but if you used a custom pool-like allocator you could group contents close together in cache, and, if necessary/appropriate, even rearrange the elements in the container to be more cache-friendly for your purposes.
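To make that concrete with the toy Node/NodeArena from the sketch upthread (still just a hand-wavy illustration): a "rearrange" pass could copy the live nodes into a fresh arena in depth-first order, so a lookup walks mostly adjacent memory.

    // Hypothetical compaction pass: rebuild the tree into a fresh arena
    // in depth-first order, so nodes are laid out roughly in the order
    // a lookup visits them.
    Node* compact(const Node* old_node, NodeArena& fresh) {
        Node* n = fresh.alloc();
        n->is_terminal = old_node->is_terminal;
        for (int c = 0; c < 256; ++c)
            if (old_node->child[c])
                n->child[c] = compact(old_node->child[c], fresh);
        return n;
    }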


You could, but the premise of using a tree was to avoid unpredictable rehashing latency; if you start compacting the tree every now and then, you basically pay the same price.


That approach tastes rather like a copying+compacting garbage-collector.


Right. When you ask "how could we support deletion of nodes", am I right to assume you mean "how could we reclaim memory of deleted nodes"?

Conventionally, a free list would be used, though admittedly that sacrifices locality, which was one of the original motivating factors. If deletion is a significant component of the workload, then yeah, this doesn't help much.
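Something like this, say: an intrusive free list grafted onto the toy arena sketched upthread (illustrative only, reusing that Node type).

    // Node recycling via an intrusive free list: unlinked nodes are
    // chained through their first child pointer and handed back out
    // before the arena bump-allocates fresh ones.  A recycled node can
    // land far from its new neighbours, which is the locality cost
    // mentioned above.
    struct FreeList {
        Node* head = nullptr;
        void  push(Node* n) { n->child[0] = head; head = n; }  // n already unlinked
        Node* pop() {                                           // nullptr if empty
            Node* n = head;
            if (n) { head = n->child[0]; *n = Node{}; }         // reset before reuse
            return n;
        }
    };

    Node* alloc_node(NodeArena& arena, FreeList& fl) {
        if (Node* n = fl.pop()) return n;  // prefer recycled nodes
        return arena.alloc();              // otherwise bump-allocate
    }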


Yes, exactly. It would of course be trivial to unlink a node without reclaiming it, but then you're leaking.

Radix trees never rotate, right? Can that be leveraged to help with locality?


Thank you. Exactly. Even worse, each node could live in a different page. A hash map can span multiple pages too, but with a good hash function a lookup should succeed in one or two probes.
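By "one or two probes" I mean something like the probe loop in an open-addressing table (toy sketch with linear probing; no resizing or deletion handled):

    #include <cstddef>
    #include <functional>
    #include <optional>
    #include <string>
    #include <vector>

    // Toy open-addressing hash table.  With a decent hash function and
    // a sane load factor, the probe loops below usually stop at the
    // first or second slot, so a lookup touches only one or two cache
    // lines even if the backing array spans many pages.
    struct ToyMap {
        struct Slot { std::string key; int value = 0; bool used = false; };
        std::vector<Slot> slots{1024};

        void insert(const std::string& key, int value) {
            std::size_t i = std::hash<std::string>{}(key) % slots.size();
            while (slots[i].used && slots[i].key != key)
                i = (i + 1) % slots.size();          // linear probing
            slots[i] = {key, value, true};
        }

        std::optional<int> find(const std::string& key) const {
            std::size_t i = std::hash<std::string>{}(key) % slots.size();
            while (slots[i].used) {                  // probe until an empty slot
                if (slots[i].key == key) return slots[i].value;
                i = (i + 1) % slots.size();
            }
            return std::nullopt;
        }
    };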



