It's worth noting that this comparison is flawed: Judy (IIRC) is designed for 64-byte cache lines, yet the test is performed on a processor with 32-byte cache lines. Most common processors today have 64-byte cache lines, so nowadays it's quite likely that Judy is a fair bit faster. Further, the processor is old enough that memory delays were (while still important) less of a big deal than they are now: the relative latency of RAM to processor speed wasn't as bad.
Yes, this does highlight the problem with such heavily optimised structures - that they are somewhat hardware-specific - but it doesn't mean that this is an accurate demonstration of the performance improvement you'd see nowadays.
Absolutely, and I certainly acknowledge that this is a fair criticism. I guess my beef with it is more that the tone of the article is "Judy has to be extensively modified depending on the machine it runs on, and it's not even that much faster" - whereas, given that he's testing it on a machine for which Judy was not optimised, one could quite reasonably take the position of "While you have to optimise Judy for the machine it's running on to get the best out of it, even if you don't it's substantially faster than a hash table. The downside is code complexity".
Personally, while the 32-byte cache line size is mentioned as an aside in a couple of places, I'd prefer to see it acknowledged in a bit more of an upfront manner. He doesn't exactly go out of his way to say that Judy would be a lot faster on a different machine.
But Judy as it stands is extensively optimized for relatively common architectures (and the performance degradation on 32B cache-line systems is not too great). In fact, most software I know of that is optimized for caching behaviour is optimized for 64B cache lines.
Also, even a simple hash table benefits greatly from trivial cache-related optimizations, so I would say that while Judy is extensively optimized for a particularly common configuration, simple hash tables have to be optimized for each platform as well.
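To give a concrete picture of the kind of trivial cache-related optimization meant here (this is just my own sketch, nothing from the article or from Judy itself, and all names are illustrative): an open-addressing table with linear probing keeps colliding keys in adjacent slots, so a probe sequence mostly stays within the cache line(s) around the home bucket rather than chasing a pointer per collision.

    /* Sketch of a cache-friendly hash table: open addressing with linear
     * probing.  Colliding keys land in neighbouring slots, so a lookup
     * mostly touches the cache line(s) around the home bucket, instead of
     * taking a pointer dereference (and a likely cache miss) per collision
     * as a chained table would.  Nothing here is from Judy or the article. */
    #include <stdio.h>
    #include <stdint.h>

    #define NSLOTS 1024u                 /* power of two, so we can mask */

    struct slot {
        uint32_t key;                    /* 0 marks an empty slot */
        uint32_t value;
    };

    static struct slot table[NSLOTS];

    static uint32_t slot_of(uint32_t key)
    {
        return (key * 2654435761u) & (NSLOTS - 1);   /* multiplicative hash */
    }

    /* Returns a pointer to the value for key, or NULL if it is absent.
     * Assumes the table never fills up completely and keys are non-zero. */
    static uint32_t *lookup(uint32_t key)
    {
        for (uint32_t i = slot_of(key); table[i].key != 0; i = (i + 1) & (NSLOTS - 1))
            if (table[i].key == key)
                return &table[i].value;
        return NULL;
    }

    static void insert(uint32_t key, uint32_t value)
    {
        uint32_t i = slot_of(key);
        while (table[i].key != 0 && table[i].key != key)
            i = (i + 1) & (NSLOTS - 1);
        table[i].key = key;
        table[i].value = value;
    }

    int main(void)
    {
        insert(42, 7);
        uint32_t *v = lookup(42);
        printf("%u\n", v != NULL ? *v : 0u);
        return 0;
    }

A chained table, by contrast, pays a likely cache miss for every node it follows, which is exactly the kind of cost Judy's design goes out of its way to avoid.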
On the other hand, for most use cases these optimizations do not matter much, but there are special cases. Python's dictionary is a great example (by the way, if I remember correctly, it is also optimized for 64B cache lines).
Well I cannot describe Judy in 10 minutes -- what possessed me? I hope you understand some of what I have said and question me on the rest -- particularly those doubts. I will try to elaborate on parts where I get questions.
An interesting read, nonetheless. But a warning for those expecting a full description in 10 minutes. :)
Additionally, some of Judy's space and speed improvements come from strategies that could be used by any data structure but aren't because they're not good software engineering.
Seemed like an odd statement. That's like saying "Assembly is obtuse and it's bad software engineering to use it." IMHO, it depends. If you are writing low-level code that will run very frequently, perhaps good engineering would suggest that you should optimize in assembler.
I think there are places in your code where you need good runtime performance, and there spending programmer time and sacrificing generic dogma is justified.
As the author says at the end, if some tool/library/service meets my needs (in this case, robustness and speed among them), then I'm not too concerned about how it gets there:
Of course, you can certainly take the attitude that Judy performs well enough, and the fact that it's 20,000 lines of code doesn't really matter, as long as those lines of code work and you never have to look at them--and it appears Judy is mature enough that this is the case.
I love Judy, I have used it for years; it's really the only dynamic array a C programmer needs... ;) Well, okay, Judy + list.h (from the Linux kernel) == all you need!
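For anyone who hasn't used it, here is a rough sketch (written from memory, so double-check it against the Judy man pages) of what the JudyL word-to-word map looks like through the macro interface in <Judy.h>:

    /* Rough sketch of JudyL (a word-to-word map) through the macro API in
     * <Judy.h>, written from memory -- see the JudyL(3) man page for the
     * authoritative interface. */
    #include <stdio.h>
    #include <Judy.h>

    int main(void)
    {
        Pvoid_t judy = (Pvoid_t) NULL;   /* an empty JudyL array is just a NULL pointer */
        Word_t  index;
        PWord_t pvalue;                  /* pointer to the stored value word */
        Word_t  bytes_freed;

        /* Insert: JLI gives back a pointer to the value slot for the index. */
        for (index = 0; index < 100; index++) {
            JLI(pvalue, judy, index);
            *pvalue = index * index;
        }

        /* Lookup: JLG sets pvalue to NULL if the index is absent. */
        JLG(pvalue, judy, 7);
        if (pvalue != NULL)
            printf("7 -> %lu\n", (unsigned long) *pvalue);

        /* Ordered iteration: JLF finds the first index >= index, JLN the next one. */
        index = 0;
        JLF(pvalue, judy, index);
        while (pvalue != NULL) {
            printf("%lu -> %lu\n", (unsigned long) index, (unsigned long) *pvalue);
            JLN(pvalue, judy, index);
        }

        JLFA(bytes_freed, judy);         /* free the whole array */
        printf("freed %lu bytes\n", (unsigned long) bytes_freed);
        return 0;
    }

The whole array is just that one pointer, which is a big part of why it's so easy to drop into existing C code.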
If you're comfortable with Linux kernel style lists, then sys/queue.h will be familiar, and it's shipping in glibc and on BSD systems (where it was originally from).
It has singly-linked lists, lists, simple queues, tail queues and circular queues.
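For example, a tail queue with the TAILQ_* macros looks something like this (the other families, SLIST, LIST and so on, follow the same intrusive-list pattern; exact macro availability varies a little between the glibc and BSD copies of the header):

    /* A small tail-queue example with sys/queue.h; the SLIST/LIST families
     * follow the same pattern of intrusive, macro-generated lists. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/queue.h>

    struct item {
        int value;
        TAILQ_ENTRY(item) entries;       /* the links live inside the element */
    };

    TAILQ_HEAD(itemhead, item);          /* declares the head type */

    int main(void)
    {
        struct itemhead head = TAILQ_HEAD_INITIALIZER(head);
        struct item *it;

        for (int i = 0; i < 5; i++) {
            it = malloc(sizeof(*it));
            it->value = i;
            TAILQ_INSERT_TAIL(&head, it, entries);
        }

        TAILQ_FOREACH(it, &head, entries)
            printf("%d\n", it->value);

        while (!TAILQ_EMPTY(&head)) {    /* tear down, freeing each element */
            it = TAILQ_FIRST(&head);
            TAILQ_REMOVE(&head, it, entries);
            free(it);
        }
        return 0;
    }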