It's worth noting that this comparison is flawed: Judy (IIRC) is designed for 64-byte cache lines, yet the test is performed on a processor with 32-byte cache lines. Most common processors today have 64-byte cache lines, so nowadays it's quite likely that Judy is a fair bit faster. Further, the processor is old enough that memory delays were (while still important) less of a big deal than they are now: the relative latency of RAM to processor speed wasn't as bad.
Yes, this does highlight the problem with such heavily optimised structures - that they are somewhat hardware-specific - but it doesn't mean that this is an accurate demonstration of the performance improvement you'd see nowadays.
Absolutely, and I certainly acknowledge that this is a fair criticism. I guess my beef with it is more that the tone of the article is "Judy has to be extensively modified depending on the machine it runs on, and it's not even that much faster" - whereas, given that he's testing it on a machine for which Judy was not optimised, one could quite reasonably take the position of "While you have to optimise Judy for the machine it's running on to get the best out of it, even if you don't it's substantially faster than a hash table. The downside is code complexity".
Personally, while the 32-byte cache line size is mentioned as an aside in a couple of places, I'd prefer to see it acknowledged in a bit more of an upfront manner. He doesn't exactly go out of his way to say that Judy would be a lot faster on a different machine.
But Judy as it stands is extensively optimized for relatively common architectures (and the performance degradation on 32B cache-line systems is not too great). In fact, most software I know of that is optimized for caching behaviour is optimized for 64B cache lines.
Also, even a simple hash table benefits greatly from trivial cache-related optimizations, so I would say that while Judy is extensively optimized for a particularly common configuration, simple hash tables have to be optimized for each platform as well.
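To give a concrete picture of the kind of trivial cache-related optimization meant here (this is just my own sketch, nothing from the article or from Judy itself, and all names are illustrative): an open-addressing table with linear probing keeps colliding keys in adjacent slots, so a probe sequence mostly stays within the cache line(s) around the home bucket rather than chasing a pointer per collision.

    /* Sketch of a cache-friendly hash table: open addressing with linear
     * probing.  Colliding keys land in neighbouring slots, so a lookup
     * mostly touches the cache line(s) around the home bucket, instead of
     * taking a pointer dereference (and a likely cache miss) per collision
     * as a chained table would.  Nothing here is from Judy or the article. */
    #include <stdio.h>
    #include <stdint.h>

    #define NSLOTS 1024u                 /* power of two, so we can mask */

    struct slot {
        uint32_t key;                    /* 0 marks an empty slot */
        uint32_t value;
    };

    static struct slot table[NSLOTS];

    static uint32_t slot_of(uint32_t key)
    {
        return (key * 2654435761u) & (NSLOTS - 1);   /* multiplicative hash */
    }

    /* Returns a pointer to the value for key, or NULL if it is absent.
     * Assumes the table never fills up completely and keys are non-zero. */
    static uint32_t *lookup(uint32_t key)
    {
        for (uint32_t i = slot_of(key); table[i].key != 0; i = (i + 1) & (NSLOTS - 1))
            if (table[i].key == key)
                return &table[i].value;
        return NULL;
    }

    static void insert(uint32_t key, uint32_t value)
    {
        uint32_t i = slot_of(key);
        while (table[i].key != 0 && table[i].key != key)
            i = (i + 1) & (NSLOTS - 1);
        table[i].key = key;
        table[i].value = value;
    }

    int main(void)
    {
        insert(42, 7);
        uint32_t *v = lookup(42);
        printf("%u\n", v != NULL ? *v : 0u);
        return 0;
    }

A chained table, by contrast, pays a likely cache miss for every node it follows, which is exactly the kind of cost Judy's design goes out of its way to avoid.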
On the other hand, for most use cases these optimizations do not matter much, but there are special cases. Python's dictionary is a great example (by the way, if I remember correctly, it is also optimized for 64B cache lines).
Well I cannot describe Judy in 10 minutes -- what possessed me? I hope you understand some of what I have said and question me on the rest -- particularly those doubts. I will try to elaborate on parts where I get questions.
An interesting read, nonetheless. But a warning for those expecting a full description in 10 minutes. :)
Additionally, some of Judy's space and speed improvements come from strategies that could be used by any data structure but aren't because they're not good software engineering.
Seemed like an odd statement. That's like saying "Assembly is obtuse and it's bad software engineering to use it." IMHO, it depends. If you are writing low-level code that will run very frequently, perhaps good engineering would suggest that you should optimize in assembler.
I think there are places in your code where you need good runtime performance, and there spending programmer time and sacrificing generic dogma is justified.
As the author says at the end, if some tool/library/service meets my needs (in this case, robustness and speed among them), then I'm not too concerned about how it gets there:
Of course, you can certainly take the attitude that Judy performs well enough, and the fact that it's 20,000 lines of code doesn't really matter, as long as those lines of code work and you never have to look at them--and it appears Judy is mature enough that this is the case.
I love Judy, I have used it for years; it's really the only dynamic array a C programmer needs... ;) Well, okay, Judy + list.h (from the Linux kernel) == all you need!
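For anyone who hasn't used it, here is a rough sketch (written from memory, so double-check it against the Judy man pages) of what the JudyL word-to-word map looks like through the macro interface in <Judy.h>:

    /* Rough sketch of JudyL (a word-to-word map) through the macro API in
     * <Judy.h>, written from memory -- see the JudyL(3) man page for the
     * authoritative interface. */
    #include <stdio.h>
    #include <Judy.h>

    int main(void)
    {
        Pvoid_t judy = (Pvoid_t) NULL;   /* an empty JudyL array is just a NULL pointer */
        Word_t  index;
        PWord_t pvalue;                  /* pointer to the stored value word */
        Word_t  bytes_freed;

        /* Insert: JLI gives back a pointer to the value slot for the index. */
        for (index = 0; index < 100; index++) {
            JLI(pvalue, judy, index);
            *pvalue = index * index;
        }

        /* Lookup: JLG sets pvalue to NULL if the index is absent. */
        JLG(pvalue, judy, 7);
        if (pvalue != NULL)
            printf("7 -> %lu\n", (unsigned long) *pvalue);

        /* Ordered iteration: JLF finds the first index >= index, JLN the next one. */
        index = 0;
        JLF(pvalue, judy, index);
        while (pvalue != NULL) {
            printf("%lu -> %lu\n", (unsigned long) index, (unsigned long) *pvalue);
            JLN(pvalue, judy, index);
        }

        JLFA(bytes_freed, judy);         /* free the whole array */
        printf("freed %lu bytes\n", (unsigned long) bytes_freed);
        return 0;
    }

The whole array is just that one pointer, which is a big part of why it's so easy to drop into existing C code.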
If you're comfortable with Linux kernel style lists, then sys/queue.h will be familiar, and it's shipping in glibc and on BSD systems (where it was originally from).
It has singly-linked lists, lists, simple queues, tail queues and circular queues.
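For example, a tail queue with the TAILQ_* macros looks something like this (the other families, SLIST, LIST and so on, follow the same intrusive-list pattern; exact macro availability varies a little between the glibc and BSD copies of the header):

    /* A small tail-queue example with sys/queue.h; the SLIST/LIST families
     * follow the same pattern of intrusive, macro-generated lists. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/queue.h>

    struct item {
        int value;
        TAILQ_ENTRY(item) entries;       /* the links live inside the element */
    };

    TAILQ_HEAD(itemhead, item);          /* declares the head type */

    int main(void)
    {
        struct itemhead head = TAILQ_HEAD_INITIALIZER(head);
        struct item *it;

        for (int i = 0; i < 5; i++) {
            it = malloc(sizeof(*it));
            it->value = i;
            TAILQ_INSERT_TAIL(&head, it, entries);
        }

        TAILQ_FOREACH(it, &head, entries)
            printf("%d\n", it->value);

        while (!TAILQ_EMPTY(&head)) {    /* tear down, freeing each element */
            it = TAILQ_FIRST(&head);
            TAILQ_REMOVE(&head, it, entries);
            free(it);
        }
        return 0;
    }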