Caches and Lisp: faster list processing via auto-rearranging memory (2003) [pdf] (cs.berkeley.edu)
65 points by luu on March 9, 2015 | 7 comments



In section 4 they found that traversing an organized list is FASTER than traversing an array of the same size when both fit into the L1 cache. I find this highly implausible [0] because 1) traversing a list requires more instructions, and 2) both the L1 and L2 cache miss counts are lower for the array. The authors offer no explanation for the result either.

[0] I believe they got these numbers, but they should have dug deeper into the cause instead of accepting the result at face value. Given that this is a Lisp benchmark, maybe array bounds checks are costly? Maybe a different result would have been obtained had they instructed the compiler to generate unsafe code?
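For concreteness, here is the kind of unsafe-code experiment I have in mind, as a minimal Common Lisp sketch (the function names are mine; whether (safety 0) actually removes the bounds check is implementation-dependent, though SBCL and CMUCL generally do drop it):

    ;; Array sum with safety off: AREF should compile without a
    ;; bounds check on implementations that honor (safety 0).
    (defun sum-array-unsafe (arr)
      (declare (optimize (speed 3) (safety 0))
               (type (simple-array fixnum (*)) arr))
      (let ((sum 0))
        (declare (type fixnum sum))
        (dotimes (i (length arr) sum)
          (incf sum (aref arr i)))))

    ;; List sum for comparison: CDR needs no bounds check, but each
    ;; step is a dependent pointer load.
    (defun sum-list (lst)
      (declare (optimize (speed 3) (safety 0)))
      (let ((sum 0))
        (dolist (x lst sum)
          (incf sum x))))

If the array only wins once safety is off, bounds checking would explain the anomaly.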


I agree this seems very odd (and I came to the comments hoping for an answer).

edit: Since the conclusion doesn't mention anything about the list being faster than the array, it seems possible that it is just a typo.

Also the paper mentions (in section 2) that:

        Occasionally when a recorded benchmark time seemed to be implausible,
        (perhaps caused by other activity on the computer system), or we
        re-tested the data point for other reasons and found a significant
        variation, we indicate the repeated time as well.
which seems to imply that the timing experiments were not usually repeated.



Seems unlikely: in the experiments here the lists are not sorted in the sense of the Stack Overflow question (i.e., they are not in numerical or alphabetical order). They are sorted (in memory) by their index in the linked list. By definition the array is already in this order.

Unless I've missed something?
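Concretely, the rearrangement can be approximated by just copying the list: with a typical copying (nursery) allocator, the fresh cons cells come out adjacent in allocation order, so a later traversal walks memory nearly linearly. A sketch, under that allocator assumption:

    ;; Approximates the paper's rearrangement: COPY-LIST allocates a
    ;; fresh cons cell per element, back to back, so on a copying
    ;; allocator the new cells typically end up contiguous in memory.
    (defun rearranged (lst)
      (copy-list lst))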


No. Look at David's comment.


Interesting that it doesn't mention CDR-coding; I guess that's considered competitive only with hardware support. There's been some work by Appel on trying to improve performance by unrolling (in his case, immutable) cons lists, too.
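For reference, the unrolled representation looks roughly like this (an illustrative sketch, not code from their paper; the names are mine):

    ;; Unrolled cons list: each node holds up to four elements plus a
    ;; single next pointer, so pointer overhead drops and consecutive
    ;; elements share cache lines.
    (defstruct unode
      (elems (make-array 4) :type simple-vector)
      (count 0 :type fixnum)
      (next nil :type (or null unode)))

    (defun unode-sum (node)
      ;; Traversal chases one pointer per four elements instead of
      ;; one pointer per element.
      (let ((sum 0))
        (loop while node do
          (dotimes (i (unode-count node))
            (incf sum (svref (unode-elems node) i)))
          (setf node (unode-next node)))
        sum))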


A fascinating idea.

I haven't digested the entire paper yet.

I am already wondering whether this idea could be combined with the schemes that yield cache-oblivious algorithms.



