The Redis Manifesto

shapeshed · on Feb 4, 2012

'Code is like a poem'. 'We optimise for joy'.

Too often software development is veiled in acronyms and jargon. It is great to see a lot of humanity in this manifesto. Software development becomes a huge part of a benevolent dictators life. The simple aspiration of saying 'I want to enjoy this' is a refreshing thing to see.

pjscott · on Feb 4, 2012

Impressively, the Redis code actually lives up to the lofty rhetoric.

https://github.com/antirez/redis

It is some of the best C code that I've seen.

psykotic · on Feb 4, 2012

The very first function of the very first file I read had an algorithmic bug. It's dictGetRandomKey() in dict.c. First it finds a random occupied bucket by rejection sampling. Then it proceeds like this:

    /* Now we found a non empty bucket, but it is a linked
     * list and we need to get a random element from the list.
     * The only sane way to do so is counting the elements and
     * select a random index. */
    listlen = 0;
    orighe = he;
    while(he) {
        he = he->next;
        listlen++;
    }
    listele = random() % listlen;
    he = orighe;
    while(listele--) he = he->next;
    return he;

This sampling algorithm is not truly random. Elements in smaller buckets are more likely to be picked. Especially in moderately-sized dictionaries the skew can be significant, even with a good hash function with pseudorandomness characteristics (Chernoff-style bounds don't say much for small sizes). To say nothing of what could happen in an adversarial situation, where an attacker might heavily skew the outcome of a "random" draw by affecting the distribution of bucket sizes. That could potentially be much worse than the usual denial-of-service attacks on hash tables.

Unfortunately, there's no good way to fix this with their hash table implementation. If they had used open adddressing (which is generally superior to external chaining for other reasons as well), this wouldn't be a problem. You'd just try slots at random until you found an occupied one, which you'd return.

(Even if they wanted to use this skewed sampling method, you can pick a random element in a linked list of unknown length in a single traversal. It's simple enough to be a standard interview question, so the two-pass method is certainly not "the only sane way". The benefit of doing that here is admittedly very limited, since the buckets will be small to ensure the efficiency of hash table lookups, and that cost is dominated by cache misses.)

Pedantry on my part? Perhaps. But if nothing else, it betrays a lack of attention to details.

js4all · on Feb 4, 2012

Brilliant. I wish more developers would adopt to these principles.

I recently discovered redis while looking for a website statistics option. I was impressed how easy and well designed redis is. Starting with building it, over using it to sample the data, to querying the data for the charts.

I was so impressed, that we will add Redis to our hosting offerings for Node.js apps.

twsted · on Feb 4, 2012

Just FYI, it was written almost a year ago.