Storing hundreds of millions of simple key-value pairs in Redis

moxiemk1 · on Nov 1, 2011

The the code the article links to for zipmap.c (https://github.com/antirez/redis/blob/unstable/src/zipmap.c) is rather literate.

I haven't dug extremely deeply into the sources for many F/OSS projects; the code I'm interested in reading has been inevitably opaque (at least to my inexperience). This particular source file (and maybe the rest of Redis?) is really good. I think I'll be taking many more looks at Redis (code- and usage-wise) in the future.

mikeyk · on Nov 1, 2011

Back when we were getting started with Redis, the readability / concise nature of the project was one of the things that most excited me about it (here's what it looked like around then: https://github.com/antirez/redis/tree/0b420168b485d0a9c4b66d...)

cperciva · on Nov 1, 2011

I'm curious, what in particular makes you say that code is "readable"?

Coding style varies dramatically from person to person, and I don't mean this as a criticism of antirez, but any code which doesn't have at minimum a one-line comment before each function explaining its purpose immediately fails the "readability" test for me. Obviously this isn't a problem for you, so I'm curious to hear what your tastes are.

antirez · on Nov 1, 2011

Hi cperciva, I actually think that comments are not a so important part of code quality. I tend to add comments where my code risks to be not clear by itself, and the zipmap.c code is indeed more commented than my average code since it is all about encoding stuff in a binary blob string, playing with pointers, and so forth. So actually too much comments may even be a sign that something is bad about the code.

IMHO good code should be readable since the purpose of different files, functions, statements, data structures, should be obvious (at different levels of course), and every time it is not obvious there should be a comment helping the reader to understand what is going on.

My idea is that programmers with time develop a feeling about when a comment is needed. For instance a comment is needed all the times you are writing something that avoids a specific problem but you'll likely not remember why it was needed in a few weeks. Other times comments are useful since the flow of the function is complex and there is no easy way to refactor it into many pieces, so comments help to organize the function in smaller conceptual parts, and so forth.

There is no absolute rule, but the reality is that IMHO the test is simple to do for external people: good code is easy to understand and modify without being an expert of that code base.

This topic is a good idea for a blog post, since I thought a lot about this issues lately. For instance if you want a place in Redis where code should be improved is in the handling of blocking operations: there are a few things in that code that are absolutely non obvious even adding comments, and you either are a lot "into it" or you'll not have an easy time understanding it. I'm planning a refactoring of that piece of code.

josephg · on Nov 2, 2011

I used to consider most comments a sign of poorly factored code. (If you need comments, you've probably written your code in an obtuse way).

I started a new project a few weeks ago and went to the other extreme to see what it was like. I think I have more comments than code:

https://github.com/josephg/node-browserchannel/blob/master/l...

Rendered with docco: http://josephg.github.com/node-browserchannel/docs/server.ht...

The comments in this file also describe the spec of what I'm implementing and give running commentary on the implemention. It took me awhile to get used to writing like this, but I really like the result. When I want to add a function or something I talk about what I want to achieve in a comment block before writing the code.

Somehow, the whole feeling of programming is different. By making the documentation the highest priority, I feel like I'm writing rather than coding. (I hate writing generally, but I really enjoyed it here.)

I'm not sure if it helps or hinders readability. Obviously there's way more information in there about my intensions. You can tell if surprising behaviour in the code is intended or if its a bug. But the code has become really long and you have to scroll heaps to be able to move through the file. I feel like my monitor has shrunk. Maybe I just need to split the code out into more files.

I'm not sure if I want to program like this all the time; but I'm writing a lot more inline documentation now. When you get stuck trying to decide how to implement something, talking about the design choices in a block comment has the same effect as explaining the problem to a coworker. Only this way you don't bother your coworkers and you have documentation on the choice when you're done.

Its also a fun experiment because of how weird it feels - I highly recommend giving it a go sometime.

cperciva · on Nov 1, 2011

I tend to add comments where my code risks to be not clear by itself

I used to take that position, but I've started adding more comments in order to avoid "mental stack overflows". Suppose I'm reading function A and trying to understand it, then I find a call to function B which doesn't have any comment explaining what it does; I then go look at function B, and find it has a call to function C which is equally lacking in commentary; and by the time I've read the code in function C to understand what it does and gone back to function B to understand what it's doing I've completely lost track of what I was looking at in function A.

Of course, if you already know what most of the code is doing you don't run into such stack overflows because whatever code you're looking at is probably only calling functions you already understand. But for people who are new to the code -- or people who haven't looked at it for a couple years and have forgotten most of the details -- I think asking people to read the code to figure out what a function does is too much of a bar to understanding.

simplify · on Nov 1, 2011

In my opinion, if you can't figure out what a function does by its name and its parameter names, it is a poorly named and thus poorly documented function.

I think function and variable naming is the one of the most important aspects of programming. Without good naming, you can easy double the amount of time it takes to edit and extend functionality.

cperciva · on Nov 1, 2011

There's a tradeoff in the naming of functions. If you have lots of functions named functionWhoseNameMakesItVeryClearExactlyWhatItDoesToItsParameters(), your code will end up being hard to read simply by virtue of the volume of text people need to read through. I prefer to have gist_of_functionality() with more details easily accessible in an obviously-placed comment.

mbreese · on Nov 1, 2011

A function may have the perfect name at the time you wrote it. It may make perfect sense within the context that you initially conceived of it. However, after some time away from it, when you're trying to mentally rebuild that context, it may make as much sense as def foo().

Good naming is important, but you also have to know the context, which is more difficult to remember.

ryanpetrich · on Nov 2, 2011

If it only makes sense within the context it was conceived under, that's not a perfect name. A function name should make sense for someone with understanding of the domain, but no understanding of the code. When that's not possible, comments definitely help.

sbov · on Nov 1, 2011

I don't understand how you can really have a function name that properly says what it does without it being overly long.

In particular, function names seem to focus more on expressing the postcondition but make no mention of preconditions, exceptions, error conditions - all which also need to be known when figuring out what a function does.

In other words, strlen becomes strlen_segfaultifstrnull. You could theoretically encode that in the parameter name, but it becomes even messier when you have functions that raise exceptions. And now instead of long function names, all your functions are using long variable names that serve no purpose other than documenting said error conditions.

mikeyk · on Nov 1, 2011

Honestly, it was mostly the concision that struck me at the time. Looking at the zipmap source now, I think it's more or less where my code tends to end up in terms of comments--some motivation / higher-level comments at the top, and most functions have a comment before, hopefully with some explanation of any edge/NULL cases as well. Any open-source projects you'd point to as good examples of what looks readable to you? Always trying to improve my own coding habits as well.

cperciva · on Nov 1, 2011

I like to think that most of my recent code is pretty readable. The largest chunk of open source is my kivaloo data store (http://www.tarsnap.com/kivaloo.html, browsable svn repository at http://code.google.com/p/kivaloo/source/browse/).

qeorge · on Nov 1, 2011

Very readable for something so dense. Nice work!

One suggestion from the peanut gallery:

In http://code.google.com/p/kivaloo/source/browse/trunk/lib/dat..., at line 178 you have:

    if (rehash(H))
        goto err0;

That's because rehash returns 0 upon success, and -1 on failure.

To me, that's very confusing, because it reads to be erroring upon success. When I read rehash's body, I learn that 0 means success and -1 means failure.

I'd prefer one of two methods. First, you could return 1 on success, and keep returning something falsy on failure. Then the code would read:

    if (!rehash(H))
        goto err0;

Which I think is more readable. Perhaps even better would be to use a constant, i.e.,

    if (rehash(H) != REHASH_SUCCESS)
        goto err0;

although that can easily lead to a mess of redundant constants, or a headache managing them.

Just my personal preference. Really nice code, thank you for sharing it.

cperciva · on Nov 1, 2011

Very readable for something so dense. Nice work!

Thanks!

you could return 1 on success, and keep returning something falsy on failure

I come from an OS background, so to me 0 is success and non-zero is failure. It doesn't really matter which convention a project uses as long as it's consistent, so I documented this in my /STYLE file: "In general, functions should return (int)(-1) or NULL to indicate error."

nkurz · on Nov 2, 2011

Both you and Salvatore have very readable code. I'm uncomfortable with having both 'H' and 'h' in the same function, and with using 'l' as a variable, but both of these files are exemplary C. Still, although it may just be personal preference, I give the slight edge to Salvatore on clarity.

I have no real problem with 0 for success, but I'm with 'qeorge' on this. How is someone looking at only this function call supposed to determine if rehash() returns an integer or a pointer? My first guess would have been that you were checking for a non-NULL pointer. The 'goto err' makes it relatively clear after the fact, but not at a glance.

If you want to keep returning -1, I think a better convention would be 'if (rehash(H) < 0)', which makes it more clear that you expect an int and better implies an error condition. But I think even better would be to return KV_SUCCESS or KV_FAILURE (defined however you choose) and check explicitly. This would also let you remove a bunch of lines by getting rid of every comment next to every return statement! :)

cperciva · on Nov 2, 2011

I'm uncomfortable with having both 'H' and 'h' in the same function

To be honest, I don't like that either, and normally wouldn't do it. In this case I decided that the benefits of consistency with other code ([H]ash table structure; [h]ash of some data) outweighed the ugly near-collision.

How is someone looking at only this function call supposed to determine if rehash() returns an integer or a pointer?

If it returned a pointer, that line would have been "if ((foo = rehash(H)) == NULL)" -- I don't return pointers which aren't going to be used, and I don't write "if (pointer)". Again, it's a matter of knowing the house style.

I think even better would be to return KV_SUCCESS or KV_FAILURE (defined however you choose) and check explicitly

If it was just a matter of one function, that might be reasonable. But applying that to the entire project I'd have LBS_STORAGE_SUCCESS, LBS_WORKER_SUCCESS, LBS_DISK_SUCCESS, LBS_DISPATCH_SUCCESS, KVLDS_DISPATCH_SUCCESS, KVLDS_SERIALIZE_SUCCESS, BTREE_FIND_SUCCESS, BTREE_MUTATE_SUCCESS, BTREE_SUCCESS, BTREE_CLEANING_SUCCESS, BTREE_NODE_SUCCESS, MUX_DISPATCH_SUCCESS, NETBUF_SUCCESS, PROTO_LBS_SUCCESS, PROTO_KVLDS_SUCCESS, EVENTS_SUCCESS, WIRE_READPACKET_SUCCESS, WIRE_WRITEPACKET_SUCCESS, WIRE_REQUESTQUEUE_SUCCESS, UTIL_SUCCESS, ELASTICQUEUE_SUCCESS, PTRHEAP_SUCCESS, SEQPTRMAP_SUCCESS, and ELASTICARAY_SUCCESS. Plus all the _FAILUREs.

Much simpler to just say "0 is success, -1 is failure" once.

nkurz · on Nov 2, 2011

That's fair, although it seems like in that case TARSNAP_FAILURE and TARSNAP_SUCCESS would then be a fine alternative.

I think the whole question is how much it's OK to have an house style that isn't immediately accessible, and how much that benefits the project. Is it really a benefit if outsiders can grasp the code immediately, or is it actually good to have a hurdle?

Do you have an argument against "rehash(H) < 0" other than the visual distraction? It seems like it would parallel better with a literal comparison to NULL.

cperciva · on Nov 2, 2011

TARSNAP_FAILURE and TARSNAP_SUCCESS would then be a fine alternative

Sure, until someone looking at kivaloo or spiped or scrypt asks "what the heck is tarsnap?" ;-)

house style that isn't immediately accessible

I think it's very likely that people working on the kivaloo code will have at least a passing familiarity with UNIX system call conventions, and would find my house style entirely accessible.

Do you have an argument against "rehash(H) < 0" other than the visual distraction?

I would interpret that to indicate that the function has several potential return codes, not just 0 or -1.

qeorge · on Nov 2, 2011

Gotcha. I'm not an OS guy, and didn't know that convention. Thanks!

Dumb question though:

If rehash can only return -1 or 0, won't

   if(rehash(H))

always fail?

cperciva · on Nov 2, 2011

Dumb question though...

There's no such thing as a dumb question, only people too dumb to take every available opportunity to learn. ;-)

In C, "if (foo)" means "if (foo != 0)" if foo has integer type, so that line means "if (rehash(H) != 0)" or (since rehash only returns 0 or -1) equivalently "if (rehash(H) == -1)".

qeorge · on Nov 2, 2011

Thanks so much Colin, I'm learning a lot today. :)

ralph · on Nov 3, 2011

The reason for the convention is that there's often only one form of success, so 0 suffices, but there may be many causes of failure.

Retric · on Nov 1, 2011

I wonder how individualistic that really is.

I like well documented code, I don't trust that documentation to be accurate so I don't include it in my readability score. I like to have around 10 to 40 lines per function, reasonable levels of reuses, a reasonable overall structure and function depth to stay reasonable aka f1 call f2... calls f10 is fine, f1 calls f2... calls f40 smells bad. f1 calls f2 calls f1 is fine though. And stay away from the more colorful parts of the language.

How about you?

cperciva · on Nov 1, 2011

I generally don't trust external documentation, but I do trust in-line comments.

catwell · on Nov 2, 2011

You're probably wrong. Look at the comment above "stretches" here: https://github.com/binarylogic/authlogic/blob/master/lib/aut...

This is crypto-related code in of the most popular Rails authentication gems. And this is one among many examples. I'm sure you've already come across something like that more than once.

I trust code rather than comments. At least it doesn't lie.

vijaydev · on Nov 2, 2011

Fixed :-) https://github.com/binarylogic/authlogic/commit/f52a699bbc9e...

catwell · on Nov 3, 2011

Incorrectly, see my comment on your link. But thanks for trying :) If you want my PoV on this, the right way to fix it is delete the comment.

gorset · on Nov 1, 2011

1 million pair using 16 MB is about 16 bytes per pair, which is perfectly fine but nothing impressive.

The dataset is static, so a simple naive solution would be to create a big array sorted by key. Assuming both photo and user IDs use 4 bytes each, this would result in about 2GB of data. Then use binary search to lookup values.

However, if we really want to reduce the size, we could build a finite state machine from the dataset (maybe reverse the values to increase the level of shared suffixes) which should reduce the size by an order of magnitude.

cperciva · on Nov 1, 2011

The dataset is static

If I read the article correctly, existing entries won't change but new entries will be inserted.

mikeyk · on Nov 1, 2011

Yep, ~30 inserts per second go in, so it's not static.

maurits · on Nov 1, 2011

build a finite state machine from the dataset

Could you elaborate on that? Or provide a link for the novice?

gorset · on Nov 1, 2011

Sure,

I implemented the paper "How to squeeze a lexicon" http://www.n3labs.com/pdf/lexicon-squeeze.pdf with a few tweaks a while back. It uses a simple fixed format for all nodes, which works very well for datasets which contains many common prefixes and suffixes. For example, a polish dictionary containing 1.3 million strings is compressed down to 726KB. One of the thing I'm using it for is in a database for geopip stuff. It uses less than 1 byte per IP and can answer queries such as "find all values for the given IP-range" in O(k).

Dawid Weiss has a talk about how fine state machines are used in lucene at http://vimeo.com/26517310

The two approaches I've seen used to store key-value pairs in lexicons are:

1. Embed the value using a magic separator. E.g. (key0, value0), (key1, value1) becomes "key0.value1" and "key1.value1". The lexicon supports fast prefix lookups, which is used when looking up a key. As an added bonus, it supports multiple values per key out of the box, and you can easily find all key/values for a given prefix.

2. Add support for perfect minimal hashing to the lexicon. It will give you a unique ID for all keys, which can be used to do a direct lookup in another array containing the values. How to do this is not covered by the paper, but is an interesting exercise for the reader :-)

Now, back to the article. As cperciva mentioned, the dataset in the article isn't completely static and the usefulness for finite state machines drops fast if they need to be recompiled often. However, the key-value pairs are immutable, so it would be straightforward to implement a level-based scheme by using a simple structure for new items and move them into compressed lexicons periodically.

mkjones · on Nov 11, 2011

Isn't this basically a trie?

maurits · on Nov 1, 2011

Thanks for your answer, I do appreciate that.

cperciva · on Nov 1, 2011

Best of all, lookups in hashes are still O(1), making them very quick.

How quick is "very quick"? I was hoping to see some performance benchmarks, not just memory usage benchmarks.

minimax · on Nov 1, 2011

Best of all, lookups in hashes are still O(1), making them very quick.

Based on the zipmap code (linked below), a zipmap is implemented as an array of adjacent (key, value) pairs. Lookup in a zipmap is actually linear search. There is no hashing. The lookup will run in time proportional to the number of entries in the map.

We found this setting was best around 1000; any higher and the HSET commands would cause noticeable CPU activity

This shouldn't be surprising either, as the redis documentation states that "If a specially encoded value will overflow the configured max size, Redis will automatically convert it into normal encoding." Their higher level hashing strategy of dividing the media id by 1000 guarantees that 1000 is the maximum number of entries in any zipmap. Setting hash-max-zipmap-entries to anything lower than 1000 means some of their zipmaps will be converted to normal redis key/value encodings.

https://github.com/antirez/redis/blob/unstable/src/zipmap.c

firefoxman1 · on Nov 2, 2011

Wow I didn't realize that redis zipmap lookups aren't actually O(1). I found a document that describes zipmaps as "very memory efficient data structure to represent string to string maps, at the cost of being O(N) instead of O(1) for most operations. Since the constant times of this data structure are very small, and the zipmaps are converted into real hash tables once they are big enough, the amortized time of Redis hashes is still O(1), and in the practice small zipmaps are not slower than small hash tables because they are designed for good cache locality and fast access."

http://redis-docs.readthedocs.org/en/latest/Hashes.html

Aqua_Geek · on Nov 1, 2011

Redis includes a benchmark utility (http://redis.io/topics/benchmarks). I ran it on my MacBook Pro a couple of weeks ago and got north of 50,000 req/s with a bunch of other crap running. I don't think the utility has any tests for hashes, though.

cperciva · on Nov 1, 2011

Oh, I've seen other redis benchmarks. But it's always nice to see more -- that way when someone wants to guess how fast their problem set will run on their hardware they will have a more similar benchmark to compare it against.

petercooper · on Nov 2, 2011

First, I love Redis :-)

Second, this functionality seems to be a stop gap to support old users who may be using old clients. So they need an array of 300 million elements each containing an integer of 12 million or less. Assuming 32 bit values (24 would work but.. efficiency), that's a 1,144MB array which, in theory, could be stored as a file and cached through the filesystem cache.

I wonder how the performance of that would stack up against Redis. The convenience of Redis being a network daemon out of the box is potentially the big win here, though the memory usage even in the optimized case seems to be around 4x given that it's doing more than the task necessarily requires (from my interpretation of it - I could be wrong!)

mythz · on Nov 1, 2011

I love companies like Instagram who take the time to share their scaling issues and experiences.

thezilch · on Nov 1, 2011

You can find similar examples -- ruby code for the "exact" implementation as described by Instagram -- of this and other redis, memory optimization(s) at the Redis site: http://redis.io/topics/memory-optimization

catwell · on Nov 1, 2011

Lookups are not really O(1), they're O(number of keys per hash) as long as the hashes are zipmaps. When they become full blown hashes the memory usage increases.

Still, this is a very good way to store a lot of key/value pairs in Redis.

pjscott · on Nov 1, 2011

Big-O notation refers to asymptotic behavior. The zipmap encoding of hashes only matters for small values bounded by a constant, so hash lookups are still expected O(1) time in Redis.

</extreme-pedantry>

catwell · on Nov 2, 2011

Your point would be valid if the memory gain observed was not directly dependent on the fact that hashes are zipmap-encoded. So the trade-off here is between the constant factor of time complexity and the constant factor of memory complexity.

pedigree · on Nov 1, 2011

Why use clear text numbers? Most of the time, you're going to be using large numbers, so binary pack them as save more space.

i had the same issue, normal storage was 1.1gb of space, HSET down to 200mb and binary packing every integer down dbl() bought it right down to 163mb of memory (32bit instance). For that 163mb, I was slicing a md5 of the field for the hset prefix, packing that and then using the remainer as the hset suffix. (due to the data format of the input field)

eurleif · on Nov 1, 2011

Internally, redis stores integers as 64-bit binary values, not strings: http://redis.io/commands/incr.

LeafStorm · on Nov 1, 2011

I don't think it does that for keys and values in hashes though. The only int-related optimization it does in the collections is integer sets.

ctoestreich · on Nov 4, 2011

I would like to see some example code or a gist of what you did. I played around with custom number packing and using the hashset in redis inspired by the original thread topic and your comment. You can find the post here: http://www.christianoestreich.com/2011/11/redis-hashsets-per...

nknight · on Nov 2, 2011

Redis is a series of tradeoffs between simplicity, performance, and power. For the moment, simplicity won as far as keys go -- they're all strings, period, end of story.

Supporting other key types is potentially useful, but brings complications, some of which will be visible to users of Redis.

Your own measurements are telling. Your memory usage went down by about 82%, packing the integers brought that to a bit less than 86%. It's easy to justify a small increase in complexity to save 82%, it's a lot harder to justify even more complexity to bring that number to 86%.

zellyn · on Nov 1, 2011

Why use the whole Media ID as the key within the bucket, rather than just the last three digits?

ctoestreich · on Nov 4, 2011

I think they were trying to keep the hash sizes to ~1000 elements each key. At a 3 digit key you would be increasing the hashes to 300k elements each. You should augment the scripts and see if that has an impact on the performance, would be curious to see.

examancer · on Nov 1, 2011

The hash data type is probably the most awesome and underused feature of redis. Here is a small gem I wrote to expose it a little better in ruby: https://github.com/lyconic/redis-native_hash