Back when we were getting started with Redis, the readability / concise nature o...

cperciva · on Nov 1, 2011

I'm curious, what in particular makes you say that code is "readable"?

Coding style varies dramatically from person to person, and I don't mean this as a criticism of antirez, but any code which doesn't have at minimum a one-line comment before each function explaining its purpose immediately fails the "readability" test for me. Obviously this isn't a problem for you, so I'm curious to hear what your tastes are.

antirez · on Nov 1, 2011

Hi cperciva, I actually think that comments are not a so important part of code quality. I tend to add comments where my code risks to be not clear by itself, and the zipmap.c code is indeed more commented than my average code since it is all about encoding stuff in a binary blob string, playing with pointers, and so forth. So actually too much comments may even be a sign that something is bad about the code.

IMHO good code should be readable since the purpose of different files, functions, statements, data structures, should be obvious (at different levels of course), and every time it is not obvious there should be a comment helping the reader to understand what is going on.

My idea is that programmers with time develop a feeling about when a comment is needed. For instance a comment is needed all the times you are writing something that avoids a specific problem but you'll likely not remember why it was needed in a few weeks. Other times comments are useful since the flow of the function is complex and there is no easy way to refactor it into many pieces, so comments help to organize the function in smaller conceptual parts, and so forth.

There is no absolute rule, but the reality is that IMHO the test is simple to do for external people: good code is easy to understand and modify without being an expert of that code base.

This topic is a good idea for a blog post, since I thought a lot about this issues lately. For instance if you want a place in Redis where code should be improved is in the handling of blocking operations: there are a few things in that code that are absolutely non obvious even adding comments, and you either are a lot "into it" or you'll not have an easy time understanding it. I'm planning a refactoring of that piece of code.

josephg · on Nov 2, 2011

I used to consider most comments a sign of poorly factored code. (If you need comments, you've probably written your code in an obtuse way).

I started a new project a few weeks ago and went to the other extreme to see what it was like. I think I have more comments than code:

https://github.com/josephg/node-browserchannel/blob/master/l...

Rendered with docco: http://josephg.github.com/node-browserchannel/docs/server.ht...

The comments in this file also describe the spec of what I'm implementing and give running commentary on the implemention. It took me awhile to get used to writing like this, but I really like the result. When I want to add a function or something I talk about what I want to achieve in a comment block before writing the code.

Somehow, the whole feeling of programming is different. By making the documentation the highest priority, I feel like I'm writing rather than coding. (I hate writing generally, but I really enjoyed it here.)

I'm not sure if it helps or hinders readability. Obviously there's way more information in there about my intensions. You can tell if surprising behaviour in the code is intended or if its a bug. But the code has become really long and you have to scroll heaps to be able to move through the file. I feel like my monitor has shrunk. Maybe I just need to split the code out into more files.

I'm not sure if I want to program like this all the time; but I'm writing a lot more inline documentation now. When you get stuck trying to decide how to implement something, talking about the design choices in a block comment has the same effect as explaining the problem to a coworker. Only this way you don't bother your coworkers and you have documentation on the choice when you're done.

Its also a fun experiment because of how weird it feels - I highly recommend giving it a go sometime.

cperciva · on Nov 1, 2011

I tend to add comments where my code risks to be not clear by itself

I used to take that position, but I've started adding more comments in order to avoid "mental stack overflows". Suppose I'm reading function A and trying to understand it, then I find a call to function B which doesn't have any comment explaining what it does; I then go look at function B, and find it has a call to function C which is equally lacking in commentary; and by the time I've read the code in function C to understand what it does and gone back to function B to understand what it's doing I've completely lost track of what I was looking at in function A.

Of course, if you already know what most of the code is doing you don't run into such stack overflows because whatever code you're looking at is probably only calling functions you already understand. But for people who are new to the code -- or people who haven't looked at it for a couple years and have forgotten most of the details -- I think asking people to read the code to figure out what a function does is too much of a bar to understanding.

simplify · on Nov 1, 2011

In my opinion, if you can't figure out what a function does by its name and its parameter names, it is a poorly named and thus poorly documented function.

I think function and variable naming is the one of the most important aspects of programming. Without good naming, you can easy double the amount of time it takes to edit and extend functionality.

cperciva · on Nov 1, 2011

There's a tradeoff in the naming of functions. If you have lots of functions named functionWhoseNameMakesItVeryClearExactlyWhatItDoesToItsParameters(), your code will end up being hard to read simply by virtue of the volume of text people need to read through. I prefer to have gist_of_functionality() with more details easily accessible in an obviously-placed comment.

mbreese · on Nov 1, 2011

A function may have the perfect name at the time you wrote it. It may make perfect sense within the context that you initially conceived of it. However, after some time away from it, when you're trying to mentally rebuild that context, it may make as much sense as def foo().

Good naming is important, but you also have to know the context, which is more difficult to remember.

ryanpetrich · on Nov 2, 2011

If it only makes sense within the context it was conceived under, that's not a perfect name. A function name should make sense for someone with understanding of the domain, but no understanding of the code. When that's not possible, comments definitely help.

sbov · on Nov 1, 2011

I don't understand how you can really have a function name that properly says what it does without it being overly long.

In particular, function names seem to focus more on expressing the postcondition but make no mention of preconditions, exceptions, error conditions - all which also need to be known when figuring out what a function does.

In other words, strlen becomes strlen_segfaultifstrnull. You could theoretically encode that in the parameter name, but it becomes even messier when you have functions that raise exceptions. And now instead of long function names, all your functions are using long variable names that serve no purpose other than documenting said error conditions.

mikeyk · on Nov 1, 2011

Honestly, it was mostly the concision that struck me at the time. Looking at the zipmap source now, I think it's more or less where my code tends to end up in terms of comments--some motivation / higher-level comments at the top, and most functions have a comment before, hopefully with some explanation of any edge/NULL cases as well. Any open-source projects you'd point to as good examples of what looks readable to you? Always trying to improve my own coding habits as well.

cperciva · on Nov 1, 2011

I like to think that most of my recent code is pretty readable. The largest chunk of open source is my kivaloo data store (http://www.tarsnap.com/kivaloo.html, browsable svn repository at http://code.google.com/p/kivaloo/source/browse/).

qeorge · on Nov 1, 2011

Very readable for something so dense. Nice work!

One suggestion from the peanut gallery:

In http://code.google.com/p/kivaloo/source/browse/trunk/lib/dat..., at line 178 you have:

    if (rehash(H))
        goto err0;

That's because rehash returns 0 upon success, and -1 on failure.

To me, that's very confusing, because it reads to be erroring upon success. When I read rehash's body, I learn that 0 means success and -1 means failure.

I'd prefer one of two methods. First, you could return 1 on success, and keep returning something falsy on failure. Then the code would read:

    if (!rehash(H))
        goto err0;

Which I think is more readable. Perhaps even better would be to use a constant, i.e.,

    if (rehash(H) != REHASH_SUCCESS)
        goto err0;

although that can easily lead to a mess of redundant constants, or a headache managing them.

Just my personal preference. Really nice code, thank you for sharing it.

cperciva · on Nov 1, 2011

Very readable for something so dense. Nice work!

Thanks!

you could return 1 on success, and keep returning something falsy on failure

I come from an OS background, so to me 0 is success and non-zero is failure. It doesn't really matter which convention a project uses as long as it's consistent, so I documented this in my /STYLE file: "In general, functions should return (int)(-1) or NULL to indicate error."

nkurz · on Nov 2, 2011

Both you and Salvatore have very readable code. I'm uncomfortable with having both 'H' and 'h' in the same function, and with using 'l' as a variable, but both of these files are exemplary C. Still, although it may just be personal preference, I give the slight edge to Salvatore on clarity.

I have no real problem with 0 for success, but I'm with 'qeorge' on this. How is someone looking at only this function call supposed to determine if rehash() returns an integer or a pointer? My first guess would have been that you were checking for a non-NULL pointer. The 'goto err' makes it relatively clear after the fact, but not at a glance.

If you want to keep returning -1, I think a better convention would be 'if (rehash(H) < 0)', which makes it more clear that you expect an int and better implies an error condition. But I think even better would be to return KV_SUCCESS or KV_FAILURE (defined however you choose) and check explicitly. This would also let you remove a bunch of lines by getting rid of every comment next to every return statement! :)

cperciva · on Nov 2, 2011

I'm uncomfortable with having both 'H' and 'h' in the same function

To be honest, I don't like that either, and normally wouldn't do it. In this case I decided that the benefits of consistency with other code ([H]ash table structure; [h]ash of some data) outweighed the ugly near-collision.

How is someone looking at only this function call supposed to determine if rehash() returns an integer or a pointer?

If it returned a pointer, that line would have been "if ((foo = rehash(H)) == NULL)" -- I don't return pointers which aren't going to be used, and I don't write "if (pointer)". Again, it's a matter of knowing the house style.

I think even better would be to return KV_SUCCESS or KV_FAILURE (defined however you choose) and check explicitly

If it was just a matter of one function, that might be reasonable. But applying that to the entire project I'd have LBS_STORAGE_SUCCESS, LBS_WORKER_SUCCESS, LBS_DISK_SUCCESS, LBS_DISPATCH_SUCCESS, KVLDS_DISPATCH_SUCCESS, KVLDS_SERIALIZE_SUCCESS, BTREE_FIND_SUCCESS, BTREE_MUTATE_SUCCESS, BTREE_SUCCESS, BTREE_CLEANING_SUCCESS, BTREE_NODE_SUCCESS, MUX_DISPATCH_SUCCESS, NETBUF_SUCCESS, PROTO_LBS_SUCCESS, PROTO_KVLDS_SUCCESS, EVENTS_SUCCESS, WIRE_READPACKET_SUCCESS, WIRE_WRITEPACKET_SUCCESS, WIRE_REQUESTQUEUE_SUCCESS, UTIL_SUCCESS, ELASTICQUEUE_SUCCESS, PTRHEAP_SUCCESS, SEQPTRMAP_SUCCESS, and ELASTICARAY_SUCCESS. Plus all the _FAILUREs.

Much simpler to just say "0 is success, -1 is failure" once.

nkurz · on Nov 2, 2011

That's fair, although it seems like in that case TARSNAP_FAILURE and TARSNAP_SUCCESS would then be a fine alternative.

I think the whole question is how much it's OK to have an house style that isn't immediately accessible, and how much that benefits the project. Is it really a benefit if outsiders can grasp the code immediately, or is it actually good to have a hurdle?

Do you have an argument against "rehash(H) < 0" other than the visual distraction? It seems like it would parallel better with a literal comparison to NULL.

cperciva · on Nov 2, 2011

TARSNAP_FAILURE and TARSNAP_SUCCESS would then be a fine alternative

Sure, until someone looking at kivaloo or spiped or scrypt asks "what the heck is tarsnap?" ;-)

house style that isn't immediately accessible

I think it's very likely that people working on the kivaloo code will have at least a passing familiarity with UNIX system call conventions, and would find my house style entirely accessible.

Do you have an argument against "rehash(H) < 0" other than the visual distraction?

I would interpret that to indicate that the function has several potential return codes, not just 0 or -1.

qeorge · on Nov 2, 2011

Gotcha. I'm not an OS guy, and didn't know that convention. Thanks!

Dumb question though:

If rehash can only return -1 or 0, won't

   if(rehash(H))

always fail?

cperciva · on Nov 2, 2011

Dumb question though...

There's no such thing as a dumb question, only people too dumb to take every available opportunity to learn. ;-)

In C, "if (foo)" means "if (foo != 0)" if foo has integer type, so that line means "if (rehash(H) != 0)" or (since rehash only returns 0 or -1) equivalently "if (rehash(H) == -1)".

qeorge · on Nov 2, 2011

Thanks so much Colin, I'm learning a lot today. :)

ralph · on Nov 3, 2011

The reason for the convention is that there's often only one form of success, so 0 suffices, but there may be many causes of failure.

Retric · on Nov 1, 2011

I wonder how individualistic that really is.

I like well documented code, I don't trust that documentation to be accurate so I don't include it in my readability score. I like to have around 10 to 40 lines per function, reasonable levels of reuses, a reasonable overall structure and function depth to stay reasonable aka f1 call f2... calls f10 is fine, f1 calls f2... calls f40 smells bad. f1 calls f2 calls f1 is fine though. And stay away from the more colorful parts of the language.

How about you?

cperciva · on Nov 1, 2011

I generally don't trust external documentation, but I do trust in-line comments.

catwell · on Nov 2, 2011

You're probably wrong. Look at the comment above "stretches" here: https://github.com/binarylogic/authlogic/blob/master/lib/aut...

This is crypto-related code in of the most popular Rails authentication gems. And this is one among many examples. I'm sure you've already come across something like that more than once.

I trust code rather than comments. At least it doesn't lie.

vijaydev · on Nov 2, 2011

Fixed :-) https://github.com/binarylogic/authlogic/commit/f52a699bbc9e...

catwell · on Nov 3, 2011

Incorrectly, see my comment on your link. But thanks for trying :) If you want my PoV on this, the right way to fix it is delete the comment.