Honestly, it was mostly the concision that struck me at the time. Looking at the...

cperciva · on Nov 1, 2011

I like to think that most of my recent code is pretty readable. The largest chunk of open source is my kivaloo data store (http://www.tarsnap.com/kivaloo.html, browsable svn repository at http://code.google.com/p/kivaloo/source/browse/).

qeorge · on Nov 1, 2011

Very readable for something so dense. Nice work!

One suggestion from the peanut gallery:

In http://code.google.com/p/kivaloo/source/browse/trunk/lib/dat..., at line 178 you have:

    if (rehash(H))
        goto err0;

That's because rehash returns 0 upon success, and -1 on failure.

To me, that's very confusing, because it reads as though it errors on success. Only when I read rehash's body do I learn that 0 means success and -1 means failure.

I'd prefer one of two methods. First, you could return 1 on success, and keep returning something falsy on failure. Then the code would read:

    if (!rehash(H))
        goto err0;

Which I think is more readable. Perhaps even better would be to use a constant, i.e.,

    if (rehash(H) != REHASH_SUCCESS)
        goto err0;

although that can easily lead to a mess of redundant constants, or a headache managing them.

Just my personal preference. Really nice code, thank you for sharing it.

cperciva · on Nov 1, 2011

Very readable for something so dense. Nice work!

Thanks!

you could return 1 on success, and keep returning something falsy on failure

I come from an OS background, so to me 0 is success and non-zero is failure. It doesn't really matter which convention a project uses as long as it's consistent, so I documented this in my /STYLE file: "In general, functions should return (int)(-1) or NULL to indicate error."
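For readers who have not met this convention, here is a minimal sketch of what it can look like in practice. The names (struct table, table_grow, table_create) are hypothetical and are not taken from kivaloo; the sketch also assumes the table always has at least one slot and ignores size overflow.

```c
#include <stdlib.h>

/* Hypothetical hash table; the fields are illustrative, not kivaloo's. */
struct table {
    size_t nslots;
    int *slots;
};

/* Double the slot array.  Returns 0 on success or -1 on failure. */
int
table_grow(struct table *T)
{
    size_t i, nslots = T->nslots * 2;
    int *slots;

    /* Pointer-returning call: assign, then compare against NULL. */
    if ((slots = realloc(T->slots, nslots * sizeof(int))) == NULL)
        goto err0;

    /* Zero the newly added half. */
    for (i = T->nslots; i < nslots; i++)
        slots[i] = 0;
    T->slots = slots;
    T->nslots = nslots;

    /* Success! */
    return (0);

err0:
    /* Failure! */
    return (-1);
}

/* Allocate a table.  Returns a pointer on success or NULL on failure. */
struct table *
table_create(size_t nslots)
{
    struct table *T;

    if ((T = malloc(sizeof(struct table))) == NULL)
        goto err0;
    if ((T->slots = calloc(nslots, sizeof(int))) == NULL)
        goto err1;
    T->nslots = nslots;

    /* Success! */
    return (T);

err1:
    free(T);
err0:
    /* Failure! */
    return (NULL);
}
```

Under this style a caller would write "if (table_grow(T)) goto err0;" for the int-returning function, and "if ((T = table_create(4)) == NULL) goto err0;" for the pointer-returning one.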

nkurz · on Nov 2, 2011

Both you and Salvatore have very readable code. I'm uncomfortable with having both 'H' and 'h' in the same function, and with using 'l' as a variable, but both of these files are exemplary C. Still, although it may just be personal preference, I give the slight edge to Salvatore on clarity.

I have no real problem with 0 for success, but I'm with 'qeorge' on this. How is someone looking at only this function call supposed to determine if rehash() returns an integer or a pointer? My first guess would have been that you were checking for a non-NULL pointer. The 'goto err' makes it relatively clear after the fact, but not at a glance.

If you want to keep returning -1, I think a better convention would be 'if (rehash(H) < 0)', which makes it more clear that you expect an int and better implies an error condition. But I think even better would be to return KV_SUCCESS or KV_FAILURE (defined however you choose) and check explicitly. This would also let you remove a bunch of lines by getting rid of every comment next to every return statement! :)

cperciva · on Nov 2, 2011

I'm uncomfortable with having both 'H' and 'h' in the same function

To be honest, I don't like that either, and normally wouldn't do it. In this case I decided that the benefits of consistency with other code ([H]ash table structure; [h]ash of some data) outweighed the ugly near-collision.

How is someone looking at only this function call supposed to determine if rehash() returns an integer or a pointer?

If it returned a pointer, that line would have been "if ((foo = rehash(H)) == NULL)" -- I don't return pointers which aren't going to be used, and I don't write "if (pointer)". Again, it's a matter of knowing the house style.

I think even better would be to return KV_SUCCESS or KV_FAILURE (defined however you choose) and check explicitly

If it were just a matter of one function, that might be reasonable. But applying that to the entire project I'd have LBS_STORAGE_SUCCESS, LBS_WORKER_SUCCESS, LBS_DISK_SUCCESS, LBS_DISPATCH_SUCCESS, KVLDS_DISPATCH_SUCCESS, KVLDS_SERIALIZE_SUCCESS, BTREE_FIND_SUCCESS, BTREE_MUTATE_SUCCESS, BTREE_SUCCESS, BTREE_CLEANING_SUCCESS, BTREE_NODE_SUCCESS, MUX_DISPATCH_SUCCESS, NETBUF_SUCCESS, PROTO_LBS_SUCCESS, PROTO_KVLDS_SUCCESS, EVENTS_SUCCESS, WIRE_READPACKET_SUCCESS, WIRE_WRITEPACKET_SUCCESS, WIRE_REQUESTQUEUE_SUCCESS, UTIL_SUCCESS, ELASTICQUEUE_SUCCESS, PTRHEAP_SUCCESS, SEQPTRMAP_SUCCESS, and ELASTICARRAY_SUCCESS. Plus all the _FAILUREs.

Much simpler to just say "0 is success, -1 is failure" once.

nkurz · on Nov 2, 2011

That's fair, although it seems like in that case TARSNAP_FAILURE and TARSNAP_SUCCESS would then be a fine alternative.

I think the whole question is how much it's OK to have a house style that isn't immediately accessible, and how much that benefits the project. Is it really a benefit if outsiders can grasp the code immediately, or is it actually good to have a hurdle?

Do you have an argument against "rehash(H) < 0" other than the visual distraction? It seems like it would parallel better with a literal comparison to NULL.

cperciva · on Nov 2, 2011

TARSNAP_FAILURE and TARSNAP_SUCCESS would then be a fine alternative

Sure, until someone looking at kivaloo or spiped or scrypt asks "what the heck is tarsnap?" ;-)

house style that isn't immediately accessible

I think it's very likely that people working on the kivaloo code will have at least a passing familiarity with UNIX system call conventions, and would find my house style entirely accessible.

Do you have an argument against "rehash(H) < 0" other than the visual distraction?

I would interpret that to indicate that the function has several potential return codes, not just 0 or -1.
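As an illustration of that reading, here is a small sketch of a function where a "< 0" check is the natural one, because success itself can produce many different values. The function count_char is hypothetical and exists only for this example.

```c
#include <stddef.h>

/*
 * Hypothetical helper: count occurrences of c in the first len bytes of
 * buf.  Returns the count (>= 0) on success, or -1 if buf is NULL.
 */
int
count_char(const char *buf, size_t len, char c)
{
    size_t i;
    int n = 0;

    if (buf == NULL)
        return (-1);
    for (i = 0; i < len; i++) {
        if (buf[i] == c)
            n++;
    }
    return (n);
}
```

Here a caller has to write "if ((n = count_char(buf, len, 'a')) < 0)", since a plain "if (count_char(...))" would treat every non-zero count as an error. For a function that only ever returns 0 or -1, the plain form and the "< 0" form take the same branch.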

qeorge · on Nov 2, 2011

Gotcha. I'm not an OS guy, and didn't know that convention. Thanks!

Dumb question though:

If rehash can only return -1 or 0, won't

    if (rehash(H))

always fail?

cperciva · on Nov 2, 2011

Dumb question though...

There's no such thing as a dumb question, only people too dumb to take every available opportunity to learn. ;-)

In C, "if (foo)" means "if (foo != 0)" if foo has integer type, so that line means "if (rehash(H) != 0)" or (since rehash only returns 0 or -1) equivalently "if (rehash(H) == -1)".
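A small sketch makes the equivalence concrete. fake_rehash is a hypothetical stand-in that returns only 0 or -1; it is not the real rehash from kivaloo. Each check_* helper returns 1 when its condition would take the error branch and 0 otherwise.

```c
#include <stdbool.h>

/* Hypothetical stand-in for rehash(): 0 on success, -1 on failure. */
int
fake_rehash(bool should_fail)
{
    return (should_fail ? -1 : 0);
}

/* Implicit test: any non-zero value (including -1) counts as true. */
int
check_implicit(bool should_fail)
{
    if (fake_rehash(should_fail))
        return (1);
    return (0);
}

/* Explicit comparison against zero. */
int
check_nonzero(bool should_fail)
{
    if (fake_rehash(should_fail) != 0)
        return (1);
    return (0);
}

/* Explicit comparison against -1. */
int
check_minus_one(bool should_fail)
{
    if (fake_rehash(should_fail) == -1)
        return (1);
    return (0);
}
```

All three helpers agree for both inputs: the error branch is taken only when fake_rehash returns -1, so "if (rehash(H))" does not always fail.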

qeorge · on Nov 2, 2011

Thanks so much Colin, I'm learning a lot today. :)

ralph · on Nov 3, 2011

The reason for the convention is that there's often only one form of success, so 0 suffices, but there may be many causes of failure.
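A minimal sketch of that idea, using hypothetical names (parse_port and the PARSE_* codes are made up for this example): 0 is the single success value, and each non-zero value names a different cause of failure.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result codes; 0 is reserved for the one success case. */
enum parse_err {
    PARSE_OK = 0,
    PARSE_EMPTY,
    PARSE_NOT_NUMBER,
    PARSE_OUT_OF_RANGE
};

/*
 * Parse a decimal TCP port from s into *port.  Returns 0 on success or a
 * non-zero PARSE_* code describing the failure.
 */
int
parse_port(const char *s, int *port)
{
    char *end;
    long v;

    if (s[0] == '\0')
        return (PARSE_EMPTY);

    errno = 0;
    v = strtol(s, &end, 10);

    /* Trailing junk, or no digits at all. */
    if (*end != '\0')
        return (PARSE_NOT_NUMBER);

    /* Overflowed long, or outside the valid port range. */
    if (errno == ERANGE || v < 1 || v > 65535)
        return (PARSE_OUT_OF_RANGE);

    *port = (int)v;
    return (PARSE_OK);
}
```

A caller that only cares whether the call worked can still write "if (parse_port(s, &port)) goto err0;", while one that wants to report the cause can inspect the returned code.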