For fun, I'm going to write my thoughts before reading what you said about it el...

6ren · on June 8, 2011

> After you do that for a while, your program stops evolving as an expression of the problem being solved, because you've built it out of primitives that refer only to the internals of the program rather than to concepts drawn from the problem space.

Wow, nicely said.

You also remind me of the problem of removing "accidental duplication", or overfitting: this is when you factor out common code, but it turns out later that it's not really common - lots of minor and sometimes major distinctions occur as you implement more of the problem. It was only by accident that the code happened to be the identical at that stage of development. The theory constructed (the factoring out) gave too much weight to limited information (an early stage of the program), overfitting to that specific information. Generalizing from two instances is almost as bad generalizing from one. In your terms, it models the program not the problem.

It's so refreshing to hear similar thoughts to mine. :)

kragen · on June 13, 2011

http://lists.canonical.org/pipermail/kragen-hacks/2011-June/... has an example of this "eliminating accidental duplication", I think. The first version of the program treats "note handlers", the end-users of the "notes" it distributes, as simply one kind of peer. But the later versions have a more complicated protocol between peers, and so they reify "note handlers" as a separate kind of thing. Similarly, the first two versions invoke the user-visible .publish() method when they receive a note from a peer, but the third version has factored out a new .got_note() method to factor out the duplication between the two code paths.

As I was writing this code, I was thinking about Uncle Bob's extremist viewpoint on short methods in Clean Code, and I tried it out. In the end, I inlined all the methods that had only a single caller, except for the handle_message family. I think the code came out exceptionally clear and simple, but Uncle Bob would not be happy with it.

kragen · on June 13, 2011

The empirical studies you're mentioning were of FORTRAN functions, I think from a linear algebra library. They may not generalize.

My thought is that short methods make your code more flexible — that is, you can compose the pieces of it in more ways, so the next bit of code you write without modifying the existing code can be shorter — at the cost of comprehensibility and verifiability. It's no surprise that this value came out of the Smalltalk camp, because Smalltalk (and OO in general, but especially Smalltalk) is optimized for flexibility at the expense of verifiability.

When you factor out a method, you're making the code you pulled it out of easier to read — except when the reader needed to know the details of what you pulled out. But you're making the code you pulled out harder to read, because the reader no longer knows that it's called in only one place, what the state of the system is when it's called, what the values of its arguments are, and what its results are used for.

There was once a school of thought that it's easier to read a piece of code if it's laid out to visually show the tree structure of its loops and conditionals, and if it uses loops and conditionals instead of gotos. I think this is not the only virtue that code can possess that helps its readability, but it is a real virtue. Factoring out more methods reduces this virtue, so it needs to be repaid by some other virtue, which I think is what you're saying.

I have several different heuristics for when it's good to factor out methods or functions, but I think they aren't good enough, because I always end up with some functions that are kind of a mess.