For fun, I'm going to write my thoughts before reading what you said about it elsewhere in the thread.
What do you think of short methods?
I'm skeptical of them. I think it's a mistake to try to make functions short for the sake of making them short. It's a mistake because adding a new function also adds complexity (i.e. more code, plus opacity between the calling and the called code) - not a lot, but greater than zero - so introducing a function is not cost-free and its benefit needs to be greater than its cost. I found that once I started asking functions to justify themselves this way, I began creating fewer functions and the overall complexity of my code went down.
Factoring code into functions is one of the best tools we have, of course, but people commonly make the mistake of applying it mechanically. A function should exist when the program itself wants that concept, not because you had a block of code that was too big or some duplication and you wanted to rearrange the pieces. The way to address those symptoms is not by adding more code but by thinking until you see how you were looking at the problem wrongly. Then the new concepts, and finally the new functions, appear by themselves.
You only have so many conceptual cards to play and must play them sparingly if you don't want your program to succumb to runaway complexity. A good function is a logical construct that makes sense in the meaning of the program the way a good word makes sense and adds a unique meaning to a language, something you can't quite say as well any other way.
When all you're doing is shifting pieces around, you're missing the most important thing about functions, which is this conceptual payload. After you do that for a while, your program stops evolving as an expression of the problem being solved, because you've built it out of primitives that refer only to the internals of the program rather than to concepts drawn from the problem space.
Side note. I'm writing at such length here and in the GP because these questions are on my mind all the time. I've been working on a hard problem for over two years now in an utterly immersed way, the kind where you dream about it every night, where time itself begins to blur. Our approach has been to evolve the program many times until it converges on a solution. That only works if the program doesn't grow as you evolve it. How do you build a system such that you're constantly adding new information and behavior to it, and yet the overall code doesn't grow? We've had to figure this out just to stay alive.
One more thing about function length - Steve McConnell cites studies that suggest that short functions aren't easier to understand. IIRC the sweet spot was between 50 and 100 lines, depending of course on the language. I've posted references to this on HN before. One should be careful about believing these studies because the empirical literature on software development is so poor. But it's at least interesting that such experimental evidence as exists runs counter to the "OO short methods" school.
> After you do that for a while, your program stops evolving as an expression of the problem being solved, because you've built it out of primitives that refer only to the internals of the program rather than to concepts drawn from the problem space.
Wow, nicely said.
You also remind me of the problem of removing "accidental duplication", or overfitting: this is when you factor out common code, but it turns out later that it's not really common - lots of minor and sometimes major distinctions occur as you implement more of the problem. It was only by accident that the code happened to be identical at that stage of development. The theory constructed (the factoring out) gave too much weight to limited information (an early stage of the program), overfitting to that specific information. Generalizing from two instances is almost as bad as generalizing from one. In your terms, it models the program, not the problem.
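A small, entirely hypothetical illustration of accidental duplication: suppose usernames and display names were validated by identical code early on and got merged into one helper. When the requirements diverge, the merged helper starts sprouting flags, while the unmerged version keeps each concept separate. All function names and rules here are invented for the example.

```python
def valid_name(name: str, allow_spaces: bool = False, max_len: int = 32) -> bool:
    """The merged helper after the two uses diverged: flags accumulate
    to paper over the differences between two unrelated concepts."""
    if not allow_spaces and any(c.isspace() for c in name):
        return False
    return 0 < len(name.strip()) and len(name) <= max_len


def valid_username(name: str) -> bool:
    """Usernames (hypothetical rule): 1-32 characters, no whitespace."""
    return 0 < len(name) <= 32 and not any(c.isspace() for c in name)


def valid_display_name(name: str) -> bool:
    """Display names (hypothetical rule): up to 64 characters, spaces allowed, not blank."""
    return 0 < len(name.strip()) and len(name) <= 64
```

Each caller of `valid_name` has to know which flag combination stands for "its" concept, which is the overfit theory leaking out; the two separate functions each say one thing.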
It's so refreshing to hear similar thoughts to mine. :)
http://lists.canonical.org/pipermail/kragen-hacks/2011-June/... has an example of this "eliminating accidental duplication", I think. The first version of the program treats "note handlers", the end-users of the "notes" it distributes, as simply one kind of peer. But the later versions have a more complicated protocol between peers, and so they reify "note handlers" as a separate kind of thing. Similarly, the first two versions invoke the user-visible .publish() method when they receive a note from a peer, but the third version adds a new .got_note() method to eliminate the duplication between the two code paths.
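The shape of that third version can be sketched roughly as follows. This is not the code from the linked post (which isn't reproduced here); only the names `publish` and `got_note` come from the description above, and `NoteHandler`, `receive_from_peer`, `handle_note`, `neighbors`, and the `seen` set are hypothetical stand-ins.

```python
class NoteHandler:
    """Hypothetical end-user consumer of notes, no longer modeled as a kind of peer."""

    def __init__(self):
        self.received = []

    def handle_note(self, note):
        self.received.append(note)


class Peer:
    def __init__(self):
        self.handlers = []   # local note handlers (end-users)
        self.neighbors = []  # other Peer objects we forward to
        self.seen = set()    # notes already processed, to stop forwarding loops

    def publish(self, note):
        """User-visible entry point: originate a note locally."""
        self.got_note(note)

    def receive_from_peer(self, note):
        """Hypothetical internal entry point for notes forwarded by a neighbor."""
        self.got_note(note)

    def got_note(self, note):
        """Shared path for local and remote notes: deliver once, then forward."""
        if note in self.seen:
            return
        self.seen.add(note)
        for handler in self.handlers:
            handler.handle_note(note)
        for neighbor in self.neighbors:
            neighbor.receive_from_peer(note)
```

Usage: with two peers `a` and `b` listing each other as neighbors and a handler attached to `b`, `a.publish("hi")` delivers `"hi"` to the handler exactly once. The receive path no longer goes through the user-visible `publish`.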
As I was writing this code, I was thinking about Uncle Bob's extremist viewpoint on short methods in Clean Code, and I tried it out. In the end, I inlined all the methods that had only a single caller, except for the handle_message family. I think the code came out exceptionally clear and simple, but Uncle Bob would not be happy with it.
The empirical studies you're mentioning were of FORTRAN functions, I think from a linear algebra library. They may not generalize.
My thought is that short methods make your code more flexible (that is, you can compose its pieces in more ways, so the next bit of code you write can be shorter without modifying the existing code) at the cost of comprehensibility and verifiability. It's no surprise that this value came out of the Smalltalk camp, because Smalltalk (and OO in general, but especially Smalltalk) is optimized for flexibility at the expense of verifiability.
When you factor out a method, you're making the code you pulled it out of easier to read — except when the reader needed to know the details of what you pulled out. But you're making the code you pulled out harder to read, because the reader no longer knows that it's called in only one place, what the state of the system is when it's called, what the values of its arguments are, and what its results are used for.
There was once a school of thought that it's easier to read a piece of code if it's laid out to visually show the tree structure of its loops and conditionals, and if it uses loops and conditionals instead of gotos. I think this is not the only virtue that code can possess that helps its readability, but it is a real virtue. Factoring out more methods reduces this virtue, so it needs to be repaid by some other virtue, which I think is what you're saying.
I have several different heuristics for when it's good to factor out methods or functions, but I think they aren't good enough, because I always end up with some functions that are kind of a mess.