I think about that email a lot at work (where we have some very long functions). I particularly like this excerpt, which is applicable outside of performance-critical situations:
> Besides awareness of the actual code being executed, inlining functions also has the benefit of not making it possible to call the function from other places. That sounds ridiculous, but there is a point to it. As a codebase grows over years of use, there will be lots of opportunities to take a shortcut and just call a function that does only the work you think needs to be done. There might be a FullUpdate() function that calls PartialUpdateA(), and PartialUpdateB(), but in some particular case you may realize (or think) that you only need to do PartialUpdateB(), and you are being efficient by avoiding the other work. Lots and lots of bugs stem from this. Most bugs are a result of the execution state not being exactly what you think it is.
in general I don't think it's worthwhile to split a long function into several static helpers just to get under an arbitrary maximum function length target. I don't think it leads to a net improvement in readability, since I now have to go back and forth between the helpers and the main function to see in what order the helpers get called (what if someone swaps the order of helperA and helperB but not the order of their definitions?). imo this is only worth doing if you're also willing to think long and hard about what happens if someone uses your helpers in a different context.
This exposes one of the weaknesses of C and C++. Nesting functions inside functions is actually very useful for avoiding precisely what's described, while still allowing the code to be more readable. The inner functions can always be extracted later if deemed appropriate. If C permitted it, you could define PartialUpdateA and PartialUpdateB inside FullUpdate itself.
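A sketch of what that could look like (hypothetical; this is not standard C, though GCC accepts nested functions as a non-standard extension, and the names come from the quoted email):

```
void FullUpdate() {
    /* only callable from inside FullUpdate */
    void PartialUpdateA() { /* ... */ }
    void PartialUpdateB() { /* ... */ }

    PartialUpdateA();
    PartialUpdateB();
}
```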
So that function is only visible in this one scope, but where it's used it can greatly improve readability (especially if it's called multiple times).
Now, C++ can halfway get there with private methods in classes. So anyone outside the class has to really try to break that encapsulation and access the function. C and C++ can get there by not exposing the functions in headers, so they remain file local.
But that doesn't prevent something like (within a file):
// should only be called from FullUpdate
void PartialUpdateA() { ... }

void AnotherFunc() {
    ...
    PartialUpdateA();  // nothing prevents this call
    ...
}
C++ lambdas do solve the problem; I'm not sure why I didn't include that.
However, unit testing them is not the only concern. There are reasons for using nested functions, class methods, file-global, or program-global functions.
If you make them global or methods, you lose control of how and when they're called, which can break invariants. So some functions can be hoisted up, but others oughtn't be (in particular, any pure function can be made global with no concern beyond occupying a name; side-effecting functions should be considered more carefully).
The interface to the functions may change if they capture any variables. If they capture nothing in the local scope, then hoisting them doesn't impact their interface. If they do capture something, hoisting them means adding parameters (complicating the interface) or making them observe variables either in the class (complicating the class) or file/program globals (bad practice).
Regarding unit testing: nested functions (or lambdas) are essentially a wash. You were, hopefully, testing the host function to begin with, so nothing changes if you use nested functions as your first-pass refactoring approach. You can then examine those functions, consider which should be moved out and why, and add tests for any that have been pulled out of the host function.
C++ gives you enough access control so you can expose things just for testing if you want (make the test a friend, encapsulate the functions into another class). You can do this without breaking encapsulation.
If the functions are capturing a lot of variables, then I would try to reconsider the design. Usually things aren't irreducibly complex.
I don't exactly follow your last paragraph, but tests aren't just something you throw away when they pass. So if I were to write tests for the helper functions, I wouldn't delete those tests on a refactoring pass in order to use nested functions.
> C++ gives you enough access control so you can expose things just for testing if you want (make the test a friend, encapsulate the functions into another class). You can do this without breaking encapsulation.
If tests can do this, then so can anyone else. Consequently encapsulation is broken and your invariants aren't invariant anymore.
> I don't exactly follow your last paragraph, but tests aren't just something you throw away when they pass. So if I were to write tests for the helper functions, I wouldn't delete those tests on a refactoring pass in order to use nested functions.
I didn't write clearly because I didn't re-present the context of that paragraph.
I'm not talking about throwing away tests after they're run. Keep in mind my original post's context: manually inlined functions for access control to that functionality. You already can't test those separately because they aren't exposed. By moving to nested functions you regain some ability to reason about the code (versus 1k+ line functions with who knows how many levels of nested blocks), and the compiler can still do the inlining (for performance). But it has zero net effect on testing; it's a wash, because the public interface is the same (only the primary function's interface is accessible to a tester).
If nested functions are available (and in C++ they are, via lambdas), the refactoring would (or could) go something like: 1k-line function => 500-line function with several lambdas => 3-8 functions totaling ~500 lines, with some lambdas remaining.
Only those that make sense to move out for separate testing would be, and only if you also wanted to expose them for others to call.
If I had an image loader class, I could make a test function a friend, which would allow it to call private functions on the class. This grants access only to the test function (or test class). And there are stronger ways to hide things, like the PIMPL idiom, private headers, and opaque pointers.
I find it interesting we're in such different schools of thought here.
> Only those that make sense to move out for separate testing would be, and only if you also wanted to expose them for others to call.
There are so many things that you might want to hide from an interface, yet still test. Imagine if you took that to an extreme and only tested the public interface of a library. I'm all for trying to independently test any bit of code that fills a screen.
I suppose you're not implementing safety-critical code then, as AV rule 1 [1] demands that "Any one function (or method) will contain no more than 200 logical source lines of code (LSLOCs)."
Personally, I just find it frustrating when functions don't fit on the screen anymore (and I already use portrait mode). Further, sub-functions, when named appropriately (definitely not like helperA and helperB), can aid readability (as would comments on code blocks, but who writes those, and who maintains them?).
you are correct; I do not work on safety-critical code. I have worked with a lot of legacy code where functions depend on and mutate global state, often in fairly subtle ways. as much as possible, I want to see exactly which globals are being read/written and in what order. from this perspective, the "whole function" doesn't fit on one screen, whether or not sub-functions are factored out, and the name of a function cannot possibly tell me all I need to know about it.