I don't even like TDD much, but I think that this missed the point:
> Have you ever refactored working code into working code and had a slew of tests fail anyway?
Yes - and that is intended. The "refactor of working code into working code" often changes some assumptions that were made during implementation.
Those tests are not there to give "feedback on your design", they are there to ensure that the implementation does what you thought it should do when you wrote your code. Yes, that means that when you refactor your code, quite a few tests will have to be changed to match the new code.
But the number of times this has happened and highlighted issues in the refactor is definitely not negligible. The cost of not having these tests (which would translate into bugs) would certainly have surpassed the cost of keeping those tests around.
If we’re talking “what you thought it should do” and not “how you thought it should do it”, this is all fine. If requirements change, tests should change. I think the objection is more to changing implementation details and having to rewrite twice as much code, when your functional tests (which test things that actually make you money) never changed.
Maybe, but I think the point is that it's probably very easy to get into this situation, and not many people talk about it or point out how to avoid it.
I’m still not following what the issue is. If you refactor some code and change the behaviour of the code, and a test of the expected behaviour no longer passes, then you have one of two problems:
1. You had a bug you didn’t know about and your test was invalid (in which case the test is useless! Fix the issue, then fix the test…)
or
2. You had no bug and you just introduced a new one, in which case the test has done its job and alerted you to the problem so you can fix your mistake.
What is the exact problem?
Now if this is an issue with changing the behaviour of the system, that’s not a refactor. In that case, your tests are testing old behaviour, and yes, they are going to have to be changed.
The point is that you're not changing the interface to the system, but you're changing implementation details that don't affect the interface semantics. TDD does lead you to a sort of coupling to implementation details, which results in breaking a lot of unit tests if you change those implementation details. What this yields is hesitancy to undertake positive refactorings, because you have to either update all of those tests or just delete them altogether; and if you delete them, were those tests really useful to begin with? The point is that it's apparently wasted work and possibly an active impediment to positive change, and I haven't seen much discussion around avoiding this outcome, or what to do about it.
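To make the coupling concrete, here's a minimal Python sketch built around a made-up `pricing` module (the names and helper are assumptions for illustration): the first test pins itself to an internal helper and breaks the moment that helper is inlined during a refactor, while the second asserts only on public behaviour and survives it.

```python
# pricing.py (hypothetical module, for illustration only)
def _line_total(item):
    # Internal detail a refactor is free to inline or remove.
    return item["price"] * item["qty"]

def total_price(items):
    # Public behaviour: total price in cents.
    return sum(_line_total(item) for item in items)


# test_pricing.py
from unittest.mock import patch

import pricing

# Brittle: coupled to the internal helper. Inlining _line_total during a
# refactor makes this fail even though callers observe no change at all.
@patch("pricing._line_total", return_value=100)
def test_total_price_uses_helper(mock_line_total):
    assert pricing.total_price([{"price": 50, "qty": 2}]) == 100
    mock_line_total.assert_called_once()

# Resilient: asserts only on observable behaviour at the interface.
def test_total_price_behaviour():
    assert pricing.total_price([{"price": 50, "qty": 2}, {"price": 10, "qty": 3}]) == 130
```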
There was discussion about this more than a decade ago by people like Dan North and Liz Keogh. I think it’s widely accepted that strict TDD can reduce agility when projects face a lot of uncertainty and flux (at both the requirements and implementation levels). I will maintain that functional and integration tests are more effective than low-level unit tests in most cases, because they’re more likely to test things customers care about directly, and are less volatile than implementation-level specifics. But there’s no free lunch; all we’re ever trying to do is get value for our investment of time and reduce what risks we can. Sometimes you’ll work on projects where you build low-level capabilities that are very valuable, and the actual requirements vary wildly as stakeholders navigate uncertainty. In those cases you’re glad to have solid foundations even if everything above them is quite wobbly. Time, change and uncertainty are part of your domain, and you have to reason about them the same as everything else.
> I will maintain that functional and integration tests are more effective than low-level unit tests in most cases
Right, that's pretty much the only advice I've seen that makes sense. The only possible issue is that these tests may have a broader state space, so you may not be able to exhaustively test all cases.
Absolutely right. If you’re lucky, those are areas where you can capture the complexity in some sort of policy or calculator class and use property-based testing to cover as much as possible - that’s a level of unit testing I’m definitely on board with. Sometimes it’s enough to just trust that your functional tests react appropriately to different _types_ of output from those classes (mocked), without having to drive every possible case (as you might have seen done in tabular test cases). For example, I have an app that tests various ways of fetching and visualising data, and one output is via k-means clustering. I test that the right number of clusters gets displayed, but I would never test the correctness of the actual clustering at that level. Treat complexity the same way you treat external dependencies: as something to be contained carefully.
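As a rough sketch of what that property-based layer can look like, using the `hypothesis` library on an invented shipping-cost policy (the function and its invariants are assumptions for illustration, not anyone's real domain):

```python
from hypothesis import given, strategies as st

def shipping_cost(weight_kg: float, express: bool) -> float:
    # Hypothetical policy: base rate plus a per-kg charge, doubled for express.
    cost = 5.0 + 1.5 * weight_kg
    return cost * 2 if express else cost

@given(weight_kg=st.floats(min_value=0, max_value=1000, allow_nan=False),
       express=st.booleans())
def test_shipping_cost_invariants(weight_kg, express):
    cost = shipping_cost(weight_kg, express)
    # Properties hold across the whole input space rather than a few hand-picked rows.
    assert cost >= 5.0                                  # never below the base rate
    if express:
        assert cost >= shipping_cost(weight_kg, False)  # express is never cheaper
```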
Why does testing behavior matter? I don’t care if my tests exhaustively test each if branch of the code to make sure that they call the correct function when entering that if branch. That’s inane.
I care about whether the code is correct. A more concrete example: say I’m testing a float-to-string function. I don’t care how it converts the floating point binary value 1.23 into the string representation “1.23”. All I care about is that it correctly turns that binary value into the correct string. I also care about the edge cases. Does 0.1E-20 correctly use scientific notation? What about rounding behavior? Is this converter intended to represent binary numbers with perfect precision, or is precision loss OK?
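A quick sketch of what those behaviour-level assertions might look like, against a hypothetical `float_to_str` (here backed by Python's `repr` as a stand-in; the scientific-notation and round-trip expectations are assumed requirements, not from any real spec):

```python
def float_to_str(value: float) -> str:
    # Stand-in implementation; the tests below never ask how it works.
    return repr(value)

def test_simple_value():
    assert float_to_str(1.23) == "1.23"

def test_tiny_values_use_scientific_notation():
    assert "e-21" in float_to_str(0.1e-20)

def test_round_trip_precision():
    # The precision requirement expressed as behaviour: parsing the string
    # back gives exactly the same binary value.
    for value in (0.1, 1.0 / 3.0, 1e308, 5e-324):
        assert float(float_to_str(value)) == value
```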
If your tests simply check that you call the log function and the power function x times, your tests are crap. And this is what I believe the parent commenter was talking about. All too often, tests are written to fulfill arbitrary code coverage requirements or to obsequiously adhere to a paradigm like TDD. These are bad tests, because they’ll break when you refactor code.
One last example: I recently wrote a code syntax highlighter. I had dozens of test cases that essentially tested the system end to end and made sure that if I parsed a code block, I ended up with a tree of styles that looked a certain way. I recently had to refactor it to accommodate some new rules, and it was painless and easy. I could try stuff out, run my tests, and very quickly validate that my changes did not break prior correct behavior. This is probably the best value I’ve gotten from testing so far in my coding career.
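Something in the spirit of those end-to-end highlighter tests, with the `highlight` function, its span format, and the rules all invented for illustration:

```python
def highlight(source: str):
    # Toy stand-in for the real highlighter: classify whitespace-separated
    # words as keywords or identifiers and return (style, text) spans.
    keywords = {"def", "return", "if", "else"}
    return [("keyword" if word in keywords else "identifier", word)
            for word in source.split()]

def test_keywords_get_keyword_style():
    spans = highlight("def answer return")
    # Assert on the observable output, not on how the parsing happened,
    # so internal refactors that preserve the output keep these tests green.
    assert ("keyword", "def") in spans
    assert ("identifier", "answer") in spans
    assert ("keyword", "return") in spans
```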
"Have you ever reconsidered your path up the cliff face and had to reposition a slew of pitons? This means your pitons are too tightly coupled to the route!"