>My final, and perhaps more important, advice is to always write regression tests. Encode every single bug you find as a test, to ensure that you’ll notice if you ever encounter it again.
This is good advice.
On a previous (technical-debt-ridden) project I did a little measuring, and there was a pretty clear hierarchy of test value in terms of detected regressions:
1) Tests written to invoke bugs.
2) Tests written before implementing the feature which makes them pass.
3) Tests written to cover "surprise" features (i.e. features written by a previous team that I never noticed existed until they broke or I spotted evidence of them in the code).
4) Tests written after implementing the feature.
5) Tests written just for the sake of increasing coverage.
Surprisingly, 5 actually ended up being counter-productive most of the time: those tests detected very few bugs but still carried a maintenance and runtime overhead.
What do people think about writing explicit regression tests for bugs found through fuzz testing? I've tended to lean towards not writing them, depending on continued fuzz testing to ensure things stay clean. This may of course be somewhat naive, but I also fear cluttering my test suite with really obscure test cases.
I think the premise of your question conflates two separate considerations, unless I've misunderstood.
On all projects I own, the policy is that a bug fix will not be merged into the codebase without comprehensive unit testing that demonstrates the case in which the bug was discovered and shows that it has been resolved.
I do not understand why it matters _how_ the bug was discovered. If fuzz testing discovered that function foo tries to dereference a null pointer given the input "ABABAB", then I would expect the engineer who chose to address that bug to investigate which property of "ABABAB" went unaccounted for, account for it, and then write a unit test calling foo with the input "ABABAB", along with several other inputs that share the same discovered underlying property.
Fuzz testing may be a different method of testing, but the end result is the same regardless: you have discovered an input that your application hasn't been designed to handle properly, and that needs to be demonstrably fixed, whatever it may be in particular.
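As a rough sketch of what that ends up looking like (keeping the hypothetical foo and "ABABAB" from above, and assuming a Go codebase; the "repeated pair" property is an invented diagnosis for illustration):

```go
package foo

import "testing"

// Regression test for a fuzz-discovered crash: foo used to dereference a
// nil pointer on the input "ABABAB". The extra inputs cover the assumed
// underlying property; adjust them to whatever the real property turns out
// to be once the bug is understood.
func TestFooRepeatedPairInputs(t *testing.T) {
	inputs := []string{
		"ABABAB", // the exact input the fuzzer reported
		"ABAB",   // shorter input with the same repeated-pair structure
		"XYXYXY", // same structure, different characters
	}
	for _, in := range inputs {
		// The minimum contract asserted here is "does not panic";
		// tighten it to whatever foo is actually supposed to return.
		_ = foo(in)
	}
}
```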
Wouldn't you want to write those explicit tests anyway, to run the troublesome input in isolation while fixing the bug? With the tooling to fuzz, they should be one-liners or close, hardly cluttering. One time, working on an extremely fuzz-friendly function that was crazy rich in corner cases, I even made the error message of the fuzz loop include the one-liners that would execute the failing inputs, ready for copy&paste. Testing never felt more productive.
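A rough sketch of that trick, using a Go fuzz target and a hypothetical parse function as the stand-in (not the original code, just the idea):

```go
package parse

import "testing"

// FuzzParse wraps the hypothetical parse function in a Go fuzz target.
// When an input makes parse crash, the failure message includes a
// one-liner that reproduces it, ready to paste into an ordinary unit test.
func FuzzParse(f *testing.F) {
	f.Add("ABABAB") // seed the corpus with inputs from past bugs
	f.Fuzz(func(t *testing.T, in string) {
		defer func() {
			if r := recover(); r != nil {
				t.Fatalf("parse panicked: %v\ncopy into a test: parse(%q)", r, in)
			}
		}()
		parse(in)
	})
}
```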
I actually don't think that heavy fuzzing has a place in an automated test suite at all. Test suites should be fast and 100% reproducible at all times. Then explicit regression tests for the discovered cases are the only way. (I do occasionally allow myself to include short fuzz loops with fixed RNG initialization, but those are more on the "shameful secrets" end of the spectrum)
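Such a fixed-seed loop might look roughly like this (parse and randomInput are made-up stand-ins; the point is that the seed never changes, so the suite stays fast and reproducible):

```go
package parse

import (
	"math/rand"
	"testing"
)

// A short fuzz loop with a fixed seed: the same 200 inputs every run.
func TestParseShortFuzz(t *testing.T) {
	rng := rand.New(rand.NewSource(42)) // fixed seed, deliberately boring
	for i := 0; i < 200; i++ {
		in := randomInput(rng) // hypothetical input generator
		parse(in)              // must not panic for any input
	}
}
```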
Haven't worked with fuzz testing myself, but it sounds like something I'd lean towards writing. Those obscure test cases are exactly the thing you don't tend to find in manual testing, and AFAIK fuzz testing is random enough that you can't be sure that every run will exercise the same bug.
One benefit I find with code coverage as a goal is that it highlights unused or unnecessary branches, and can encourage me to think about how my code is organized so that code paths are easier to reach.
Put another way, those branches that are rarely touched or hard to get to can become a surprise when they are actually reached in some unique situation.
I guess in this case it isn't really the test itself that is useful, but rather the requirement to at least hit each branch, which pushes me to design and organize the code better.
Code coverage reports are a visualization tool that should be used to quickly get the lay of the land. Code coverage itself should not be the goal; the goal should be careful coverage of behavior that is composed of discrete units amenable to unit tests.
I once made the mistake, as a new lead, of implementing a 100% code coverage policy; I thought that I was expressing the intent of covering all possible behavior. What ended up happening was that the team focused on the metric and lost sight of the goal of unit testing, which is to test behavior. We ended up with people submitting PRs containing unit tests for object getters and setters but not testing that trying to set a null value is properly handled.
That experience taught me that code coverage is only a tool, and a tool is only useful if used correctly.
To drive the point home: having 100% coverage of all branches means that all written code is tested, but it does not mean that all code that needs to be written has been written. Unit tests should verify behavior, not just execute whatever code has been written.
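To make the contrast concrete, a made-up Go example: the first test exists mainly to execute the getter and setter, while the second pins the behavior that actually matters (Account, Owner, SetOwner and the "reject nil" rule are all hypothetical):

```go
package account

import "testing"

// A coverage-chasing test: it executes the getter and the setter, but
// nothing asserted here is likely to ever break.
func TestOwnerGetterSetter(t *testing.T) {
	a := &Account{}
	o := &Owner{Name: "alice"}
	a.SetOwner(o)
	if a.Owner() != o {
		t.Fatal("owner not stored")
	}
}

// A behavior test: it pins down what happens for the awkward input that
// the coverage-driven tests skipped.
func TestSetOwnerRejectsNil(t *testing.T) {
	a := &Account{}
	if err := a.SetOwner(nil); err == nil {
		t.Fatal("expected an error when setting a nil owner")
	}
}
```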
Yeah, I agree: once the metric itself becomes the goal, you end up with unmaintainable tests written just to hit a block. I'd rather see un-hit blocks eliminated than obtuse tests written to try to hit them. But like you said, I can see how, if test coverage is the only end goal, you could end up with some pretty useless tests just to hit some random block.