
While agreeing with the results of this article, I’ve found that convincing other developers to write tests with properties isn’t very easy: coming up with good properties is not always trivial.

Here is an informal testing maturity ladder in increasing order:

- code can only be tested with an integration test

- code is tested by comparing the stdout with an earlier version (hopefully it’s deterministic!)

- code is retrofitted with tests in mind

- code is written to be testable first, maybe with lots of mocks

- code is testable, and pure functions are tested with unit tests, leaving the rest to integration tests. Fewer mocks, some stubs.

- property based tests, assuming the unit tests are fast in the first place

- fuzzing

- mutation based testing

Not to mention formal specs, performance testing, or anything else.




The problem with your hierarchy is that there's no empirical evidence supporting it. Small unit tests have not empirically been shown to have benefits over integration tests, and test-driven design has failed to show a benefit over tests written after the fact. The only thing that seems to matter is that tests are written, and the more tests, the more chances of finding a bug. That's it. So your list is actually:

* integration and unit tests: since these are manually written, they scale poorly but are simple.

* property tests: since these are semi-automatic they scale better but are a bit more complicated to set up.

* fuzzing: almost fully automatic, although I don't differentiate this much from property-based testing.

* mutation based testing


Is mutation based testing only better because it forces more tests to be written to kill the mutants?

Also, mutation based testing is really orthogonal to the others, since it's a way of evaluating the adequacy of the tests, not of actually testing. One could easily imagine using PBT/fuzzing to generate (and then simplify) tests with the express goal of killing mutants.


> Small unit tests have not empirically been shown to have benefits over integration tests, [...]

Do you have any links to these studies?

> * property tests: since these are semi-automatic they scale better but are a bit more complicated to set up.

Once you are in the groove, I find property-based tests to be simpler (or at least not harder) than example-based tests. But that's when I am writing tests as I am developing the system, i.e. I take testability into account in the design.


> code is written to be testable first, maybe with lots of mocks

If you mean what I think you mean, this is the bottom rung of the ladder. Code that is only testable with lots of mocks is in practice worse than code with no tests.

Tests should do two things: catch undiscovered bugs and enable refactoring. Tests mocked to high heaven do one thing: confirm that the code is written the way it’s currently written. This is diametrically opposed to and completely incompatible with those two stated goals. Most importantly, the code can’t be changed without breaking and rewriting tests.

Mocks are okay for modeling owners of external state. Even better are dummy/fake implementations that look and behave like the real thing (but with highly simplified logic).
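To make the distinction concrete, here is a minimal sketch of a fake in Python (InMemoryUserRepository and the test are hypothetical names made up for illustration, not from any particular codebase):

    # Hypothetical example: a fake, i.e. a simplified but genuinely behaving
    # implementation of a dependency, rather than a mock that only records calls.
    class InMemoryUserRepository:
        """Looks and behaves like the real database-backed repository, minus the database."""

        def __init__(self):
            self._users = {}

        def save(self, user_id, user):
            self._users[user_id] = user

        def find(self, user_id):
            return self._users.get(user_id)


    def test_saved_user_can_be_found():
        repo = InMemoryUserRepository()
        repo.save(42, {"name": "Ada"})
        # The test exercises behaviour through the same interface the real code uses,
        # so it keeps passing when the code under test is refactored.
        assert repo.find(42) == {"name": "Ada"}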


I really like this list, and it's a great idea to explain testing this way.

Perhaps there is also a level -1, when the tests actually make things worse. I see this when tests are extremely brittle, flaky, don't test the most complex or valuable bits of code, are very slow to run, or unmaintained with a list of "tests we know fail but haven't fixed".


There might even be a whole ladder going further down. You know, something about the value of everything and the cost of nothing.

I've seen tests with bugs in them, hiding bugs in the code and giving a false sense of robustness. Because, you know, code coverage is 100%, so we cannot have any bugs, right?

I had to work with tests that tested many trivial things and were tightly coupled to implementation details. Those discouraged small refactorings, because changes to implementation details would break the unit tests, and it took a lot of time to understand and fix the failures. It also slowed down the pace of development for no good reason.

Unless a test has a value that exceeds the cost, it is a net negative.


>- code can only be tested with an integration test

Some code only makes sense to test with integration tests. Testing doesn't become more effective when somebody has decided to fatten up the SLOC with dependency inversion just so that they can write some unit tests which test that x = x.

>- code is tested by comparing the stdout with an earlier version (hopefully it’s deterministic!)

Making the code deterministic or adapting the tests to accommodate that it isn't should be next on the ladder, not hoping that it is.


> coming up with good properties is not always trivial

This is difficult, but one technique that (might) make it easier for real-world applications beyond simple invariants is to take the approach of building a simple model of the system under test and, in the PBT, checking that your system's behavior matches the model's [1].

[1] https://dl.acm.org/doi/10.1145/3477132.3483540
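For illustration, here is a minimal sketch of that model-based approach using Hypothesis's stateful testing (KVStore is a hypothetical system under test; a plain dict is the model):

    # Hypothetical sketch: check the system under test against a trivially correct model.
    from hypothesis import strategies as st
    from hypothesis.stateful import RuleBasedStateMachine, rule, invariant


    class KVStore:
        """Stand-in for the real system under test (imagine something much hairier)."""

        def __init__(self):
            self._data = {}

        def put(self, key, value):
            self._data[key] = value

        def get(self, key):
            return self._data.get(key)


    class KVStoreAgreesWithModel(RuleBasedStateMachine):
        def __init__(self):
            super().__init__()
            self.real = KVStore()  # system under test
            self.model = {}        # the model: a plain dict

        @rule(key=st.text(), value=st.integers())
        def put(self, key, value):
            self.real.put(key, value)
            self.model[key] = value

        @invariant()
        def behaviour_matches_model(self):
            for key, expected in self.model.items():
                assert self.real.get(key) == expected


    # Hypothesis generates (and shrinks) whole sequences of operations.
    TestKVStore = KVStoreAgreesWithModel.TestCase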


Testing behaviour against an 'oracle' is a great class of properties to check.

Especially useful when you want to test an optimized version against a simpler (but slower) baseline version. Or when you have a class of special cases that you can solve in a simpler way.

Testing the system against itself, but under a symmetry, is also useful, though that comes close to general properties. A symmetry could be flipping labels around, shuffling input order, etc.; it depends on your system.
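As a hypothetical sketch of both ideas with Hypothesis (fast_sort stands in for the optimized implementation, sorted() plays the oracle, and shuffling the input is the symmetry):

    from hypothesis import given, strategies as st


    def fast_sort(xs):
        # Stand-in for the optimized implementation under test.
        return sorted(xs)


    @given(st.lists(st.integers()))
    def test_matches_slow_oracle(xs):
        # The simple, obviously correct version acts as the oracle.
        assert fast_sort(xs) == sorted(xs)


    @given(st.lists(st.integers()), st.randoms())
    def test_result_is_invariant_under_shuffling(xs, rnd):
        # Symmetry: the order of the input should not affect the result.
        shuffled = list(xs)
        rnd.shuffle(shuffled)
        assert fast_sort(shuffled) == fast_sort(xs)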


I don't understand why PBT is above mutation testing. It seems like it's more of a popularity contest kind of thing, and not a matter of engineering tradeoffs or how useful it is.


In my experience, adding PBT once you've got a codebase amenable to unit testing in general is a relatively easy step: you can add property tests to your existing unit tests without changing much about your setup, and assuming your tests in general are quick, PBT won't add much overall time to your testing process.

But adding mutation testing tends to be harder: it's not just an extra thing you can add in without changing any existing code, it's testing whether the tests are useful in the first place. Which means when you introduce it, you'll probably need to spend some time fixing everything that's currently wrong. This makes it a next step in the testing process beyond just adding a new technique to your existing repertoire.

That said, I've used PBT for a while and not had as much success with mutation testing, so maybe this is a personal bias.


The setup for MT should be zero. You just start it and see what you get.

> But adding mutation testing tends to be harder: it's not just an extra thing you can add in without changing any existing code

But it is...

> it's testing whether the tests are useful in the first place.

Not useful. Complete. MT checks that you have tests for all the behavior of the code.


Maybe we're talking about different things then, but in my experience, MT is a reasonably involved procedure that requires configuring the MT harness to understand where the code lives, how to run the test suite, how to interpret the test suite, etc, then running the mutation tests for upwards of an hour as it repeatedly makes various changes and runs the tests. If you want to do something more complicated like only test a given region of the codebase, then the configuration becomes even more involved.

This is all significantly more involved than my experience with PBT, which tends to be something that can be added without much ceremony to an existing test suite when it makes sense.

To be clear, I love the idea behind mutation testing and I have given it a go a few times with limited success, but I think your comment is overselling its simplicity.

That said, I'd love your advice: how do you introduce mutation testing to a large codebase that currently has an extensive set of tests but hasn't used mutation tests yet? And how do you maintain the MT side of things? It seems far too slow to regularly run in CI: do you just run the MT tool every now and then to make sure that the tests are still covering all the mutations? Or do you have a more structured approach?


> MT is a reasonably involved procedure that requires configuring the MT harness to understand where the code lives, how to run the test suite, how to interpret the test suite

mutmut autodetects this with a majority of setups, but yeah, if you do need to configure all of that, it can be annoying.

> then running the mutation tests for upwards of an hour as it repeatedly makes various changes and runs the tests

Yeah, MT is slow, heh. But it can be simple to start with. I generally recommend doing it mostly for libraries, since they tend to have small and fast test suites, which makes it much more fun. Or extract the code into a throwaway project, do MT there, and then move it back. It's a bit crap, but it works.

> To be clear, I love the idea behind mutation testing and I have given it a go a few times with limited success, but I think your comment is overselling its simplicity.

I had to write my own mutation tester because I couldn't get the existing ones to work, so I do feel your pain there :P

> That said, I'd love your advice: how do you introduce mutation testing to a large codebase that currently has an extensive set of tests but hasn't used mutation tests yet?

I partly answered above, but I would only test extremely limited parts that are critical. And I would make sure not to run the entire test suite.

> And how do you maintain the MT side of things? It seems far too slow to regularly run in CI: do you just run the MT tool every now and then to make sure that the tests are still covering all the mutations? Or do you have a more structured approach?

This is exactly how I use it, yes. People ask for CI support for mutmut and I've accepted PRs for it, but I just assume they will get it working and then later throw it all away as it's useless. I try to convince people it's the wrong approach, but I have trouble getting them to listen.

If MT was WAY faster then maybe you could use it to validate it regularly, but for mutmut at least it's too slow. I have an experimental branch of mutmut that is much faster but I just don't have the time/interest to make that a reality right now. I don't particularly need MT in my current job...


PBT is below mutation testing; the list is ordered from the lowest tier up.


Sorry, I put that badly. What I meant was that I don't understand why PBT is something you apply before MT.


> While agreeing with the results of this article, I’ve found that convincing other developers to write tests with properties isn’t very easy: coming up with good properties is not always trivial.

Yes, but there are some easy targets:

Your example-based tests often have some values that aren't supposed to matter. You can replace those with 'arbitrary' values from your property-based testing library.

Another easy test that's surprisingly powerful: just chuck 'arbitrary' input at your functions, and check that they don't crash. (Or at least, only throw the expected errors.) You can refine what 'arbitrary' means.

The implied property you are testing is that the system doesn't crash. With a lot of asserts etc in your code, that's surprisingly powerful.
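A hypothetical sketch of this with Hypothesis (parse_config and the choice of ValueError as the only expected error are made up for illustration):

    from hypothesis import given, strategies as st


    def parse_config(text):
        # Stand-in for the real parser under test.
        if not text.strip():
            raise ValueError("empty config")
        return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)


    @given(st.text())
    def test_parser_never_crashes(text):
        try:
            parse_config(text)
        except ValueError:
            pass  # the one documented, expected error; anything else fails the test


    @given(st.text(alphabet=st.characters(min_codepoint=32, max_codepoint=126), max_size=200))
    def test_parser_never_crashes_on_printable_ascii(text):
        # Same property with 'arbitrary' refined: printable ASCII, bounded length.
        try:
            parse_config(text)
        except ValueError:
            pass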


I've never seen anyone do mutation testing with software (it's pretty common for hardware though). Does it require language level support?


I'm the author of mutmut, the primary mutation tester for Python, so I think I can speak a bit on this.

It's quite straightforward to do MT in software. I've done quite a bit of it for specific libraries that I've built (iommi, tri.struct, and internal code). A big advantage of MT over PBT, in my book, is the much lower cognitive overhead, and that you can know you have done it completely. The second may just be a false sense of security or an emotional blanket, but still.

I have written about mutation testing a few times: https://kodare.net/2019/04/10/mutation-vs-property-based-tes... https://kodare.net/2018/11/18/the-missing-mutant-a-performan... https://kodare.net/2016/12/12/mutation-testing-in-practice.h... and my talk on mutmut from pycon sweden is on youtube: https://www.youtube.com/watch?v=fZwB1gQBwnU


Does that only work in Python though? What about compiled languages? You don't really want to have to recompile your whole project again for every line you change...


mutmut right now is implemented by writing changes to disk and starting new processes. But you can implement it via "mutation schemata" where you functionally compile all possible mutants ahead of time, plus the original function, and replace the original function with a trampoline that either calls the original or one of the mutants depending on some external state.

I have a prototype of mutmut that does this and it's 10 to 100x faster. It does have the downside of not being able to mutate stuff like static global variables and such though.
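To illustrate the idea (a rough sketch, not mutmut's actual code): all mutants of a function are generated ahead of time, and a trampoline picks one at run time based on external state such as an environment variable, so nothing has to be rebuilt per mutant.

    import os


    def _clamp__original(x, lo, hi):
        return max(lo, min(x, hi))


    def _clamp__mutant_1(x, lo, hi):
        return max(lo, min(x, lo))  # mutation: `hi` replaced with `lo`


    def _clamp__mutant_2(x, lo, hi):
        return min(lo, min(x, hi))  # mutation: outer `max` replaced with `min`


    _MUTANTS = {"clamp_1": _clamp__mutant_1, "clamp_2": _clamp__mutant_2}


    def clamp(x, lo, hi):
        # Trampoline: the mutation tester selects a mutant via the environment;
        # with no selection, the original behaviour runs.
        active = os.environ.get("ACTIVE_MUTANT", "")
        return _MUTANTS.get(active, _clamp__original)(x, lo, hi)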


[I've never used any mutation testing tools either.]

In general it doesn't require language-level support, of course - you can just make a change and rebuild it, à la Stryker https://stryker-mutator.io/docs/stryker-net/technical-refere... . PITest operates on JVM bytecode (https://pitest.org/) for orders of magnitude speedup.


Yeah I meant language level support to make it viable, which rebuilding a gazillion times doesn't really sound like.


Where does automatic reduction of failing test inputs fall on that ladder? Or is that assumed to be part of PBT?


Nope. It's this disparaging of integration tests that gets us into hundreds of useless unit tests that in reality test nothing.

Integration tests must be much higher on the list.

Also: nothing is stopping you from running your prop tests in integration tests, too.


Nothing stopping you. I don’t recommend it. Tests need to be fast or they don’t get written.

Adding property tests into integration tests will make things slower and flakier.


It depends. For a lot of applications, the integration is 99% of the app. Trying to unit test that just ends up testing whether you believe the other end of the integration behaves the way you believe it behaves. It's like having a test that verifies the hash of the executable binary. It tells you that something has changed, but not whether or not that change is desirable.


Test speed fetishism is a bad idea. It leads people to write unrealistic tests which end up passing when there is a bug and failing when there isn't.

Treating flakiness in integration tests as a fait accompli is also a bad idea, a bit like treating flakiness in the app itself as a fait accompli.


Tests that are slow to run don't get run often. People don't write tests first. They write fewer tests. They tend to test the happy paths.

I'm not saying we should avoid integration testing. It's necessary.

It doesn't mix well with PBT for a main test suite, in my experience.

You're better off making sure your domain types and business logic are encapsulated in your types and functions and don't require a database, a network, etc. PBT is great at specifying invariants and relations between functions and sets.
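For example (a hypothetical sketch, not from any codebase mentioned here): a round-trip relation between a pure encode/decode pair needs no database or network to check.

    from hypothesis import given, strategies as st


    def encode(record):
        # Stand-in for pure domain logic: serialize a record to one line.
        return "|".join(f"{k}={v}" for k, v in sorted(record.items()))


    def decode(line):
        if not line:
            return {}
        return dict(part.split("=", 1) for part in line.split("|"))


    @given(st.dictionaries(
        keys=st.text(alphabet="abcdefgh", min_size=1),
        values=st.text(alphabet="0123456789", min_size=1),
    ))
    def test_decode_inverts_encode(record):
        # Relation between the two functions: decoding an encoded record gives it back.
        assert decode(encode(record)) == record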

Use your integration testing budget for testing the integration boundaries with the database and network.

Update: there's no cure for writing bad tests. You just have to write better ones.


Tests get run as often as they get run. There's no rule that says that you have to run them less frequently if they are slower. It's not 1994. CPU time is cheap.

Realism is much more important than speed. A test that catches 20% more bugs and takes 5 seconds instead of 0.05s is just more valuable. It's simply not worth economizing on CPU cycles in 2024.

Yes, DI can make sense sometimes but definitely not all of the time.

The people who have ultra fast unrealistic tests inevitably end up leaning on manual testing as they experience bugs which their unit tests could never have caught.

>there's no cure for writing bad tests. You just have to write better ones.

Better means more realistic, more hermetic, and less flaky. It doesn't necessarily mean ultra-fast.


> will make things slower and flakier.

If running tests with different data makes them flaky, your system is bad.

Regardless of property-based testing, I've seen too many systems where there were hundreds of unit tests which would fail for the most trivial reasons, because the actual integration of those units was never tested properly.


I love PBT and use it frequently.

So integration tests are generally IO-bound. They'll be slow.

With PBT you will generate 100 or 1,000 slow cases.

Integration tests are flaky in the sense that your components might not generate just the right inputs to trigger an error. They may not be configured under test the same way as they are in production. And so on.

PBT can be flaky in the sense that, even with carefully selected distributions, you can get a test that ran fine for months to fail on a run. Add that on top of integration test flakiness.

There's no silver bullet for writing bad specifications.



