I like this joke about testing from a while ago: "A QA engineer walks into a bar and orders a beer. Orders 0 beers. Orders 99999999999 beers. Orders a lizard. Orders -1 beers. Orders a ueicbksjdhd. First real customer walks in and asks where the bathroom is. The bar bursts into flames, killing everyone."
Which leads into being very clear about what you want to achieve with testing (correctness, robustness, fitness for purpose, etc.) and how much effort you want to put into each area. Often I have seen people put a lot of effort into testing things that really don't have much payback compared to what they could be testing instead. So be careful about the opportunity cost of your testing efforts.
> Often I have seen people put a lot of effort into testing things that really don't have much payback compared to what they could be testing instead.
90% of the "unit tests" that I've observed in the wild are checking for things that a modern type system would easily prevent you from doing.
The "unit testing is a magic bullet" cults seem form in environments that use weakly typed or highly dynamic languages like Javascript that let you pass anything to anything and only blow up when you execute one particular branch at runtime.
A good reason to use TypeScript, Rust, Python's optional type hints, etc. is that they point out these problems for you as you're writing your code, so you don't have to unravel your mess three days later as you're cranking out 10 pages of boilerplate unit tests that only cover two functions and aren't even close to being exhaustive.
Use better languages, stop wasting your time testing for typos and brain farts, and focus on testing higher-level aspects of your design that your language and tools can't possibly know about.
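To make that concrete, here is a minimal Python sketch (function names invented for illustration) of the kind of mistake a checker such as mypy flags as you type, with no test involved:

    def apply_discount(price: float, discount_pct: float) -> float:
        """Return the price after applying a percentage discount."""
        return price * (1 - discount_pct / 100)

    def format_receipt(total: float) -> str:
        return f"Total: {total:.2f}"

    # Both of the following are the "typo and brain fart" class of bug that a
    # dynamic language only surfaces at runtime, on whichever branch hits it:
    #
    #   apply_discount("19.99", 10)  # checker: argument is "str", expected "float"
    #   format_receipt(None)         # checker: argument is "None", expected "float"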
> only blow up when you execute one particular branch at runtime
On a related note: I'm a big fan of property-based testing, where each test is basically a function 'args => boolean', and we're asserting that it always returns true regardless of the given arguments (in reality there is usually more structure, for nicer error messages, skipping known edge cases, etc.). The test framework (e.g. Hypothesis, Quickcheck, Scalacheck, etc.) will run these functions on many different values, usually generated at random.
This works really well for 'algorithmic' code, e.g. '(x: Foo) => parse(print(x)) == x', but it can sometimes be difficult to think of general statements we can make about more complicated systems. In that case, it's often useful to test "code FOO returns without error". This is essentially fuzzing, and can be useful to find such problematic branches (at least, the "obvious", low-hanging fruit cases).
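For instance, here is a minimal Hypothesis sketch of both styles, using a made-up print/parse pair for a toy key=value format (the format and function names are my own, purely for illustration):

    from hypothesis import given, strategies as st

    # Toy 'print'/'parse' pair for a "k=v;k=v" line format, invented for illustration.
    def print_record(record: dict) -> str:
        return ";".join(f"{k}={v}" for k, v in record.items())

    def parse_record(text: str) -> dict:
        if not text:
            return {}
        return {k: int(v) for k, v in (pair.split("=") for pair in text.split(";"))}

    # Property: parsing what we printed gives back the original value.
    @given(st.dictionaries(st.text(alphabet="abc", min_size=1), st.integers()))
    def test_roundtrip(record):
        assert parse_record(print_record(record)) == record

    # The fuzzing flavour: only assert that the code returns without blowing up.
    @given(st.text())
    def test_parse_does_not_crash(text):
        try:
            parse_record(text)
        except ValueError:
            pass  # a known, handled failure mode; anything else is a bug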
For example, we might try asserting that the page rendered for a name always contains the text 'Hello <name>'. However, this test will fail, since this is not an invariant of the system. In particular, it will fail when given an argument like "&<>", since the resulting page will not contain the text 'Hello &<>' (instead, it will contain "Hello &amp;&lt;&gt;", because the input gets escaped).
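Continuing the Hypothesis sketch above, with a hypothetical renderer that HTML-escapes its input, that failing "invariant" looks roughly like this:

    import html
    from hypothesis import given, strategies as st

    # Hypothetical page renderer; any sane one escapes user input.
    def render_greeting(name: str) -> str:
        return f"<p>Hello {html.escape(name)}</p>"

    # Tempting but wrong invariant: the page always contains the literal greeting.
    @given(st.text())
    def test_page_contains_greeting(name):
        assert f"Hello {name}" in render_greeting(name)

    # Hypothesis finds a counterexample like "&<>": the rendered page contains
    # "Hello &amp;&lt;&gt;" rather than "Hello &<>", so the test fails even
    # though the code is behaving exactly as it should.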
All sorts of issues like this will crop up as we test more complicated code. For example, certain components might fail gracefully when given certain values (e.g. constructing a '404' return value); that might be perfectly correct behaviour, but it makes useful invariants harder to think of (since they must still hold, even in those cases!)
Mocking is completely orthogonal to that.
PS: I consider mocking to be a code smell. It can be very useful when trying to improve processes around a legacy system; but things which are designed to use mocking tend to be correlated with bad design.
I believe we are not on the same page about what we consider a complicated system to be. Do you assume that $developer is the owner of all code in this mental exercise?
> 90% of the "unit tests" that I've observed in the wild are checking for things that a modern type system would easily prevent you from doing.
I'm a heavy Clojure user, so I often hear how static types are unnecessary and not useful, dynamic types are much more convenient, and aren't you using clojure.spec anyway to make sure your data is correct?
But I'm a strong static type proponent and wish a statically typed Clojure existed. The reason is exactly as you say: with dynamic types, type errors are only found at runtime, which means the only way to find them ahead of time is to exercise the code thoroughly. So you rely on unit tests to find type errors.
The problem with that is, testing is labour intensive and also non-exhaustive. You have to come up with all the ways the types might become mismatched, and in my personal experience it's super easy for something to slip through. These kinds of bugs seem to be the biggest cause of production errors in my code, and they are exactly the kinds of errors that a static type checker would have prevented.
So I don't think unit tests are a good substitute for a static type system.
Many people argue that the overhead of static typing makes them too slow, that it's too much effort and makes it harder to write code. In my personal opinion, static types make me think about my data much more deeply and help me design better software. Yes, it's slower, but it's slower because it forces me to think about the problem space more. The actual extra typing (har har) needed to add type annotations is a very small overhead. If that's really what's making you slow, then consider learning to touch-type, switching to a better keyboard layout or a better physical keyboard, using an IDE with better autocompletion, using a language with type inference, or just treating the slowdown as a chance to think about your code more. Actually writing code is a small part of my day anyway, so it's not been an issue for me when I use a statically typed language.
A problem I have with optional type hints is that they are optional: not all code (standard library, third-party libraries) will have them, so you only get a small part of the benefit.
The thing is, those unit tests achieve practical utility in those language environments, because without the protections of the type system those kinds of errors bring down systems all the time.
I agree that more rigorous type systems can spare you from having to write a lot of boilerplate tests, and I prefer to work in more strictly typed language environments. Nevertheless, "unit" tests that prove the functionality of individual elements, even if you write fewer of them, still offer a huge ROI. They are damn close to a "magic bullet", and every project that neglects them pays the price.
Another common problem with unit testing is that the function being tested should actually be four different functions, and the setup is so complex that it becomes more of an integration test at the wrong level.
Then the developer will say "unit testing is a pain in the ass, I'd rather do integration tests". But the problem is that the code being tested is calling out its problems through these tests.
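A sketch of that smell (names and details invented): one function doing four jobs, so its "unit" test needs a database, an HTTP client and a clock stubbed out before it can even run; pulling out the pure step in the middle gives something genuinely unit-testable.

    def process_order(order_id, db, http, now):
        row = db.fetch_order(order_id)                              # 1. load
        if row["status"] != "open":                                 # 2. validate
            raise ValueError("order not open")
        total = sum(i["price"] * i["qty"] for i in row["items"])    # 3. compute
        http.post("/invoices", {"order": order_id, "total": total,  # 4. side effect
                                "at": now.isoformat()})

    # The extracted pure function can be tested with a one-line assert:
    def order_total(items):
        return sum(i["price"] * i["qty"] for i in items)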
I hear this often but I don't think it's true. Sometimes a good integration test will do you more good than a dozen unit tests. Also, unit tests can be as much of a pain in the ass as a bad integration test.
Unit testing has become a bit like a diploma. Something society requires to acknowledge your quality, but that is mostly pro-forma and very poorly correlated with domain skills or intelligence.
Those are systems that have firmly crossed from "simulation" territory to "simulacra" territory.
People who think about the actual value of specific tests like you are few and far between. Most feel constrained by peer pressure and the need to do what they perceive is correct by definition.
For example, suppose do_add is implemented as 'return add(b, a);' and a test mocks the add callback:

do_add(1, 2, add)
expect(add) to be called with (2, 1)
Then the function is changed to:
function do_add(a, b, add) {
return add(a, b);
}
Oops, broke the test. Is this the kind of behavior change you'd expect to break a test? Often there are many ways to correctly do something, and a good test should allow any of them.
The very problem is that a failing test is not necessarily due to a behavior change; it may be due to a change in implementation details that doesn't affect behavior at all.
This is especially the case when tests rely on mocks, which don't really implement the BEHAVIOR of a dependency, but rather represent a "replay" of a specific call stack sequence.
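The do_add example above, translated into a Python sketch with unittest.mock (assuming the original implementation happened to call add(b, a)), shows how the mock pins the call sequence rather than the behaviour:

    from unittest.mock import Mock

    def do_add(a, b, add):
        return add(b, a)  # original implementation happens to swap the arguments

    def test_do_add():
        add = Mock(return_value=3)
        assert do_add(1, 2, add) == 3
        add.assert_called_once_with(2, 1)  # pins the exact call, not the behaviour

    # Refactor do_add to `return add(a, b)` and the observable behaviour is
    # unchanged (addition is commutative), but assert_called_once_with(2, 1)
    # now fails: the test broke without any behaviour change.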
Rings true. I often write more unit tests in a team than I otherwise would, even if I don't think they add that much value, just because no pull request ever failed review for having too many unit tests, and any time spent arguing about it is time I'd rather spend building the next feature.
> Often I have seen people put a lot of effort into testing things that really don't have much payback compared to what they could be testing instead.
Do you have examples of this? What kind of tests do you use, and to test what? I've seen some people test _literally everything_, some only the happy path, and some only the failure cases and critical units like user input, protocol assumptions, and algorithms.
Someone was testing my code and used a Jira plugin (Xray) to document all of their "evidence". This gave non-technical stakeholders a lot of confidence because the evidence looked so fantastic and neat. My business analyst found a defect and I raised it with the tester, as it was relevant to another stream of work the tester had completed earlier. The tester showed that they were unfamiliar with the business requirement relevant to the defect. I dug and prodded the tester a little, only to uncover that the tester felt that referring to my code repository and basically re-running my code to compare dataframe row counts, etc., was adequate test coverage. Don't be fooled by "evidence".
I've always felt the best approach for (automated) testing is:
Unit test style tests should test the specification. That is, you test that good input creates the results that the specification states it should and you test that the known failure cases that the specification says should be handled are, indeed, handled as the specification states. This means that most of the tests are generally happy path tests with some testing the boundaries and known bad cases. The goal is not to find bugs, but to prove in code that the implementation does in fact meet the requirements as set out in the specification.
Regression tests. Any time a bug is found, add a failing test that reproduces the bug. Fix the bug and the test should pass. This both proves that the bug is actually fixed and prevents the bug from creeping back in again later (see the sketch after this list). Again, the goal is not to find bugs; it's to prevent recurrence, and it is completely reactive.
Finally, property-based generative testing. Define properties and invariants of your code and schemas of your data, then generate random test cases (fuzzing, essentially). Run these on a regular basis (overnight) and make sure that the properties always hold for good input data, and that error states are handled correctly for bad input data. You can also apply this to the overall system, by simulating network failures between docker containers [1]. The goal of this is to find bugs, since it will test things you won't have thought of, but since it generates random test cases, you aren't guaranteed that it will find anything. It's also notoriously hard to write these tests and come up with good properties. I don't often do these tests, only when the payoff vs. effort seems worth it. E.g. for a low-impact system it's not worth the effort, but for something that's critical, it may be.
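As a minimal sketch of the regression-test point above (the bug and the function are invented for illustration):

    import pytest

    # Hypothetical bug report: average() crashed on an empty list.
    def average(xs):
        return sum(xs) / len(xs) if xs else 0.0  # the fix: guard the empty case

    def test_average_of_empty_list_is_zero():
        # Reproduces the reported bug: fails before the fix, passes after,
        # and keeps the bug from creeping back in later.
        assert average([]) == 0.0

    def test_average_happy_path():
        assert average([1, 2, 3]) == pytest.approx(2.0)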
For human QA, I think it makes most sense to test the workflows that users actually do. Make sure everything works as expected, make sure the workflow isn't obtuse or awkward, make sure everything visually looks OK and isn't annoyingly slow. Test common mistakes due to user error. Stuff like that. I don't think we can expect this to be thorough, and it's unrealistic to think that it will find many bugs, just that it will make sure that most users' experiences will be as designed.
So, test for what you expect (to prove that it's what you expect), and test known bugs to prevent regression and to prove that they're fixed. Then, only if your software is critical enough to warrant the effort, use property-based testing as necessary to try and weed out actual bugs. Most software can skip that, though.
IMO a key thing for human QA testing is to test unhappy paths and error handling: what happens if the user does something unexpected? The happy path is generally well tested by developers during development, but testing the unhappy paths is much more difficult and time consuming, and that's where the QAs come in.
Sure, agreed. That's why I said "Test common mistakes due to user error. Stuff like that." My main point is that it's unrealistic to expect human QA to find bugs, but what humans shine at that computers don't is workflow issues, visual issues, and "does it feel slow/bad/whatever". I suppose leaning more heavily on the mistakes side of the workflow makes sense though, yes.