>No-one said that integration tests can't also be very valuable.
Integration tests are a better kind of default test because they bring value under pretty much all circumstances.
Nobody said that unit tests can't also be valuable under just the right circumstances, i.e. complex stateless code behind a stable API.
Unit tests shine in that environment - they're not impeded by their crippling lack of realism, because that stable abstraction walls off the rest of reality. And they're very fast.
Most code isn't parsers, calculation engines, complex string manipulation, etc. - but when it is, unit tests really do kick ass.
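A minimal sketch of what that looks like - a hypothetical, pure parse_duration function plus two pytest tests, made up purely for illustration; no mocks, no setup, milliseconds to run:

```python
# Hypothetical stateless parser, used only to illustrate the point:
# a pure function behind a stable API, ideal territory for unit tests.
import pytest

def parse_duration(text: str) -> int:
    """Parse strings like '1h30m' or '45s' into seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    total, number = 0, ""
    for ch in text.strip():
        if ch.isdigit():
            number += ch
        elif ch in units and number:
            total += int(number) * units[ch]
            number = ""
        else:
            raise ValueError(f"bad duration: {text!r}")
    if number:
        raise ValueError(f"trailing number in: {text!r}")
    return total

def test_parses_hours_and_minutes():
    assert parse_duration("1h30m") == 5400

def test_rejects_garbage():
    with pytest.raises(ValueError):
        parse_duration("soon")
```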
They just suck so badly at testing code that doesn't fit that mold. Which, to be fair, is most code. I don't write a lot of parsers at work. My job involves moving data into databases, calling APIs, linking up message queues, etc.
> Integration tests are a better kind of default test because they bring value under pretty much all circumstances.
I respectfully disagree. Not with the last part - that is true: they do bring value under pretty much all circumstances. But with the first, because integration tests come with (extremely) high costs.
They are expensive to run. They are much harder (costlier) to write. They are even harder (costlier) to maintain. The common pushback against tests - "but they slow down our team a lot" - applies to integration tests much more than to unit tests; factors more. And so on.
As with everything in software engineering, choosing what tests to write is a tradeoff. And taking everything into consideration, e2e or integration tests are often not worth their investment¹. The testing pyramid fixes this, because then testing is always (well, it depends) worth the investment. But when you skew the testing pyramid, or worse, make it a testing ice-cream cone, that ROI can and will often quickly become negative.
¹Edit: I meant to say that many of these e2e tests are not worth their investment. Testing edge cases, for example: if you need man-hours to write an e2e-style regression test and then man-weeks to maintain and run it over the coming years, it's often better ROI to just let that regression re-appear and have customers report it. Whereas a unit test that captures this edge case costs maybe an hour to write, milliseconds to run, and hardly any time to maintain.
>Because integration tests come with (extremely) high costs.
Unit tests usually have lower capex and higher opex. It often takes less time and effort to write a single lower level unit test but that test will require more frequent maintenance as the code around it evolves due to refactoring.
Integration tests often have higher capex because they rely upon a few complex integration points - e.g. setting up a test to talk to a faux message queue takes time, getting Playwright set up takes quite a chunk of up-front time, building an integration with a faux SMTP endpoint takes time. What is different is that these tools are a lot more generic, so it's easier to stand on the shoulders of others, they are more reusable, and it's easier to leverage past integrations to write future scenarios. E.g. you don't have to write your own Playwright - somebody already did that - and once you have Playwright integrated into your framework, any web-related steps in future scenarios suddenly become much easier to write.
Whereas with unit tests the reusability of code and fixtures written in previous tests is generally not as high.
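To make the reuse point concrete, a hedged sketch (Python, assuming the pytest-playwright plugin, which supplies a per-test `page` fixture; the URL and selectors are made up): once the browser tooling is wired in, each new scenario is a handful of lines.

```python
# Sketch only: assumes the pytest-playwright plugin (which provides the
# `page` fixture - a fresh browser page per test); URL and selectors
# here are hypothetical.
def test_login_page_loads(page):
    page.goto("https://app.example.test/login")
    assert "Log in" in page.title()

def test_user_can_log_in(page):
    # Same generic tooling reused; only the scenario itself is new code.
    page.goto("https://app.example.test/login")
    page.fill("#email", "user@example.test")
    page.fill("#password", "correct horse battery staple")
    page.click("button[type=submit]")
    assert page.url.endswith("/dashboard")
```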
You have to also take into account the % of false negatives and false positives.
I find unit tests often raise more false positives, because ordinary legitimate refactoring that introduced no bugs is more likely to break them. This reduces the payoff: you will have more ongoing test failures requiring investigation, and more maintenance work to deal with them.
I also find that the % of false negatives is lower with integration tests. This is harder to appreciate because you wouldn't ever expect, for instance, a unit test to catch that somebody tweaked some CSS and broke a screen, or broke email compatibility with Outlook - but these are still bugs, and they are bugs that integration tests at a high level can catch with appropriate tooling, while unit tests will never, ever, ever catch them.
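As a sketch of the kind of bug only the higher level can see (same assumed pytest-playwright setup as above, with a hypothetical page and selector): a CSS change that hides the button makes this fail, while no unit test of the handler behind it ever would.

```python
# Sketch only: hypothetical signup page and selector.
def test_signup_button_is_visible(page):
    page.goto("https://app.example.test/signup")
    button = page.locator("button#signup")
    # A stylesheet regression that hides the button (e.g. display: none)
    # fails here - something no unit test of the signup handler notices.
    assert button.is_visible()
    box = button.bounding_box()
    assert box is not None and box["height"] > 0
```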
>But when you skew the testing pyramid, or worse, make it a testing ice-cream cone, that ROI can and will often quickly become negative.
The pyramid is an arbitrary shape that assumes a one-size-fits-all approach works for all software. I think it is one of the worst ideas to ever grace the testing community. What was particularly bad was Google's idea that flakiness should be dealt with by avoiding writing those tests, rather than by applying good engineering practices to root out the flakiness. It was an open advertisement that they were being hampered by their own engineering capabilities.
I do agree that this is a cost/benefit calculation, and if you shift some variable (e.g. your E2E test tooling is super flaky, you've got good, stable abstractions to write your unit tests against, or you've got a lot of complex calculations in your code), then that changes the test-level payoff matrix. But I find that the costs and benefits work out pretty consistently to favor integration tests these days.
> single lower level unit test but that test will require more frequent maintenance as the code around it evolves due to refactoring.
"more frequent" is not the same as "high maintenance costs" though.
Unit tests should only change when the unit-under-test (SUT) changes. Which, for many units, is "never". And for some with high churn, indeed, a lot.
Actual and pure e2e tests should never have to change except when the functionality changes.
But all other integration tests most often change whenever one of the components changes. I've had situations where, whenever we changed some relation or added a required field in our database, we had to manually change hundreds of integration tests and their helpers. "Adding a required field" then became a chore of days of wading through integration tests¹.
With the unit-tests, only one, extremely simple test changed in that case.
With the end-to-end tests, hundreds also needed manual changes. But that was because they weren't actual end-to-end tests, and did all sorts of poking around in the database. Worse: that poking around wasn't even abstracted.
What I'm trying to convey with this example, is that in reality, unit-tests change often if the SUT has a high churn, but that those changes are very local and isolated and simple. Yet, in practice, with integration-tests, the smallest unrelated change to a "unit" has a domino-effect on whole sections of these tests.
(And also that in this example, our E2E were badly designed and terribly executed)
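For what it's worth, the missing abstraction can be sketched in a few lines (hypothetical schema and helper names, with an in-memory SQLite database standing in for the real one): every test builds its rows through one helper, so when a required column is added, the new default lives in exactly one place instead of in hundreds of tests.

```python
# Sketch only: hypothetical "orders" schema and helper names.
import sqlite3
import pytest

@pytest.fixture
def db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL,"
        " status TEXT NOT NULL,"
        " currency TEXT NOT NULL)"  # the newly required column
    )
    yield conn
    conn.close()

def make_order(db, **overrides):
    # Single place that knows how to build a valid row. When "currency"
    # becomes required, only this default has to change.
    row = {"customer_id": 1, "status": "new", "currency": "EUR", **overrides}
    cur = db.execute(
        "INSERT INTO orders (customer_id, status, currency)"
        " VALUES (:customer_id, :status, :currency)",
        row,
    )
    return cur.lastrowid

def test_order_can_be_marked_paid(db):
    # The test never mentions "currency"; it only states what it cares about.
    order_id = make_order(db, status="new")
    db.execute("UPDATE orders SET status = 'paid' WHERE id = ?", (order_id,))
    status = db.execute(
        "SELECT status FROM orders WHERE id = ?", (order_id,)
    ).fetchone()[0]
    assert status == "paid"
```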
¹Edit: one can imagine the pressure of management to just stop testing.