Most days per day also probably means that you're maximizing the amount you spend on travel and the pain you experience. It's not cheap or pleasant traveling around July 4th.
It's similar to static site generators - it's easy enough to create a simple one, and hundreds of people have, but the real differentiator is having hundreds of high-quality working templates, and ain't nobody got time to maintain those.
The CLI diagramming winner will be the one that can easily do 150 different types of diagram out of the box. It might never happen.
>Now, suppose that the product manager comes to us one day and says, "We have a new design, we'd now like it to say 'Greetings Brandon, welcome to our website!'"
I have been writing YAML-based tests which rewrite themselves based upon program output in these cases.
Change the code -> run one script and now the test is updated. There is near-zero test maintenance cost for surface-level changes like this message or a tweaked JSON output from an API.
I call it snapshot test driven development.
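Not my actual tooling, but a minimal sketch of the idea in pytest: the test compares output against a stored snapshot file and, when run with a (hypothetical) UPDATE_SNAPSHOTS flag, rewrites the snapshot instead of failing.

```python
# Minimal snapshot-test sketch (hypothetical helper, not my real tooling).
# Run normally to diff against the stored snapshot; run with UPDATE_SNAPSHOTS=1
# after an intentional change to rewrite the snapshot instead of failing.
import os
from pathlib import Path

SNAPSHOT_DIR = Path(__file__).parent / "snapshots"

def assert_matches_snapshot(name: str, actual: str) -> None:
    snapshot = SNAPSHOT_DIR / f"{name}.txt"
    if os.environ.get("UPDATE_SNAPSHOTS") or not snapshot.exists():
        SNAPSHOT_DIR.mkdir(exist_ok=True)
        snapshot.write_text(actual)
        return
    assert actual == snapshot.read_text(), f"{name} differs from stored snapshot"

def greeting(name: str) -> str:  # stand-in for the unit under test
    return f"Greetings {name}, welcome to our website!"

def test_greeting_snapshot():
    assert_matches_snapshot("greeting", greeting("Brandon"))
```

Change the message, rerun with the flag, review the snapshot diff in code review, done.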
I agree with the overall sentiment - testing ought to be considered an investment that provides a recognized ROI.
There's a similar concept of "snapshots" in UI testing, which I maybe should have included in the post.
I think snapshots definitely lower (but don't eliminate) the cost of a certain class of tests, and can sometimes tip the scales from "not worth it" to "worth it".
It's almost the same thing, I think. I do the same thing with text artefacts and screenshots.
There is a hidden cost to snapshot testing - they're highly susceptible to flakiness. The tests have to be hermetic and all the code has to be deterministic or it doesn't work.
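Concretely, keeping them hermetic usually means scrubbing the non-deterministic bits before comparing - a sketch, with made-up field names:

```python
# Sketch: scrub non-deterministic fields before snapshotting, so the snapshot
# only changes when behaviour changes (field names here are hypothetical).
import json
import re

UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")

def normalize(payload: dict) -> str:
    scrubbed = dict(payload)
    for key in ("id", "created_at", "request_id"):  # values that differ per run
        if key in scrubbed:
            scrubbed[key] = "<scrubbed>"
    text = json.dumps(scrubbed, indent=2, sort_keys=True)
    return UUID_RE.sub("<uuid>", text)  # catch any remaining UUID-ish strings
```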
I've had this problem with team vocabulary. Once one word gets used to describe two or more things, it becomes impossible to roll back.
I tried banning a particular word once and coming up with 6 new words, on the basis that you couldn't tell which of the 6 meanings somebody was using unless you used a different term for each one. That didn't work very well. People got too attached to the old, overloaded term.
Expecting an entire industry to stop using a term they've become used to is not realistic.
>Yes, integration tests do tend to cover integration with third party libraries and even entire products (such as databases). But even in integration tests, the third party code is incidental.
It definitely shouldn't be. At least, not if you want your tests to tell you when upgrading a third-party dependency will break something.
>For instance, when creating integration tests for databases, it's very common to use an embedded or in-memory DB
I worked on a project that did this once, and they very quickly got blocked writing a test by the in-memory database not supporting a feature Postgres had.
Result? "Oh well, I guess we don't write an automated test for that." Manual QA's problem now. That's slow.
Realism matters. Sacrificing realism for speed often means you will get neither.
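To make it concrete (a hypothetical stand-in, not the actual feature we hit): something as simple as a query using Postgres-only DISTINCT ON can't even be exercised against SQLite or most in-memory stand-ins.

```python
# Hypothetical illustration: a query leaning on a Postgres-only feature
# (DISTINCT ON) cannot be tested against SQLite, a common in-memory stand-in.
import sqlite3
import pytest

LATEST_ORDER_PER_CUSTOMER = """
    SELECT DISTINCT ON (customer_id) customer_id, id, created_at
    FROM orders
    ORDER BY customer_id, created_at DESC
"""

def test_distinct_on_is_not_portable_to_sqlite():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id, customer_id, created_at)")
    with pytest.raises(sqlite3.OperationalError):  # syntax error near "ON"
        conn.execute(LATEST_ORDER_PER_CUSTOMER)
```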
>you're not supposed to use mocks (like a mocked clock)
I do do this and it usually works well. Why am I wrong?
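To be clear about what I mean by a mocked clock - not monkey-patching, just injecting it. A sketch, with hypothetical names:

```python
# Hypothetical sketch of an injected clock: production code defaults to the
# real clock, the test passes a fixed one, and nothing else gets mocked.
from datetime import datetime, timedelta, timezone

def is_expired(issued_at: datetime, ttl: timedelta,
               now=lambda: datetime.now(timezone.utc)) -> bool:
    return now() - issued_at > ttl

def test_token_expires_after_ttl():
    issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
    fixed_now = lambda: issued + timedelta(hours=2)
    assert is_expired(issued, ttl=timedelta(hours=1), now=fixed_now)
    assert not is_expired(issued, ttl=timedelta(hours=3), now=fixed_now)
```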
>>I worked on a project that did this once and they very quickly got blocked writing a test by the in memory database not supporting a feature postgres had.
>Realism matters
It is often just a small subset of tests that has to be very expensive to build, maintain and run because of an external dependency that cannot be mocked. Mocks work for 95% of use cases and should be used even if they don't cover 100%.
I never found that to be true, and moreover, these scenarios with nonstandard features tended to be the scenarios I was most worried about breaking.
There is also the problem of the in-memory database behaving differently when given the same SQL, so your test might make it look like everything works while a bug crops up in production.
> There is also the problem of the in-memory database behaving differently when given the same SQL, so your test might make it look like everything works while a bug crops up in production.
You've horribly messed up the architecture of your application if that is a problem. (Or you've misunderstood previous comments)
Postgres and SQLite/in-memory DBs just behave differently from each other sometimes. Knowing this fact doesn't mean you've messed up your architecture; it means that you have some understanding of how these databases work.
I'd say that in-memory/not-in-memory isn't the big difference - it's whether your database is in-process or not. Even with just a database running on the same node, but in a different process, connected to via unix socket, the context switches alone lead to very different performance characteristics. Actually going over the network obviously changes more. It's very easy to miss antipatterns like N+1 queries when you test on sqlite but run on a shared database in prod.
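The N+1 point in concrete terms (hypothetical schema): both functions below return the same data and both look instant against in-process SQLite, but only the first turns into N separate round trips once each execute() crosses a network.

```python
# Hypothetical illustration of the N+1 pattern: identical results, very
# different behaviour once every execute() is a network round trip.
def orders_per_customer_n_plus_1(conn, customer_ids):
    result = {}
    for cid in customer_ids:  # one query per customer: N+1 round trips
        rows = conn.execute(
            "SELECT id FROM orders WHERE customer_id = ?", (cid,)).fetchall()
        result[cid] = [order_id for (order_id,) in rows]
    return result

def orders_per_customer_single_query(conn, customer_ids):
    placeholders = ",".join("?" * len(customer_ids))
    rows = conn.execute(
        f"SELECT customer_id, id FROM orders WHERE customer_id IN ({placeholders})",
        list(customer_ids)).fetchall()
    result = {cid: [] for cid in customer_ids}
    for cid, order_id in rows:
        result[cid].append(order_id)
    return result
```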
This is irrelevant for unit tests. Performance testing does not make any sense in a build environment; you need to do it in an environment close to production, and that's a completely different test automation scope.
You can see this stuff often even in test workloads. But even if you disregard that kind of issue, you still have stuff like needing to integrate networked database connections into e.g. event loops, which you don't really need to do for things like sqlite.
TDD works best if you default to testing at the outer shell of the app - e.g. translating a user story into steps executed by Playwright against your web app, and only TDDing lower layers once you've used those higher-level tests to evolve a useful abstraction underneath the outer shell.
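Something along these lines, sketched with Playwright's Python bindings via pytest-playwright (the URL, labels and copy are made up):

```python
# Sketch of an outer-shell test driving the app the way a user would.
# Requires pytest-playwright; the URL, labels and expected text are made up.
from playwright.sync_api import Page, expect

def test_returning_user_sees_greeting(page: Page):
    page.goto("https://staging.example.com/login")
    page.get_by_label("Email").fill("brandon@example.com")
    page.get_by_label("Password").fill("correct-horse-battery-staple")
    page.get_by_role("button", name="Sign in").click()
    expect(page.get_by_text("Greetings Brandon, welcome to our website!")).to_be_visible()
```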
It seems to be taught in a fucked up way though where you imagine you want a car object and a banana object and you want to insert the banana into a car or some other kind of abstract nonsense.
>No-one said that integration tests can't also be very valuable.
Integration tests are a better kind of default test because they bring value under pretty much all circumstances.
Nobody said that unit tests can't also be valuable under just the right circumstances, i.e. complex stateless code behind a stable API.
Unit tests shine in that environment - they're not impeded by their crippling lack of realism, because that stable abstraction walls off the rest of reality. And they're very fast.
Most code isn't parsers, calculation engines, complex string manipulation, etc. - but when it is, unit tests really do kick ass.
They just suck so badly at testing code that doesn't fit that mold. Which, to be fair, is most code. I don't write a lot of parsers at work. My job involves moving data into databases, calling APIs, linking up message queues, etc.
> Integration tests are a better kind of default test because they bring value under pretty much all circumstances.
I respectfully disagree. Not with the last part - that is true: they do bring value under pretty much all circumstances. But with the first, because integration tests come with (extremely) high costs.
They are expensive to run. They are much harder (costlier) to write. They are even harder (costlier) to maintain. The common pushback against tests ("but they slow down our team a lot") applies to integration tests much more than to unit tests - factors more. And so on.
As with everything in software engineering, choosing what tests to write is a tradeoff. And taking all of this into consideration, e2e or integration tests are often not worth their investment¹. The testing pyramid fixes this, because testing always (well, it depends) is worth the investment. But when you skew the testing pyramid, or worse, make it a testing ice-cream cone, that ROI can and will often quickly become negative.
¹Edit: I meant to say that many of these e2e tests are not worth their investment. Testing edge cases, for example: if you need man-hours to write a regression test e2e-style and then man-weeks to maintain and run it over the coming years, it's often better ROI to just let that regression re-appear and have customers report it. Whereas a unit test that captures this edge case costs maybe an hour to write, milliseconds to run and hardly any time to maintain.
>Because integration tests come with (extremely) high costs.
Unit tests usually have lower capex and higher opex. It often takes less time and effort to write a single lower-level unit test, but that test will require more frequent maintenance as the code around it evolves due to refactoring.
Integration tests often have higher capex because they rely upon a few complex integration points - e.g. setting up a test to talk to a faux message queue takes time. Getting Playwright set up takes quite a chunk of up-front time. Building an integration with a faux SMTP endpoint takes time. What is different is that these tools are a lot more generic, so it's easier to stand on the shoulders of others; they are more reusable, and it's easier to leverage past integrations to write future scenarios. E.g. you don't have to write your own Playwright, somebody already did that, and once you have Playwright integrated into your framework, any web-related steps in future scenarios suddenly become much easier to write.
Whereas with unit tests the reusability of code and fixtures written in previous tests is generally not as high.
You have to also take into account the % of false negatives and false positives.
I find unit tests often raise more false positives, because ordinary legitimate refactoring that introduced no bugs is more likely to break them. This reduces the payoff, because you will have more ongoing test failures requiring investigation and more maintenance work to deal with them.
I also find that the false-negative rate of integration tests is lower. This is harder to appreciate because you wouldn't ever expect, for instance, a unit test to catch that somebody tweaked some CSS that broke a screen or broke email compatibility with Outlook, but these are still bugs, and they are bugs that integration tests at a high level can catch with appropriate tooling but unit tests will never, ever, ever catch.
>But when you skew the testing pyramid, or worse, make it a testing ice-cream cone, that ROI can and will often quickly become negative.
The pyramid is an arbitrary shape that assumes a one-size-fits-all approach works for all software. I think it is one of the worst ideas ever to grace the testing community. What was particularly bad was Google's idea that flakiness should be avoided by avoiding writing those tests rather than by applying good engineering practices to root out the flakiness. It was an open advertisement that they were being hampered by their own engineering capabilities.
I do agree that this is a cost/benefit calculation, and if you shift some variable (e.g. E2E test tooling is super flaky, you've got good, stable abstractions to write your unit tests against, you've got a lot of complex calculations in your code), then that changes the test-level payoff matrix, but I find that the costs and benefits work out pretty consistently in favor of integration tests these days.
> single lower level unit test but that test will require more frequent maintenance as the code around it evolves due to refactoring.
"more frequent" is not the same as "high maintenance costs" though.
Unit tests should only change when the unit under test (SUT) changes. Which, for many units, is "never". And for some with high churn, indeed, a lot.
Actual and pure e2e tests should never have to change except when the functionality changes.
But all other integration tests most often change whenever one of the components changes. I've had situations where, whenever we changed some relation or added a required field in our database, we had to manually change hundreds of integration tests and their helpers. "Adding a required field" then became a chore of days of wading through integration tests¹.
With the unit-tests, only one, extremely simple test changed in that case.
With the end-to-end tests, also, hundreds needed manual changes. But that was because they weren't actual end-to-end tests and did all sorts of poking around in the database. Worse: that poking around wasn't even abstracted.
What I'm trying to convey with this example is that, in reality, unit tests change often if the SUT has high churn, but those changes are very local, isolated and simple. Yet, in practice, with integration tests, the smallest unrelated change to a "unit" has a domino effect on whole sections of these tests.
(And also that in this example, our E2E were badly designed and terribly executed)
¹Edit: one can imagine the pressure of management to just stop testing.
I've evangelized against unit testing at most companies I work at, except in one specific circumstance. That circumstance is complex logic in stateless code behind a stable API where unit testing is fine. I find this usually represents between 5-30% of most code bases.
The idea that unit testing should be the default go to test I find to be horrifying.
I find that unit test believers struggle with the following:
1) The idea that test realism might actually matter more than test speed.
2) The idea that if the code is "hard to unit test" that it is not necessarily better for the code to adapt to the unit test. In general it's less risky to adapt the test to the code than it is the code to the test (i.e. by introducing DI). It seems to be tied up with some sort of idea that unit testability/DI just makes code inherently better.
3) The idea that integration tests are naturally flaky. They're not. Flakiness is caused by inadequate control over the environment and/or non-deterministic code. Both are fixable if you have the engineering chops.
4) The idea that test distributions should conform to arbitrary shapes for reasons that are more about "because Google considered integration tests to be naturally flaky".
5) Dogma (e.g. Uncle Bob's or Rainsberger's advice) vs. the idea that tests are an investment that should pay dividends, and that they should be designed according to the projected payoff rather than to fit some kind of "ideal".
> The idea that unit testing should be the default go to test I find to be horrifying.
Kent Beck, who invented the term unit test, was quite clear that a unit test is a test that exists independent of other tests. In practice, this means that a unit test won't break other tests.
I am not sure why you would want anything other than unit tests? Surely everyone agrees that one test being able to break another test is a bad practice that will turn your life into a nightmare?
I expect we find all of these nonsensical definitions for unit testing appearing these days because nobody is writing anything other than unit tests anymore, and therefore the term has lost all meaning. Maybe it's simply time to just drop it from our lexicon instead of desperately grasping at straws to redefine it?
> It seems to be tied up with some sort of idea that unit testability/DI just makes code inherently better.
DI does not make testing or code better if used without purpose (and will probably make it worse), but in my experience when a test will genuinely benefit from DI, so too will the actual code down the line as requirements change. Testing can be a pretty good place for you to discover where it is likely that DI will be beneficial to your codebase.
> The idea that test realism might actually matter more than test speed.
Beck has also been abundantly clear that unit tests should not resort to mocking, or similar, to the greatest extent that is reasonable (testing for a case of hardware failure might be a place to simulate a failure condition rather than actually damaging your hardware). "Realism" is inherent to unit tests. Whatever it is you are talking about, it is certainly not unit testing.
It seems it isn't anything... other than yet another contrived attempt to try and find new life for the term that really should just go out to pasture. It served its purpose of rallying developers around the idea of individual tests being independent of each other – something that wasn't always a given. But I think we're all on the same page now.
> Kent Beck, who invented the term unit test, was quite clear that a unit test is a test that exists independent of other tests
Kent Beck didn't invent the term "unit test", it's been used since the 70's (at minimum).
> I am not sure why you would want anything other than unit tests?
The reason is to produce higher quality code than if you rely on unit tests only. Generally, unit tests catch a minority of bugs, other tests like end to end testing help catch the remainder.
> other tests like end to end testing help catch the remainder.
End-to-end tests are unit tests, generally speaking. Something end-to-end can be captured within a unit. The divide you are trying to invent doesn't exist, and, frankly, is nonsensical.
> End-to-end tests are unit tests, generally speaking.
Generally, in the software industry, those terms are not considered the same thing, they are at opposite ends of a spectrum. Unit tests are testing more isolated/individual functionality while the end to end test is testing an entire business flow.
Here's an example of one end to end test (with validations happening at each step):
1-System A sends Inventory availability to system B
2-The purchasing dept enters a PO into system B
3-System B sends the PO to system A
4-System A assigns the PO to a Distribution Center for fulfillment
5-System A fulfills the order
6-System A sends the ASN and Invoice to system B
7-System B users process the PO receipt
8-System B users perform three way match on PO, Receipt and Invoice documents
Bad example, perhaps, but that's also a unit test[1]. Step 8 is dependent on the state of step 1, and everything else in between, so it cannot be reduced any further (at least not without doing stupid things). That is your minimum viable unit; the individual, isolated functionality.
[1] At least so long as you don't do something that couples it with other tests, like modifying a shared database in a way that will leave another test in an unpredictable state. But I think we have all come to agree that you should never do that – going back to the reality that the term unit test serves no purpose anymore. For all intents and purposes, all tests now written are unit tests.
Every step updates shared databases (frequently plural). In the case of the fulfillment step, the following systems+databases were involved: ERP, WMS, Shipping.
Typically, in end to end testing, tests are run within the same shared QA system and are semi-isolated based on choice of specific data (e.g. customers, products, orders, vendors, etc.). If this test causes a different test to fail, or vice-versa, then you have found a bug.
If we call that entire sequence of steps a "unit" test, would you start with testing the entire sequence of steps, or would you recommend testing the individual steps first?
And if we did test the individual steps first, we would give that testing a different name? Like maybe "sub-unit" testing?
> Every step updates shared databases (frequently plural).
That's fine. It all happens within a single unit. A unit should mutate shared state within the unit. Testing would be pretty much useless without.
> If we call that entire sequence of steps a "unit" test, would you start with testing the entire sequence of steps, or would you recommend testing the individual steps first?
For all intents and purposes, you can't test the individual steps. All subsequent steps are dependent on the change in inventory state in step 1. And the product of step one is undoubtedly internal state, so there is no way for the test to observe the state change in isolation (unless you do something stupid). You have to carry out the subsequent steps to be able to infer that the inventory was, in fact, updated appropriately.
After all, the whole reason you are testing those steps together is because you recognize that they represent a single instance of functionality. You don't really get to choose (unless you choose to do something stupid, I suppose).
> And if we did test the individual steps first, we would give that testing a different name?
If the individual steps can be tested individually (ignoring a case of you doing something stupid), it's not actually an end-to-end process, so your example would make no sense. Granted, we have already questioned if it is a bad example.
> For all intents and purposes, you can't test the individual steps.
Sure you can, and we did (that is a real example of an end-to-end test from a recent project), which also included testing the individual steps in isolation, preceded by testing the individual sub-steps/components of each step (the portion that is typically considered unit testing).
For example, step 1 is broken down into the following sub-steps which are all tested in isolation before testing the combined group together:
1.1-Calculate the current on hand inventory from all locations for all products
1.2-Calculate the current in transit inventory for all locations for all products
1.3-Calculate the current open inventory reservations by business partner and products
1.4-Calculate the current in process fulfillments by business partner and product
1.5-Resolve the configurable inventory feed rules for each business partner and product (or product group)
1.6-Using the data in 1.1 through 1.5, resolve the final available qty for each business partner and product
1.7-Construct system specific messages for each system and/or business partner (in some cases it's a one to one between business partner and system, but in other cases one system manages many business partners).
1.7.1-Send to system B
1.7.2-Send to system C
1.7.3-Send to system D
1.7.N-etc.
> And the product of step one is undoubtedly internal state, so there is no way for the test to observe the state change in isolation
The result of step 1 is that over in software system B (an entirely different application from system A) the inventory availability for each product from system A is properly represented in the system. Meaning queries, inquiries, reports, application functions (e.g. Inventory Availability by Partner), etc. all present the proper quantities.
To validate this step, it can be handled one of two ways:
1-Some sort of automated query that extracts data from system B and compares to the intended state from step 1 (probably by saving that data at the end of that step).
or
2-A user manually logs in to system B and compares to the expected values from step 1 (again saved or exposed in some way). This method works when the number of products is purposefully kept to a small number for testing purposes.
> If the individual steps can be tested individually (ignoring a case of you doing something stupid), it's not actually an end-to-end process, so your example would make no sense. Granted, we have already questioned if it is a bad example.
Yes, the individual steps can be tested individually. Yes, it is an end-to-end test.
> Granted, we have already questioned if it is a bad example.
It's a real example from a real project and it aligns with the general notion of an end to end test used in the industry.
More importantly, combined with the unit tests, functional tests, integration tests, performance tests, other end to end tests and finally user acceptance tests, it contributed to a successful go-live with very few bugs or design issues.
I don't know many people who would describe a test that uses Playwright and hits a database as a unit test just because it is self-contained. If Kent Beck does, then he has a highly personalized definition of the term that conflicts with its common usage.
The most common usage is, I think, an xUnit-style test which interacts with an app's code API and mocks out, at a minimum, interactions with systems external to the app under test (e.g. database, API calls).
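Roughly this shape, in other words (hypothetical names; Python's unittest.mock standing in for whatever xUnit-style mocking tool you prefer):

```python
# The shape most people mean by "unit test": the app's own code API is called
# directly and the external system (a payment API client here) is mocked.
# All names are hypothetical.
from unittest.mock import Mock

def charge_order(order, payment_client):
    response = payment_client.charge(amount=order["total"], currency="USD")
    return {"order_id": order["id"], "charge_id": response["id"], "status": "paid"}

def test_charge_order_records_charge_id():
    payment_client = Mock()
    payment_client.charge.return_value = {"id": "ch_123"}
    result = charge_order({"id": 42, "total": 1999}, payment_client)
    payment_client.charge.assert_called_once_with(amount=1999, currency="USD")
    assert result == {"order_id": 42, "charge_id": "ch_123", "status": "paid"}
```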
He may have coined the term, but that does not mean he owns it. If I were him I'd pick a different name for his idiosyncratic meaning than "unit test" - one that isn't already overburdened with too much baggage.
> He may have coined the term but that does not mean he owns it.
Certainly not, but there is no redefinition that is anything more than gobbledygook. Look at the very definition you gave: That's not a unique or different way to write tests. It's not even a testing pattern in concept. That's just programming in general. It is not, for example, unusual for you to use an alternative database implementation (e.g. an in-memory database) during development where it is a suitable technical solution to a technical problem, even outside of an automated test environment. To frame it as some special unique kind of test is nonsensical.
If we can find a useful definition, by all means, but otherwise what's the point? There is no reason to desperately try to save it with meaningless words just because it is catchy.
The definition I gave is the one people use. Hate it or love it, you're not going to change it to encompass end-to-end tests, and neither will Kent Beck. It's too embedded.
I might. I once called attention to the once prevailing definition of "microservices" also not saying anything. At the time I was treated like I had two heads, but sure enough now I see a sizeable portion (not all, yet...) of developers using the updated definition I suggested that actually communicates something. Word gets around.
Granted, in that case there was a better definition for people to latch onto. In this case, I see no use for the term 'unit test' at all. Practically speaking, all tests people write today are unit tests. 'Unit' adds no additional information that isn't already implied in 'test' alone and I cannot find anything within the realm of testing that needs additional differentiation not already captured by another term.
If nothing changes, so what? I couldn't care less about what someone else thinks. Calling attention to people parroting terms that are meaningless is entirely for my own amusement, not some bizarre effort to try and change someone else. That would be plain weird.
Well, I don't regard unit tests as the one true way. I don't force people on my team to do it my way. When I get compliments on my work, I tend to elaborate and spread my approach. That's what I mean by evangelize - not necessarily advocating for a specific set of criteria to be met.
I find that integration tests are usually flaky; that's my personal experience. In fact, at my company, we just decided to completely turn them off because they fail for many reasons and the usual fix is to adjust the test. If you have had a lot of success with them, great. Just for the record, I am not anti-integration or anti-end-to-end test. I think they have a place, and just like unit tests shouldn't be the default, neither should they.
Here are the two most common scenarios where I find integration tests (usually end-to-end tests called integration tests) become flaky:
1) DateTime: some part of the business logic relies on the current date or time and it wasn't accounted for.
2) Data: it changed, got deleted, expired, etc., and the test did not first create everything it needed before running (see the sketch below).
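The second one is usually solved by making the test own its data instead of leaning on whatever happens to be in the shared QA system - a sketch (the api_client fixture and endpoints are hypothetical):

```python
# Hypothetical sketch: the test creates every record it depends on, so a
# deleted or expired shared record in the QA environment cannot make it flaky.
import uuid

def test_discount_applies_to_new_customer(api_client):
    email = f"test-{uuid.uuid4()}@example.com"          # unique per run
    customer = api_client.create_customer(email=email)
    product = api_client.create_product(sku=f"SKU-{uuid.uuid4()}", price=100)
    order = api_client.create_order(
        customer_id=customer["id"],
        items=[{"sku": product["sku"], "qty": 1}],
        discount_code="WELCOME10",
    )
    assert order["total"] == 90
```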
Regarding your points,
1) "realism" that is what I referred to as trusting that a unit test is good enough. If it didn't go all the way to the database and back did it test your system? In my personal work, I find that pulling the data from a database and supplying it with a mock are the same thing. So it's not only real enough for me, but better because I can simulate all kinds of scenarios that wouldn't be possible in true end-to-end tests.
2) These days the only code that's hard to test is from people that are strictly enforcing OOP. Just like any approach in programming, it will have it's pros and cons. I rarely go down that route, so testing isn't usually difficult for me.
3) It's just been my personal experience. Like I said, I'm not anti-integration tests, but I don't write very many of them.
4) I didn't refer to google, just my personal industry experience.
5) Enforcing an ideal is a waste of time in programming. People only care about what they see when it ships. I just ship better-quality code when I unit test my business logic. Some engineers benefit from it, some harm themselves in confusion; not much I can do about it.
Most of this is my personal experience, no knock against anyone and I don't force my ideals on anybody. I happily share what and why things work for me. I gradually introduce my own learning over time as I am asked questions and don't seek to enforce anything.
>One test. A really truly automated test, with setup & invocation & assertions (protip: try working backwards from the assertions some time). It’s in the writing of this test that you’ll begin making design decisions, but they are primarily interface decisions. Some implementation decisions may leak through, but you’ll get better at avoiding this over time.
If you're truly making interface decisions without leaking implementation decisions, then your test is probably doing something like using Playwright to follow a user story, or calling a REST API and seeing what comes back.
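For example, something like this against a REST API, sketched with requests (the endpoint and payload are made up), where you literally write the assertions first and fill in the setup afterwards:

```python
# Hypothetical sketch: an outer-shell test against a REST API. The assertions
# were written first; the setup and invocation were filled in afterwards.
import requests

BASE_URL = "https://staging.example.com/api"  # made-up endpoint

def test_created_widget_can_be_fetched_by_id():
    created = requests.post(f"{BASE_URL}/widgets", json={"name": "sprocket"})
    fetched = requests.get(f"{BASE_URL}/widgets/{created.json()['id']}")
    assert created.status_code == 201
    assert fetched.status_code == 200
    assert fetched.json()["name"] == "sprocket"
```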
There's an entire book by the author, titled Test-Driven Development by Example. I bet you can find a library to lend you a copy. <https://search.worldcat.org/title/50479239>