Testing Without Mocks: A Pattern Language

thom · on Jan 8, 2023

Once you’ve replaced each infrastructure dependency with some form of test double (whatever you call it) that captures state, broad tests are no longer slow nor do they fail randomly so I’m not sure I understand why you’d ditch them. Ubiquitous narrow tests are brittle in the face of most refactorings whereas broad tests care about things your users care about and usually persist across those changes. I’d always prefer the latter, though I’m happy to have both. A codebase without broad tests (or without explicitly modelled use cases that are ‘sociable’ with a broad range of dependencies anyway) is unlikely to describe what it actually _does_ anywhere and sounds like a pain to reason about.

flippinburgers · on Jan 8, 2023

I write unit tests selectively to capture unwritten rules about how some functions need to act. It is a better form of documentation. Otherwise, I lean on what you are describing: actually testing a system against the outer "api layer" that is interacted with by external users be it a UI or an actual api.

I avoid mocking like the plague unless it truly is an external system I have no control over. Then I might mock as a form of documentation about expectations.

I was involved in a project recently that uses the "clean architecture" approach where every layer was abstracted and everything was mocked. Honestly it was a major turn off on two levels: I deeply dislike the "clean" architecture based on this single experience alone and I found my deep dislike of heavy mocking to be rekindled.

teknopaul · on Jan 8, 2023

Been in that "everything mocked" scenario too. I try to mock as little as possible too but ensure that as much as possible can be mocked, because very often that is path of least resistance for a specific test.

I've seen code with so much mocked that many unit tests failed to test anything other than behaviour of mocks.

Not sure I'm comfortable with test code and switches being deployed as an alternative. I find supporting different types of test to be a benefit. Seems to me if you have test infrastructure in the code that will limit options. I have seen code that activly prevented mocks that was impossible to test.

u801e · on Jan 8, 2023

> I've seen code with so much mocked that many unit tests failed to test anything other than behaviour of mocks.

One example of this was a test that passed because of the expected behavior as determined by a mock, but the code failed in the staging environment because the underlying library changed its method signature to take some parameters in a different order.

To me, this basically was a signal that that particular test and tests like it were essentially useless in determining whether the code was really in a working state.

flippinburgers · on Jan 8, 2023

This is the major achilles heal of mocking that I find to be most bothersome. If at all possible I try to have at least a happy path subset of real tests hitting real 3rd party apis if I possibly can.

jdlshore · on Jan 8, 2023

Author here. What you’re describing is basically the same thing as sociable tests at the top of your system, which is part of what I'm recommending. You could stop there if you wanted.

I like having narrow tests rather than broad tests, though, because it allows me to keep my tests localized. For example, in the simple example app, I have a Rot13 module with a bunch of edge cases. How does it handle upper-case letters, lower-case letters, numbers, symbols, etc.

I could test that with broad tests, but doing that in a large system means that you end up with a combinatorial explosion of tests, or (more likely) hidden gaps in your tests. I find it easier to test my code thoroughly with narrow tests, which allows me to test higher-level code with the security of knowing that lower-level code Just Works(tm) and doesn't need further testing.

I haven't found narrow sociable tests to be brittle in the face of refactorings, at least not when using the patterns I describe. That's a big motivation for the pattern language—to make refactoring easy, even big architectural refactorings—and that's turned out to be true in practice.

vikinghckr · on Jan 8, 2023

Hi, I really enjoyed the piece! How much to test at each layer of abstraction has always been something that's bugged me. This article provides a nice strategy.

I still have questions about a few scenarios though. Let's say, I have two classes like this:

  class KeyValueStore {
    int64 Get(Key key);
  };

  class CachingKeyValueStore {
    CachingKeyValueStore(const KeyValueStore& wrapped);

    int64 Get(Key key);
  }

Here, the `CachingKeyValueStore` class is a wrapper around the `KeyValueStore` class, and its purpose is to simply maintain an in-memory cache (instead of RPC calls or Database calls). How would you unit test the `CachingKeyValueStore` class? So far, using Mocks have been my only strategy, because behavior-wise, both classes have the exact same outputs for the same inputs.

jdlshore · on Jan 8, 2023

Thanks! I'm glad you enjoyed it.

The behavioral difference between Caching KeyValue Store and KeyValueStore is that it doesn't fetch from the underlying data source when called a second time. So that's what I would test. Given the signature you provided, I would either instrument KeyValueStore so I could tell if it had been run twice, or (if KeyValueStore has multiple implementations) I might create a ExampleKeyValueStore in my tests that had that instrumentation.

I solved this problem a bit differently in my code, though. Instead of making a CachingKeyValueStore, I made a MemoryCache. It has a get() method that takes a key and a function. The first time it's called, it runs the function; the second time, it returns the stored result. Here's one of my actual tests for that code (JavaScript):

  it("caches result of function, so it's only run once", async function() {
    const cache = createIndependentCache();

    let runCount = 0;
    const getFn = () => runCount++;

    await cache.getAsync(NULL_LOG, "key", getFn);
    await cache.getAsync(NULL_LOG, "key", getFn);

    assert.equal(runCount, 1);
  });

And in case you’re curious, here’s the implementation, excluding a bunch of instrumentation and logging:

  async getAsync(log, key, getFnAsync) {
    if (this._cache[key] === undefined) {
      this._cache[key] = getFnAsync();
    }

    try {
      return await this._cache[key];
    }
    catch (err) {
      delete this._cache[key];   // don't cache errors
      throw err;
    }
  }

vikinghckr · on Jan 8, 2023

Thanks! I really like the idea of the instrumented ExampleKeyValueStore class which can be defined like this:

  class ExampleKeyValueStore {
    ExampleKeyValueStore(KeyValueStore wrapped);
    int hit_count();
  };

That means I can use the real KeyValueStore as a dependency while still testing the caching behavior. Cool!

joshuamorton · on Jan 8, 2023

You probably should have some additional tests around the CachingKeyValueStore related to eviction and size. Maybe this doesn't matter very often, but you should at least test the behavior of your Caching store if when the cache size is only, e.g. 1 item.

You can either do this by passing in a mock KVStore, or by passing in a normal/fake KV store in which you can update the underlying data. So for example:

    data = ...
    kv = new KVStore(data)
    cache = new CachingKVStore(kv, cache_size=1)
    v = cache.get(k1)
    data.update(k1, new_value)
    v2 = cache.get(k1)
    assert v2 == v1  // Cached!
    v3 = cache.get(k2)  // evict k1
    v4 = cache.get(k1)
    assert v4 != v1
    assert v4 == new_value  // Gets the new value

Otherwise, yeah for some cases like this you can just inherit from the KVStore test class and run all the same tests with a slightly different set up method to use a caching store instead.

vikinghckr · on Jan 8, 2023

Thanks! Injecting the data itself is an interesting approach!

pharmakom · on Jan 8, 2023

I would use a mock for the wrapped instance that generates a strong random value on each call to “Get”.

Then I would do some basic unit tests and maybe property based testing given that the only way for the same value to appear on subsequent “Get” calls is for caching to have occurred.

klabb3 · on Jan 8, 2023

> I could test that with broad tests, but doing that in a large system means that you end up with a combinatorial explosion of tests, or (more likely) hidden gaps in your tests.

Extremely well put. This maps exactly to my experience when moving from unit tests to the layer above. Test are taught exactly like “draw the rest of the fucking owl” meme, and I’ve never seen them actually hold up to the nice and tidy “hello word” tests that you see in demos. You’re actually trying to tackle the reality of gritty and ugly software systems, and it’s very refreshing to see, because it may just work. It’s gonna take a while to see if it holds up, but so far I am really excited.

thom · on Jan 8, 2023

For logic heavy parts of the system then yes, narrow tests are very useful. Often these sorts of components are amenable to property based testing which definitely becomes harder the broader the surface. I have generally been lucky that these bits of code are either rare, or they become infrastructure dependencies quite quickly in which case you’re broadly testing just categories of responses, not all edge cases.

But I still think a general rule to live by is to maximise the ratio of business value covered by tests to the amount of program text referenced by tests. Too small a ratio and you’re gaining little confidence but exposing yourself to constant toil just to keep up with changing implementations, even when observable system behaviour doesn’t change much.

I’m unsure if top level sociable tests get me where I want to be. It seems like they stop one layer down, so I’m still concerned I don’t capture actual end to end use cases anywhere in the codebase. I’m not personally comfortable with that (from a testing point of view but also because I’m tired of working on codebases that don’t clearly reveal what they do and why), but that outlook’s been formed by my own particular failures and shortcomings over the years so it might be quite subjective.

drewcoo · on Jan 8, 2023

> Once you’ve replaced each infrastructure dependency with some form of test double (whatever you call it) that captures state, broad tests are no longer slow nor do they fail randomly

From my experience, broad tests are slow and brittle because they involve an entire environment. That requires manipulation or knowledge of many repos, additional maintenance of test data, oftentimes gambling on whether someone changed the environment without telling you, and more. Under the best circumstances, tests have to traverse several layers of code and interact with side-effects, all of which usually lack observability.

I love shops that have microservices that own their data stores. Unfortunately, there are too many monoliths and what I call "macroliths" (tightly coupled monoliths built out of services running in k8s). If everything is tightly coupled around a store, I don't think the article solves the problem either. Or makes testing any more rigorous or complete, for that matter.

elboru · on Jan 8, 2023

There's a middle tier, broad tests in the sense of not mocking local dependencies but only external (slow) ones.

Many developers mock every dependency in a SUT, even if they own the dependency. I like the idea of broad tests that run-in memory (no I/O interaction).

I think Ian Cooper does a great work explaining it in this talk (he talks about TDD, but I think the main idea applies even if you write tests after code): https://www.youtube.com/watch?v=EZ05e7EMOLM&ab_channel=DevTe...

thom · on Jan 8, 2023

This is more what I was suggesting, although I’ve not actually found in-proc test doubles talking via localhost to be slow or brittle so I don’t mind mocking on either side of that divide. You’re going to write those mocks and tests anyway to check how you’re interacting with external systems in narrow tests, so you can choose how deep your broad tests go.

djur · on Jan 8, 2023

There's actually some good ideas in here. I wish it wasn't written in such a buzzword-heavy, consultant-y style. When advice includes so many Novel Proper Nouns I find myself checking my wallet.

My main quibbles are that these "Nullables" are still test doubles, just implemented in a way with different tradeoffs than typical mocking frameworks. Also, spies are still helpful when you're checking for side effects. I'm not really seeing any alternative offered here for testing "does this code send 0, 1, or 500 emails".

readthenotes1 · on Jan 8, 2023

otoh, I worked with a guy who was so mock heavy that he actually wrote real mocks for hash table for each test.

I worked with another guy who mocked out so much external behavior that he couldn't figure out why his stuff never worked, not doing the necessary work to ensure that his expectations of external behavior were actually still met ( if you're going to mock out external behavior, you need to add in tests that ensures your expectations are valid).

There is a place for Mocks, but I have found the other varieties of test doubles to more useful...

what-no-tests · on Jan 8, 2023

If I'm writing unit tests for #some_method, I'll mock anything that method calls outside of that method (so I can check how that method will respond to error conditions from calling those other methods).

Unit tests != Integration tests, and the two must both exist otherwise I'll run into that situation where my unit tests work but the software doesn't.

UglyToad · on Jan 8, 2023

But do you not find the problem is you now end up with tests that assume the existence of #some_method with the tests tightly coupled to the internal logic and flow of that method?

If you decide to refactor, every part of the system now has one or more tests that break because they mimic or clone each method 1:1 rather than testing input and output.

what-no-tests · on Jan 8, 2023

Perhaps, but if I refactor #some_method I only have to look in my unit tests where I describe #some_method and focus there. The integration tests should not have to be updated, since they only describe expected behavior -- unless the expected behavior must change as well.

The coupling between unit tests and the methods they test is correct, IMO, because really how else are you supposed to test what #some_method would do when it tries to call #another_method but #another_method raises various different exceptions, or returns no results, or returns successfully? If the answer is "I only test happy paths" then good luck - you're not really testing anything.

readthenotes1 · on Jan 8, 2023

Exactly the problem the 2nd guy had

convolvatron · on Jan 8, 2023

what is the place for mocks?

Tainnor · on Jan 8, 2023

External systems, unexpected error cases (which by nature of being unexpected can't really be triggered in a "natural" way), very generic code (e.g. instrumentation frameworks), major architectural boundaries, ...

I agree that most people over-mock, though.

readthenotes1 · on Jan 8, 2023

External systems dependencies are better handled with stubs and fakes than mocks, in my experience.

Tainnor · on Jan 8, 2023

That really depends on the case in question and on the particulars of the codebase.

I wouldn't recommend mocking http libraries, database calls etc. - that usually leads to madness and impenetrable test setups. I would rather recommend to either use fake implementations (e.g. in-memory DB) or to only integration-test the boundaries of your interaction with the external system (that might or might not be hard to achieve).

But I do like the pattern of wrapping calls to external APIs through a layer that I control - so that, for example, I may just write mailer.sendMessage(message) - and then that mailer interface seems like a good candidate for mocking.

readthenotes1 · on Jan 8, 2023

In my experience, they are the test double of last resort.

I used them to verify behavior, not outcome (as that's what expectations do), e.g., did I really call this method only once?

chociej · on Jan 8, 2023

I feel that the writing style reinforced the well-thought-out consistency of the system. The information of value was all right there, so I didn't feel worried about getting a bill.

jdlshore · on Jan 8, 2023

Author here. The way to test "does this code send 0, 1, or 500 emails" is the Output Tracking pattern: https://www.jamesshore.com/v2/projects/testing-without-mocks...

Thanks for the wallet!

djur · on Jan 8, 2023

Thanks for the link and the original article. To me, OutputTracker looks a lot like a spy, except instead of verifying that the unit under test took a particular action you're instead verifying that it logged that it took an action. That would seem to create the risk of missing cases where the events emitted by code don't match its actual behavior.

jdlshore · on Jan 8, 2023

Output tracking and spies are solving the same problem, but they do it in different ways. Spies record which methods are called. Output tracking records behavior that's otherwise invisible to callers (such as inserting something into a database).

There's no risk of missing cases. The output tracking happens at the same semantic level as the rest of the code and is a binary "tracked / not tracked" type of thing. There's no behavior to match, and the code is tested anyway.

Edit: By "no behavior to match," I mean that the thing doing the behavior is the thing tracking the behavior. The tracker is driven by events you emit when you perform a behavior.

djur · on Jan 8, 2023

I'm saying that if I have code like this, with

   payload = prepare_payload
   if verify_payload?(payload)
     mailer.deliver(payload)
     emitter.emit(:sent_the_email)
   end

there is a risk that a later change to this code will make it so that the `mailer` and `emitter` are not guaranteed to be called together. I have seen this bug in production and I don't see how your approach catches it. I'm also not sure how I'm supposed to test for different desired values in `payload` here.

jdlshore · on Jan 8, 2023

I mean, sure, if you program the output tracker incorrectly, it won't work. Not sure what else you expect. You're expected to have tests, of the output tracking code itself. They catch changes that breaks the output tracker, just like you have tests to catch any other regressions.

There's an example of that kind of test here: https://github.com/jamesshore/livestream/blob/2020-09-22-end...

Regarding testing different values, I think what you're missing is that you don't just emit an event; you emit an event with data. Typically it's whatever data you're delivering.

  emitter.emit(:sent_the_email, payload)

Then later, you assert on that data.

  assert.equal(delivered, { 'my_expected' => 'data_goes_here' })

djur · on Jan 8, 2023

Looking at your example, I'm still not seeing how this isn't just implementing ad hoc mocking for each component. The reason I'm interested is because the overall approach is very similar to what I've settled on over the years, other than the aversion to using labor-saving DI and mocking frameworks. I'm not sure why I should prefer to write more code (that needs to be tested itself) rather than relying on a well-tested and well-understood library.

jdlshore · on Jan 8, 2023

I don’t know what else to say, man. Maybe try it for yourself so you can see how it works?

Mocks lead to solitary, interaction-based tests.

My approach leads to sociable, state-based tests.

These are polar opposite testing approaches, with different tradeoffs. I don’t care which approach you use, but saying they’re the same thing means you don’t understand it.

Other people haven’t had the same problem understanding the fundamentals that you are. You’re asking very basic questions, which makes me think you haven’t taken the time to read the article carefully. I’m happy to help, but your dismissive attitude makes me think you’re less interested in understanding the material and more interested in proving that you don’t need to understand it. Your shallow comments about Capitalization and wallets didn’t exactly endear you to me, either.

I’ve provided a lot of material online. An article with tons of details and examples. Links to additional full-fledged examples. Multiple video series. Now it’s on you to take advantage of these resources. Or not; no skin off my nose either way.

djur · on Jan 8, 2023

I'm sorry to have offended and frustrated you. It wasn't my intention. I hope you have a nice day (sincerely).

valenterry · on Jan 8, 2023

That stops working quickly - namely as soon as you want to test a function A that uses two other functions B and C both of which have some output that is being used.

For example, a function B that sends an email to a user through a 3rd party system and returns an indiciation whether the request to send the email was successfull, a function C that stores in the database that a notification was sent successfully and now a function A that calls B and, if it fails, repeats it a few times, then calls C and, if it fails, repeats it a few times, otherwise fails itself.

This "do X, then depending on the ouput do Y or Z and dependin on their or ..." can't be tested in the way you describe.

You WILL end up using a form of "mocking", for example passing the functions B and C as arguments to A and then, under test, don't really pass B and C but different functions that allow you to make assertions in test. That is still mocking.

jdlshore · on Jan 8, 2023

There's nothing difficult about the scenario you're describing at all. I don't have example code for that specific scenario, but I do have an example of the following scenario:

A calls B, which calls an external service. B returns function D, which can be used to cancel the request. When B fails to return within five seconds, A calls D to cancel the request, then calls E to write an error message to stdout.

The test for this scenario checks that the request was made, the request was cancelled, and the error was written to stdout. You can see that test here:

https://github.com/jamesshore/livestream/blob/2020-09-22-end...

valenterry · on Jan 8, 2023

Unfortunately your example situation is not comparable. Try to come up with a test for my example that does not pass any arguments during test that would never be passed during a production run. I guarantee you, that is not possible without mocking. And I'm saying that as someone who really doesn't like mocking.

jdlshore · on Jan 8, 2023

Okay, I have nothing better to do this Sunday morning. Let's play with your example. We have a function A that uses B to send an email and C to store a notification in a database. We want to test that, when A fails, it calls B a few times, then calls C a few times, then fails.

I'm not going to write a full working program, but I'll flesh out your example a bit and explain how it works. I'll use JavaScript and the patterns described in the article.

I'm going to say "A" in your example is the VerificationEmailController class. It has a postAsync() method that handles POST requests. When it receives a POST request, it sends an "verify your email" email, then writes the result to a database.

"B" in your example is SendGridClient. It has a sendEmail() method that uses SendGrid to send email. It does it by making an HTTP call to the SendGrid service.

"C" in your example is a EmailVerificationAuditTable. It has a insertEmailSent() method that inserts a "success" or "fail" record into a database table.

"Failing" in your example involves writing an alert to the application log file. It uses ApplicationLog, which has a logEmergency() method that writes a structured log with the "FATAL" log level.

To summarize, we are writing and testing VerificationEmailController. It depends on SendGridClient, EmailVerificationAuditTable, and ApplicationLog.

SendGridClient, EmailVerificationAuditTable, and ApplicationLog use the patterns in the article. Specifically, they're Nullable, they're Infrastructure Wrappers, they have Configurable Responses, and they use Output Tracking.

Got it? Okay, let's write the test. This test is really doing too much, and should be broken out into multiple separate tests, but I'm going to follow the example you provided.

  it("fails cleanly by retrying email service and database service, then logging an alert", async () => {
    // First, we set up the dependencies. This is the Nullables and Configurable Responses patterns.
    const sendGrid = SendGridClient.createNull({ error: "my email error" });
    const auditTable = EmailVerificationAuditTable.createNull({ error: "my database error" });
    const log = ApplicationLog.createNull();

    // Then we track their output. This is the OutputTracker pattern.
    const sendGridTracker = sendGrid.trackSends();
    const auditTableTracker = auditTable.trackInserts();
    const logTracker = log.trackOutput();

    // Then we instantiate the code under test. This uses normal dependency injection.
    const controller = new VerificationEmailController(sendGrid, auditTable, log);

    // Then we call postAsync(). I'm going to provide realistic code, but not explain it, 
    // because it's not relevant to this example. Normally this would be hidden behind a
    // helper function. (See the "Signature Shielding" pattern.)
    const request = HttpRequest.createNull({ body: JSON.stringify({ email: "my_email" }) });
    await controller.postAsync(request);

    // Now we assert that the controller did what it was supposed to.

    // First, we'll assert that we tried to send two emails.
    assert.deepEqual(sendGridTracker.data, [{
      to: "my_email",
      subject: EMAIL_SUBJECT,
      body: EMAIL_BODY,
    }, {
      to: "my_email",
      subject: EMAIL_SUBJECT,
      body: EMAIL_BODY,
    }]);

    // Then we'll assert that we tried to insert two audit log entries.
    assert.deepEqual(auditTableTracker.data, [{
      recipient: "my_email",
      result: EmailVerificationAuditTable.STATUS.EMAIL_FAILED,
      emailError: "my email error",
    }, {
      recipient: "my_email",
      result: EmailVerificationAuditTable.STATUS.EMAIL_FAILED,
      emailError: "my email error",
    }]);

    // And finally, we'll assert that we logged an alert.
    assert.deepEqual(logTracker.data, [{
      alert: "FATAL",
      code: "L668",
      message: "Email verification failure",
      recipient: "my_email",
      sendGridError: "my email error",
      auditLogError: "my database error",
    }]);
  });

There ya go. Entirely possible, not difficult, and (if I do say so myself), quite a clean and readable test.

valenterry · on Jan 9, 2023

Thank you for taking the timing and writing this up! I appreciate it a lot and that's why I come back to hackernews! :)

Now, your test works and I think I have to apologize in that I should have understood your approach better and write my answer accordingly. The relevant part of my previous answer:

> You WILL end up using a form of "mocking", for example passing the functions B and C as arguments to A and then, under test, don't really pass B and C but different functions that allow you to make assertions in test. That is still mocking.

So my point here is that, yes, you are passing functions into the new VerificationEmailController and the ones you pass in are not the same that are being run in production. This is what I call a mock: you replace a dependency that runs in production with one that runs only in the test.

That's not to say that your way of testing doesn't work. It's just that it comes with the same conceptual issues (but also benefits) that mocks come with.

In particular, 1) if we "misconfigure" the function in our actual production code (i.e. pass the wrong arguments) this won't be covered by the test.

Also, 2) we will reimplement certain logic in tests that are necessary to check the actions. Because different actions might still be valid, such as [add5, add5] or [add10] - they come to the same result, but in your assertions you'll need to handle that knowledge, without checking the state, because the state might live in an external system.

And 3) Forcing dependencies to be explicit (i.e. function parameters) is neither good nor bad per se, but sometimes it's nicer to have them encapsulated and in this case both classical mocking and your approach stop working.

Therefore when it comes to me, I see both classical mocks and your approach as conceptually equal and therefore would call your appraoch mocking too. That's what I wanted to say. I hope that gives you some insight - or maybe you disagree with my 3 points above, then I would be curious why.

djur · on Jan 8, 2023

That `run` function looks to me like setting up test doubles. What makes `stderr` in this code different from a spy?

jdlshore · on Jan 8, 2023

Not being a spy. :-) It's an array that's populated by an event listener.

CommandLine is the actual production code that writes to stdout and stderr (and reads command-line arguments). CommandLine.createNull() creates an instance of CommandLine that's "turned off" and doesn't actually write to stdout or stderr. CommandLine.trackStderr() returns a reference to an array that is updated whenever something is written to stderr (or not, in the case of a nulled CommandLine).

I'm off to bed, but I'm happy to answer further questions in the morning. For free, even.

jdlshore · on Jan 8, 2023

Looks like we've reached max depth, but one last response for @ithkuil:

> Another case where having real production code have parts of it that can be turned off is trunk based development leveraging feature flags.

I've used Nullables to implement "dry run" capability in a command-line tool that did git stuff. Super clean—when I got the --dry-run flag, I just called Repo.createNull() rather than Repo.create().

ithkuil · on Jan 8, 2023

Another case where having real production code have parts of it that can be turned off is trunk based development leveraging feature flags.

seveibar · on Jan 8, 2023

I love outside-in testing, where tests are essentially any bug reported by a customer with fixturing of the db and api calls to an instance of the application and all dependencies are faked or doubled. At Seam, we use fakes for the 20+ IoT integrations we support. The fakes/doubles are a significant amount of work but 10x maintainability and imo make development more fun because you have a reference system and a lot of freedom in fixturing and testing. I use unit tests sparingly, where the combinatorial explosion is prohibitive for standing up instances or the logic has a lot of careful edge cases.

We don’t really have performance issues because outside-in testing is horizontally scalable, at Seam our ci may have 16 or so parallel test runners/PR. I saw this pattern of horizontally scaling tests being used on the NextJS repo as well, they would fixture the filesystem for a NextJS project then allow you to interact with the running application, and IIRC they had thousands of tests like this.

I think it simplifies logic to couple the logic to the database + http calls directly rather than introduce layers of infrastructure abstraction. It can vary depending on application, but especially if you’re using query builders I doubt it’s worth it to create the infrastructure layer.

Edit: In relation to the article, I’m mostly disagreeing with “keeping tests narrow” and the infrastructure layer, but in agreement with many other concepts such as the author’s Nullable double concept, which simplifies creating doubles.

hinkley · on Jan 8, 2023

Pure functions are great for testing. I keep iterating back to mostly functional cores.

pharmakom · on Jan 8, 2023

Defunctionalisation is the most powerful design pattern I have found for business backend programming.

zjdev · on Jan 9, 2023

What does this term mean? Google is telling me something that relates to higher order functions but I don't think that is what you are referring to here.

pharmakom · on Jan 11, 2023

https://news.ycombinator.com/item?id=22525038

dang · on Jan 8, 2023

Testing Without Mocks: A Pattern Language - https://news.ycombinator.com/item?id=16943876 - April 2018 (12 comments)

DotaFan · on Jan 8, 2023

Year ago we switched from testing with Mocks to testing with Doubles/Fakes, and it sped up our testing quite significantly, we write, maintain and run tests faster.

u801e · on Jan 8, 2023

I typically write tests and code in a way that I test the functional core of the code without testing the imperative shell part of it (at least for the unit tests). But we also test our code by provisioning a local VM that not only deploys the code as it would be deployed on a production instance; it also installs and configures dependencies for the code.

For example, installing a DB store and populating it with test data. Our integration/functional tests will then run queries against the code and check whether the DB store was updated as expected, we got the appropriate API response from the call to it, and whether we get expected output from the logs. This way, we're not using mocks that may not be updated depending on the state of the production environment. We're using the same code that we use to deploy services in production, but just putting anything on the same VM rather than having them on different machines accessible through a separate endpoint.

japanman425 · on Jan 8, 2023

So e2e tests then?

SeriousM · on Jan 8, 2023

The blog post is quite interesting to read! One tradeoff I found is that it doesn't work with dependency injection. I use c# for about 20 years and DI is the best, IMHO, that was adapted my Microsoft not too long ago.

When we're writing code against external systems we define wrappers. During test startup we replace the real implementation DI registration with a Moq (usually strict mode) or a concrete testable implementation. This way we get the nullable-instance into the dependency graph. This is certainly not the fastest way (measured against thousandsof tests), but it works for the team very well.

jdlshore · on Jan 8, 2023

I'm not sure what you're getting at. Look at the examples. They all use dependency injection.

Maybe you meant dependency injection frameworks? That's not true either; it works exactly the same way as any other approach: the DI framework injects production objects, and the tests inject their own test objects.

SeriousM · on Jan 8, 2023

Thank you for your comment, maybe I wasn't clear with my statement.

I referred to dependency inversion (and therefore also frameworks of it). When using DI you have no possibility to define ".asNullable" as you like from case to case, instead a framework is deciding for you. By using DI during tests it's therefore impossible to use the advertised "nullable" pattern as described. Looking at the examples all of the code does not use dependency inversion.

https://en.m.wikipedia.org/wiki/Dependency_inversion_princip...

jdlshore · on Jan 9, 2023

I know what the dependency inversion principle is. I think there's some fundamental misunderstanding here, because dependency inversion is a design pattern, not a framework, and there's nothing about it that prevents you from using Nullables in your tests.

Regardless of whether you're using dependency inversion, or dependency injection, your class under test can be programmed with the following constructor:

  class MyClass {

    private MyDependency myDependency;

    MyClass(MyDependency myDependency) {
      this.myDependency = myDependency;
    }

    // rest of code
  }

This is the fundamental dependency injection pattern. (Well, there's other ways to do it, but this is most common.) No matter what DI framework you're using, you probably have constructors that look like this. If you do, you can use Nullables. And if you don't, all you need to do is add them.

It's possible that your DI framework is generating these constructors for you automatically, in which case (1) ewwww, and (2) if you dig into your framework's documentation, there's probably a way to make it work with Nullables.

SeriousM · on Jan 9, 2023

DI is a pattern and there are frameworks that help you issuing them - no misunderstanding here.

I just wanted to point out that using DI and a framework supporting you using it, you are not in control what you get as dependency. Building the object graph yourself during test setup (and still respecting the rules of DI) will allow you that yet you have a lot more labor to do. In a modern language like C# with it's eco system it's hard and somethimes even impossible to avoid the DI feamework that connects all the services, therefore I was showing a way to get the same effect as the nullable pattern yet using a DI Framework during test setup.

zjdev · on Jan 10, 2023

> I just wanted to point out that using DI and a framework supporting you using it, you are not in control what you get as dependency

Not even remotely true. Why are you under the impression that you aren't in control of what dependencies get created and form your object graph? Either you've got a misapprehension or you've got a terrible DI framework

viraptor · on Jan 8, 2023

I'm not sure why this would be in the goals / positives.

> Tools that automatically remove busywork, such as dependency-injection frameworks and auto-mocking frameworks, are not required.

The examples are basically doing manual DI. You don't need more in this small case case, but if you're actually using specific implementations for test, why not help yourself by reducing that busywork?

You can always fall back to manual parameter where needed even if you do use DI, so using one is typically not restricting you in any way.

ddyevf635372 · on Jan 8, 2023

Are you a mockist or a classicist? ;)

A few useful article in this topic:

- https://martinfowler.com/articles/mocksArentStubs.html

- http://www.captaindebug.com/2011/01/are-you-mockist-or-class...

revskill · on Jan 8, 2023

Test to me is a tool for debugging. Mock is at least useless to me. Use real mock instead by introduce real dependency, but with local effects.

charles_f · on Jan 8, 2023

> Their tests are reliable and fast, but they tend to “lock in” implementation, making refactoring difficult

Try and make a refactoring for something that has no unit tests, or upgrade a db for tests that directly validate the values in the db. I'm not sure I get where that claim that mocks lock-in implementation is coming from, if anything they make it less locked in.

elboru · on Jan 8, 2023

I think one of the biggest issues is the definitions we use.

Some developers would define mock in the wide sense, as any kind of double to replace a dependency. Other developers have a narrower definition, as doubles that verify interactions with the dependency (interaction testing https://testing.googleblog.com/2013/03/testing-on-toilet-tes...).

In this case, the author mentions mocks and spies, therefore he's referring to "interaction testing". In other words, tests that verify that the dependency was called in the "right" way. If you own those dependencies, and later you refactor them you'll also need to refactor any test that used "interaction testing". Refactoring becomes a tedious and sometimes difficult task.

If by mock, you simply mean code (written by you or by a "Mock" library) that replaces a dependency to get the right data inside the SUT for the test, but that it's not attached to the dependency implementation, then most of the issues are gone.

Ygg2 · on Jan 8, 2023

I think the big issue is. Are you writing a lib or an app?

In library you don't need to mock. Library is supposed to be agnostic.

In app there are many layers. You kinda need to mock them all, otherwise your app can't be tested.

And then you have the problem that minor changes turn your tests red.

So mocking is most useful in the place it has almost no impact.

https://youtu.be/VDfX44fZoMc

Ygg2 · on Jan 8, 2023

Clarification of link: one solutions to unit tests is to write even more unit test, with some contract tests, etc.

Another approach is to treat integration test as a unit test.

newbieuser · on Jan 8, 2023

It's an interesting idea to write the app based on the tests instead of writing the test by the app. I don't think it will be very healthy for the overall application, but it's worth a try. Testing with real dependencies without using mock seems better overall, but this brings with it quite a few problems.

theteapot · on Jan 10, 2023

Only read the intro. I can't read any more, because I'm super confused by this piece of writing. But you asked for feedback, so (limited to intro):

> Folks in the know use mocks and spies (I say “mocks” for short in this article) to write isolated interaction-based tests. Their tests are reliable and fast, but they tend to “lock in” implementation ...

Depends on how you implement it. Mock almost by definition does not lock in implementation as it is a reimplementation of an interface. Spies yeah OK, but depends on how you use. You should probably distinguish between Mock and Spy or state why you conclude they lock in implementation.

> Bad tests are a sign of bad design, so some people use techniques such as Hexagonal Architecture and functional core, imperative shell to separate logic from infrastructure.

You could have said: "Bad tests are a sign of bad design, so some people improve there design.". Genius move. Hexagonal Architecture has nothing to do with it, it's wild conjecture that Hexagonal Architecture is going to improve anything.

> (Infrastructure is code that involves external systems or state.)

No, it isn't. Maybe you confusing Infrastructure-as-code with Infrastructure-is-code.

> It fixes the problem... for logic.

What problem? If we trust you, we hypothetically improved the design and architecture. Test can still be bad. I would drop this whole "Bad tests are a sign of bad design ..." paragraph. I mean why the heck are we talking about Hexagonal Architecture here.

> The patterns combine sociable, state-based tests with a novel infrastructure technique called “Nullables.” At first glance, Nullables look like test doubles, but they're actually production code with an “off” switch.

Man, just because you comingle you mock with your actual code - which seems very anti SOLID - doesn't make the thing not a test-double. It's still a test double, just a weirdly coupled one.

> The rest of the article goes into detail. Don’t be intimidated by its size. It’s broken up into bite-sized pieces with lots of code examples.

The rest of the article is like a book that has nothing to do with the title of the article? You should explain what all that is or put it somewhere else.

Hope that helps.

mgaunard · on Jan 8, 2023

Just a lot of weird words to describe plain dependency injection techniques.

BoiledCabbage · on Jan 8, 2023

Do you have names for these various techniques? Are they defined somewhere?

mgaunard · on Jan 8, 2023

It's just called dependency injection.

Somehow he invents nonsensical buzzwords to describe every line of code.

Terretta · on Jan 8, 2023

buzzword: stock phrases that have become nonsense through endless repetition

So if he invented it, it isn't stock, and if it's annoying for its novelty it's not endlessly repeated in the industry.

Probably you want the word lingo, not buzzword:

lingo: [noun] strange or incomprehensible language or speech: such as. a foreign language. the special vocabulary of a particular field of interest. language characteristic of an individual.

BoiledCabbage · on Jan 8, 2023

So it sounds like there is a sub-classification of a collection of techniques, you don't have names for them and they aren't defined anywhere.

Sounds like someone should name them and write up their definitions somewhere.

pugworthy · on Jan 8, 2023

Someone asked the development team how to mock Scala a few years back. The answers were amusing given the "Java vs Scala" wars going on with the team at that time.