Testing Without Mocks: A Pattern Language (2018) (jamesshore.com)
55 points by etamponi on March 5, 2022 | 28 comments



Hey all, author here. I'm a bit late to the party but happy to answer any questions you have.

The main question seems to be: what's the difference between null implementations and mocks?

There's a practical answer to that question and a technical answer.

Practically speaking, the reason to use nullable infrastructure wrappers is that they enable sociable tests. A flaw of mock-based testing is that you only check that your system under test (SUT) calls its dependencies, but not that your code's expectations of its dependencies are correct. Sociable tests check that the SUT uses its dependencies correctly. This is valuable when refactoring, because it's easy to inadvertently introduce changes to behavior that break callers. Mock-based tests ("solitary tests") can't detect those errors.

Technically speaking, the nullable infrastructure wrappers convert your tests from interaction-based ("did I call the right methods") to state-based ("did my system return the correct results or make the appropriate state changes"). They do this by adding state inspection capabilities to production code, allowing you to assert on the state of the production code rather than asserting which methods were called. They also implement a tiny hidden stub to enable you to "turn off" interactions with the outside world.
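
For a rough side-by-side (a sketch only; the class and method names here are made up, not from my codebase):

  // Interaction-based (solitary): assert which methods were called, with what arguments
  it("sends the welcome email", () => {
    const calls = [];
    const emailClient = { send: (to, subject) => calls.push({ to, subject }) };  // hand-rolled mock/spy
    new Signup(emailClient).welcome("amy@example.com");
    // Passes even if the real EmailClient later changes in a way that breaks Signup
    assert.deepEqual(calls, [{ to: "amy@example.com", subject: "Welcome!" }]);
  });

  // State-based (sociable): run the real EmailClient code, with the outside world turned off
  it("sends the welcome email", () => {
    const emailClient = EmailClient.createNull();   // real wrapper, embedded stub instead of SMTP
    const sent = emailClient.trackSends();          // hypothetical state-inspection helper
    new Signup(emailClient).welcome("amy@example.com");
    assert.deepEqual(sent, [{ to: "amy@example.com", subject: "Welcome!" }]);
  });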

My approach and mocks/spies are superficially similar, in that they both make use of test doubles (stubs, mocks, and spies are all test doubles) but they lead to significantly different testing approaches. Mocks/spies lead to interaction-based solitary tests, and nullable infrastructure leads to state-based sociable tests. I also find the tests are easier to write and read, and they're orders of magnitude faster¹ than using a mocking framework, too.

¹In a head-to-head comparison, nullable infrastructure ran 1,075,268 tests/sec, testdouble.js ran 12,210 tests/sec, and sinon.js ran 2,793 tests/sec. https://github.com/jamesshore/livestream/blob/2020-05-26-end...


Thank you for your clarification (and for the article).

Could you elaborate a bit more? I see the point of sociable tests, but I don't see how nullable infrastructure does not break the pattern itself. Or, actually, I only see one way in which it does not break sociable tests: if the null implementation is tested against the _real_ implementation, guaranteeing that it does not drift away. At this point, using the null version of a wrapper in another test is guaranteed to test the correct interaction with the infrastructure.

Is this it, or is there something else that I am missing?


I don't fully understand your question (I don't know what you mean by breaking the pattern), but I'll give it a shot.

The short version is that the "Null" uses your real production code. The mock doesn't.

Let's say you have some login page that uses Auth0. You have a controller for the page and an infrastructure wrapper (aka gateway) for Auth0. You want to test that your login controller logic is implemented correctly.

When you test your controller with a mock, your code goes through this path:

  Test --> Controller --> Mock
But your real code does this:

  Controller --> Auth0 Wrapper -/\/\/ (HTTPS request) \/\/\-> Auth0 Service
So you're vulnerable. If you ever refactor your Auth0 wrapper so it behaves differently than your controller expects, none of your tests will catch it, even if you have good tests of both Controller and Auth0 Wrapper. You can prevent this vulnerability by writing an integration test:

  Test --> Controller --> Auth0 Wrapper -/\/\/-> Auth0 Service
But those are slow and brittle, and a lot of work to set up. Your test of Controller has to know all sorts of details about Auth0 Service, which requires you to know about the implementation details of Auth0 Wrapper. But you're just trying to test Controller, and you want Auth0 Wrapper to hide those details from you.

So instead, if you implement nullable infrastructure, your Auth0 Wrapper would work like this:

                  +-/\/\/-> Auth0 Service
                 /
  Auth0 Wrapper +
                 \
                  +--> Null Auth0 Service
At runtime, you can choose to call the `create()` factory method, and get an Auth0 Wrapper that talks to Auth0 Service, or you can call the `createNull()` factory method, and get an Auth0 Wrapper that doesn't talk to the Auth0 Service. Either way, the service is encapsulated and you don't have to know any of the details about how Auth0 Service works.
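
In code, a stripped-down sketch of that shape might look like this (simplified and hypothetical; the real Auth0 API and my actual implementation differ):

  class Auth0Wrapper {
    static create() {
      return new Auth0Wrapper(new HttpsClient());          // talks to the real Auth0 Service
    }

    static createNull({ status = 200, token = "null_token" } = {}) {
      return new Auth0Wrapper(new StubbedHttpsClient({ status, token }));  // embedded stub, no network
    }

    constructor(httpsClient) {
      this._https = httpsClient;
    }

    async logIn(username, password) {
      // This production logic runs in both configurations
      const response = await this._https.post("/oauth/token", { username, password });
      return { loggedIn: response.status === 200, token: response.token };
    }
  }

  class StubbedHttpsClient {
    constructor(response) { this._response = response; }
    async post() { return this._response; }                 // configurable canned response
  }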

So, in your test of Controller, you can inject the `createNull()` version of Auth0 Wrapper, and now your test looks like this:

  Test --> Controller --> Auth0 Wrapper --> Null Auth0 Service
You get all the benefits of an integration test without the costs of an integration test. You don't have to know the internals of how Auth0 Wrapper is implemented, you don't have to set up Auth0 Service, and you don't make a network call.
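
Concretely, the Controller test might read something like this (again a sketch; the method names are hypothetical):

  it("logs the user in", async () => {
    const auth0 = Auth0Wrapper.createNull({ status: 200, token: "abc123" });
    const controller = new Controller(auth0);

    const result = await controller.logIn("amy", "secret");

    assert.equal(result.loggedIn, true);   // exercises the real Auth0 Wrapper code, no network call
  });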

So, in the future, if Auth0 Wrapper changes in a way that breaks Controller, Test fails. If you were using a mock, it wouldn't.

I have a bunch of coding livestreams with practical examples at https://www.jamesshore.com/v2/projects/lunch-and-learn. Check out episode 4 for the basics, and episodes 5-8 (and beyond) for an example of implementing a real world microservice architecture using these ideas.


Thank you! I think I've got it now. So using the createNull version is very very similar to using a spy server, with the difference that with the null version the request is not sent at all.

If I can, may I ask you another question? How do you inject the createNull version in a test that does not have the architecture wrapper as a direct dependency? Do you manually unravel the dependencies? For example:

Test -> UpperController -> Controller -> ArchitectureWrapper

How do you test UpperController with a createNull version of ArchitectureWrapper?

Or is it necessary to always have the architecture as a direct dependency of the SUT? (As mandated by the A-frame architecture)

Thanks


> similar to using a spy server

I hadn't thought of it that way, but I think that's exactly correct.

> How do you inject the createNull version in a test that does not have the [infrastructure] wrapper as a direct dependency?

The short answer is that you create a chain of nullable objects.

Infrastructure "pollutes" its callers, in that having intrastructure in the dependency chain exposes you to the downsides of infrastructure (external state, network calls, unreliability, etc.). So I consider deep dependencies on infrastructure a design smell. Hence A-Frame Architecture.

That said, design is all about tradeoffs, and sometimes a deep dependency on infrastructure is better than the alternatives. Your example of UpperController and Controller is something I'd prefer to avoid, but it's normal for me to have "high-level" infrastructure that depends on "low-level" infrastructure.

For example, let's say we implement a logging wrapper. Under the covers, it depends on CommandLine, a wrapper for stdout. When used by Controller, the chain looks like this:

  Controller --> Log --> CommandLine --> process.stdout
CommandLine exposes trackStdout()¹, which allows you to record what was written to stdout, and createNull(), which allows you to create a CommandLine that doesn't actually write to stdout.

¹trackStdout() is a more sophisticated alternative to the "Send Events" and "Send State" (getLastOutput) patterns mentioned in the 2018 article.

But Controller isn't aware of CommandLine. Controller is only aware of Log. Log's use of CommandLine is an encapsulated implementation detail that's hidden from the rest of the world.

So Log also exposes methods to help you test. It has trackOutput(), which allows you to record what was written to the log, and createNull(), which allows you to create a Log that doesn't actually write to the log.

Importantly, here, trackOutput() isn't just calling CommandLine.trackStdout(). It's providing a higher-level abstraction that knows how to deal with structured logs, timestamps, and stack traces.
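
As a rough sketch of that idea (simplified; assume CommandLine exposes a writeStdout() method, and the real implementation also handles stack traces and more):

  class Log {
    constructor(commandLine) {
      this._commandLine = commandLine;
      this._trackers = [];
    }

    info(message, detail) {
      // Production behavior: write a structured, timestamped line to stdout via CommandLine
      this._commandLine.writeStdout(JSON.stringify({ timestamp: Date.now(), message, detail }));
      // Test support: record the log-level view, without timestamps or serialization details
      this._trackers.forEach((output) => output.push({ message, detail }));
    }

    trackOutput() {
      const output = [];
      this._trackers.push(output);
      return output;   // a live array, asserted on in the Controller test below
    }
  }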

So in the test of Controller, your code looks like this (JavaScript):

  it("logs stuff", () => {
    const log = Log.createNull();
    const logOutput = log.trackOutput();
    const controller = new Controller(log);

    controller.doStuff();
    assert.deepEqual(logOutput, [
      { message: "my expected log line 1", detail: "some detail" },
      { message: "my expected log line 2", detail: "some other detail" },
    ]);
  });
This isn't a lot of work for Log, because it can use the abstractions provided by CommandLine. In particular, its factory methods look like this:

  static create() {
    return new Log(CommandLine.create());
  }

  static createNull() {
    return new Log(CommandLine.createNull());
  }
I have a video for this example at [2] and the source code is at [3].

[2] https://www.jamesshore.com/v2/projects/lunch-and-learn/testa...

[3] https://github.com/jamesshore/livestream/blob/2020-08-11-end...


Clear, thank you! In the meantime, I've started watching the videos and they're excellent. I've found answers to many other questions there.

Thanks for putting all of this together!


Enjoyed this much more than I thought I would

• Shows (almost) real-world examples to demonstrate the patterns

• Uses JavaScript, which is notoriously hard to test because of complex UI frameworks and difficult-to-mock APIs

• Set of patterns that refer to one another, providing mutual support

Would love to read more—any other resource along similar lines appreciated


I found this to have many questionable claims, for example:

> Ensure that all Logic classes can be constructed without providing any parameters (and without using a DI framework). In practice, this means that most objects instantiate their dependencies in their constructor by default, although they may also accept them as optional parameters.
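
For context, the pattern being described looks roughly like this (my own sketch, not code from the article):

  class ReportGenerator {
    // Dependencies are instantiated by default, but can be overridden as optional parameters
    constructor(formatter = new ReportFormatter(), calendar = new Calendar()) {
      this._formatter = formatter;
      this._calendar = calendar;
    }
  }

  new ReportGenerator();                                     // production: defaults
  new ReportGenerator(new ReportFormatter(), fakeCalendar);  // test: override one dependency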

When you do the latter, it kind of defeats the purpose of dependency injection, and what you are testing won't be what you use in prod. And when you don't do this, you have to set up dependencies properly - at that point it almost becomes an integration test. IMO the right thing here is to mock dependencies in unit tests so you can properly test the class in question.

Fundamentally, it seems like the author hasn't worked on or considered complex distributed systems, where stuff like dependency management, sequencing (back-and-forth interactions), etc. really matters. You can't test those properly in unit tests; you need integration tests.

Unit tests and integration tests complement each other, you can't reasonably test and update complex systems without both.


> Uses JavaScript, which is notoriously hard to test because of complex UI frameworks and difficult-to-mock APIs

It really depends. I’ve found testing JS/TS to range from “damn this is not worth automating” to “this is how it should be”.

Obviously UI frameworks may fall in the former category, but IME this is a place where JSX really shines: it’s an unopinionated DSL, well suited to stubbing however you see fit. Much more complex is testing DOM and event loop behavior.

Server-side (non-presentational) JS/TS may be difficult to test, but that’s really more an accident of history than anything. It almost all depends on the Express API or something like it, and it’s damn near impossible to reason about at a unit level or as a statically analyzable system. But it doesn’t have to be that way, and I’ve developed complex systems without this problem, to the point that unit tests provided enough confidence that few if any integration tests were of any value. Which in the article’s parlance means nearly all tests other than internal library code were testing pure logic functions or infrastructure functions. (I’m still sad I wasn’t able to open source this library code but a new implementation is on my side project list.)

> Set of patterns that refer to one another, providing mutual support

I first skimmed the article then re-read it, because I found this challenging as a reader trying to keep track of my cross-reference stack. It was better on a second pass, but I do think there’s an opportunity to inline some linked information at the expense of redundancy to minimize that, and to treat this format as more of a reference backing the article.


I found that James has recorded a series of 22 TDD sessions that demonstrate the techniques outlined in the blog post in JS: https://www.youtube.com/watch?v=nlGSDUuK7C4&list=PLD-LK0HSm0...


I’m really not clear what’s going on with the null implementations… aren’t those just very boring mocks?


That’s my impression as well. Almost always when people advocate narrow tests without mocks they end up using something that’s not called a mock but ultimately isn’t distinguishable from a mock.

Which I don’t think necessarily invalidates the article (which I’ve only skimmed so far, but already strikes me as having a lot going for it), just some terminology used. It’s perfectly reasonable, regardless of terminology, to say: here are some techniques that encourage testability and high confidence in those tests. And it appears to me there’s a lot of that in the article. So with that said, I’m gonna go give it a more thorough read :)


I answered this question here: https://news.ycombinator.com/item?id=30591607


(Poster here)

Agreed. Perhaps what differentiates those from "normal" mocks is:

1) They only apply to "infrastructure" (so databases, external services etc). Basically the article says: "It's okay to use mocks sparingly, to avoid costly and flaky tests involving external services".

2) They are defined along with the production code, and _tested accordingly_. Which I assume means that you test the "null implementations"/"infrastructure mocks" by making sure they are always aligned with the real interactions. This avoids the danger of "normal mocks" to diverge from the real interactions.

What I would like to understand better, instead, is whether there is a way to keep all of this from breaking down once the infrastructure is two layers below the code you're testing. You can inject a null implementation into ApplicationClass, but what if you have an UpperApplicationClass which depends on ApplicationClass? You either have to unravel the dependencies yourself to inject NullImplementation inside ApplicationClass inside UpperApplicationClass (which defeats the point of the article, in a way), or you have to resort to a DI framework, which I personally abhor (I've yet to find a single case in which it is a net benefit and not just a magic way of obfuscating the code).


> Which I assume means that you test the "null implementations"/"infrastructure mocks" by making sure they are always aligned with the real interactions. This avoids the danger of "normal mocks" to diverge from the real interactions.

When using a typed language (TS, Java, etc.) this is also the case for mocks, isn't it? I don't see a reason why the implementation of the `createNull()` factory should be less prone to drifting out of sync with the real implementation than mocks are (as long as the type signatures are checked).


Same, it seems to me that the article is based on the idea that

    val app = new AppClass(new MockServer)
    assert(...)
is bad but

    val app = new AppClass(Server.createMock())
    assert(...)
is good. I for one see no difference.


"Test-specific production code. Some code needed for the tests is written as tested production code, particularly for infrastructure classes. It requires extra time to write and adds noise to class APIs."

I don't see this much in real world code. Is there a space/niche that does this extensively? Does it raise a bunch of arguments about anti-patterns, etc?


I think it depends on where you draw the line. In C# it's very common to use interfaces even though you only ever expect to have a single class of the given type, just so you can stub it out in tests.

It's unfortunate, but some languages require you to do at least some re-architecting so it's easy to test your code.


Without reading the article, I'm in the "testability is a first-class design criterion" camp.

Ease of testing should absolutely affect all your production code design choices.

That said, there are terrible ways to do that, as always.


Interfaces have other advantages as well; testability is just an added benefit.

Implementing a no-op version of an interface for tests, or a simple version that uses a map instead of db calls, has a lot of benefits.
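
For example, a map-backed stand-in might look like this (sketched in JavaScript to match the rest of the thread; names are hypothetical):

  // Same surface as the real repository, but backed by a Map instead of database calls
  class InMemoryUserRepository {
    constructor() {
      this._users = new Map();
    }

    async save(user) {
      this._users.set(user.id, user);
    }

    async findById(id) {
      return this._users.get(id) ?? null;
    }
  }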


I've seen it (and, ahem, have written it...), but it's a rare sort of antipattern that I think mostly happens with inexperienced developers. I haven't had to do such a thing in several years, except when some weird framework/library limitation semi-forced me to write a little test-specific code. Test-specific production code is usually a crutch for weird situations ultimately caused by dogmatic adherence to object-oriented principles, where there's the potential for inaccessible state all over the place. If something you only have coarse-grained control over is getting in the way of tests, then an `if (env == "test")` acting as a glorified `jmp` instruction might make something testable, or at least faster to test. Experienced developers who are at least halfway decent, rather than reach for such a crutch, instinctively ask whether the situation they're in is caused by stupid design and correct the issue at a higher level; if they've always managed to think this way (i.e. like an engineer), it may never occur to them why anyone would write test-specific code, other than perhaps feature flags.


It's not always an antipattern. Another example would be a public API that exposes more state/info than its users need, just for better tests. It's still not quite best practice, I'll concede, since you almost always want to focus on testing the contract, but sometimes exposing some extremely critical state info is OK if it improves the tests.


In my personal experience I’ve been more prone to expose test specific code in implementation than my teammates, but even so:

- it’s usually to export things which are project-internal (creating named references where there would otherwise be magic numbers, etc.) specifically to decouple tests

- otherwise it’s been basically some lighter weight version of the null wrapper

And no one has complained. Perhaps they should, but I think it hasn’t been an issue because my devotion to automated testing has been more thorough and thoughtful than that of many who’d complain, and they’re happy to benefit from the safety that provides.

Honestly I think test and implementation code should be more tightly coupled not less. At least in terms of being able to reference things that are clearly project-/package-private. I know this would bother a lot of people depending on discipline, but it’s pretty much exactly what people do in many Rust projects and seems pretty successful.


It is a lot easier to make an interface and throw a mocking framework at it. I've been thinking of similar ideas for a while, but implementing them is hard, so I never tried it.


Spoiler - exposing test data from the implementation: https://github.com/jamesshore/testing-without-mocks-example/...


This and other cases of test-specific implementation code is also specifically called out under tradeoffs in the article FWIW.


This is almost exactly my approach to testing. I'm going to have to save this for sharing with teammates.


I felt too compelled to stick with existing design-pattern theory, and maintained a sort of persistent stubbornness about traditional xUnit-style tests and TDD, rather than follow the rabbit on this article.



