
Once you’ve replaced each infrastructure dependency with some form of test double (whatever you call it) that captures state, broad tests are no longer slow, nor do they fail randomly, so I’m not sure I understand why you’d ditch them. Ubiquitous narrow tests are brittle in the face of most refactorings, whereas broad tests care about things your users care about and usually persist across those changes. I’d always prefer the latter, though I’m happy to have both. A codebase without broad tests (or without explicitly modelled use cases that are ‘sociable’ with a broad range of dependencies, anyway) is unlikely to describe what it actually _does_ anywhere and sounds like a pain to reason about.
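For illustration, here’s a minimal JavaScript sketch of the kind of state-capturing double I mean (FakeEmailGateway, createApp, and signUp are all made-up names, not from any particular codebase):

  // A state-capturing stand-in for an email-sending dependency. It records
  // what it was asked to do instead of performing I/O, so broad tests that
  // exercise real application code through it stay fast and deterministic.
  class FakeEmailGateway {
    constructor() {
      this.sent = [];                    // captured state, inspectable by tests
    }

    async send(to, subject, body) {      // same interface as the real gateway
      this.sent.push({ to, subject, body });
    }
  }

  it("sends a welcome email on signup", async function() {
    const emailGateway = new FakeEmailGateway();
    const app = createApp({ emailGateway });    // hypothetical factory for the app

    await app.signUp("alice@example.com");

    assert.equal(emailGateway.sent.length, 1);
    assert.equal(emailGateway.sent[0].to, "alice@example.com");
  });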



I write unit tests selectively to capture unwritten rules about how some functions need to act. It is a better form of documentation. Otherwise, I lean on what you are describing: actually testing a system against the outer "API layer" that external users interact with, be it a UI or an actual API.

I avoid mocking like the plague unless it truly is an external system I have no control over. Then I might mock as a form of documentation about expectations.

I was involved in a project recently that used the "clean architecture" approach, where every layer was abstracted and everything was mocked. Honestly, it was a major turn-off on two levels: I came to deeply dislike "clean" architecture based on this single experience alone, and I found my deep dislike of heavy mocking rekindled.


Been in that "everything mocked" scenario too. I also try to mock as little as possible, but ensure that as much as possible can be mocked, because very often that is the path of least resistance for a specific test.

I've seen code with so much mocked that many unit tests failed to test anything other than behaviour of mocks.

Not sure I'm comfortable with test code and switches being deployed as an alternative. I find supporting different types of tests to be a benefit. It seems to me that if you have test infrastructure in the code, that will limit options. I have seen code that actively prevented mocks and was impossible to test.


> I've seen code with so much mocked that many unit tests failed to test anything other than behaviour of mocks.

One example of this was a test that passed because of the expected behavior as determined by a mock, but the code failed in the staging environment because the underlying library changed its method signature to take some parameters in a different order.

To me, this basically was a signal that that particular test and tests like it were essentially useless in determining whether the code was really in a working state.
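Something like this invented example (registerUser and the client's createUser are made-up names): the mock encodes the old argument order, so the test keeps passing while the real library has moved on.

  // Hand-rolled mock that still encodes the library's *old* signature,
  // createUser(name, email), after the real library changed the argument order.
  function makeMockClient() {
    const calls = [];
    return {
      calls,
      createUser(name, email) {          // stale signature lives on in the mock
        calls.push({ name, email });
        return { id: 1 };
      },
    };
  }

  it("registers a user", function() {
    const client = makeMockClient();
    registerUser(client, "Alice", "alice@example.com");   // hypothetical code under test

    // Passes against the mock, but says nothing about whether the arguments
    // match what the real library expects today.
    assert.equal(client.calls[0].email, "alice@example.com");
  });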


This is the major Achilles heel of mocking that I find most bothersome. If at all possible, I try to have at least a happy-path subset of real tests hitting real third-party APIs.


Author here. What you’re describing is basically the same thing as sociable tests at the top of your system, which is part of what I'm recommending. You could stop there if you wanted.

I like having narrow tests rather than broad tests, though, because it allows me to keep my tests localized. For example, in the simple example app, I have a Rot13 module with a bunch of edge cases: how does it handle upper-case letters, lower-case letters, numbers, symbols, etc.?

I could test that with broad tests, but doing that in a large system means that you end up with a combinatorial explosion of tests, or (more likely) hidden gaps in your tests. I find it easier to test my code thoroughly with narrow tests, which allows me to test higher-level code with the security of knowing that lower-level code Just Works(tm) and doesn't need further testing.
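For a concrete picture, those localized edge-case tests might look something like this (a sketch assuming the Rot13 module exposes a transform() function; the real API in the example app may differ):

  // Narrow tests pinning down the Rot13 edge cases in one place, so
  // higher-level tests don't need to re-check them.
  describe("rot13", function() {
    it("transforms lower-case letters", function() {
      assert.equal(rot13.transform("abc"), "nop");
    });

    it("transforms upper-case letters", function() {
      assert.equal(rot13.transform("ABC"), "NOP");
    });

    it("passes numbers through unchanged", function() {
      assert.equal(rot13.transform("123"), "123");
    });

    it("passes symbols through unchanged", function() {
      assert.equal(rot13.transform("!@#"), "!@#");
    });
  });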

I haven't found narrow sociable tests to be brittle in the face of refactorings, at least not when using the patterns I describe. That's a big motivation for the pattern language—to make refactoring easy, even big architectural refactorings—and that's turned out to be true in practice.


Hi, I really enjoyed the piece! How much to test at each layer of abstraction has always been something that's bugged me. This article provides a nice strategy.

I still have questions about a few scenarios though. Let's say, I have two classes like this:

  class KeyValueStore {
    int64 Get(Key key);
  };

  class CachingKeyValueStore {
    CachingKeyValueStore(const KeyValueStore& wrapped);

    int64 Get(Key key);
  };

Here, the `CachingKeyValueStore` class is a wrapper around the `KeyValueStore` class, and its purpose is simply to maintain an in-memory cache (instead of making RPC or database calls). How would you unit test the `CachingKeyValueStore` class? So far, using mocks has been my only strategy, because behavior-wise, both classes have the exact same outputs for the same inputs.


Thanks! I'm glad you enjoyed it.

The behavioral difference between CachingKeyValueStore and KeyValueStore is that it doesn't fetch from the underlying data source when called a second time. So that's what I would test. Given the signature you provided, I would either instrument KeyValueStore so I could tell whether it had been run twice, or (if KeyValueStore has multiple implementations) I might create an ExampleKeyValueStore in my tests that had that instrumentation.

I solved this problem a bit differently in my code, though. Instead of making a CachingKeyValueStore, I made a MemoryCache. It has a get() method that takes a key and a function. The first time it's called, it runs the function; the second time, it returns the stored result. Here's one of my actual tests for that code (JavaScript):

  it("caches result of function, so it's only run once", async function() {
    const cache = createIndependentCache();

    let runCount = 0;
    const getFn = () => runCount++;

    await cache.getAsync(NULL_LOG, "key", getFn);
    await cache.getAsync(NULL_LOG, "key", getFn);

    assert.equal(runCount, 1);
  });

And in case you’re curious, here’s the implementation, excluding a bunch of instrumentation and logging:

  async getAsync(log, key, getFnAsync) {
    if (this._cache[key] === undefined) {
      this._cache[key] = getFnAsync();
    }

    try {
      return await this._cache[key];
    }
    catch (err) {
      delete this._cache[key];   // don't cache errors
      throw err;
    }
  }


Thanks! I really like the idea of the instrumented ExampleKeyValueStore class which can be defined like this:

  class ExampleKeyValueStore {
    ExampleKeyValueStore(KeyValueStore wrapped);
    int64 Get(Key key);  // forwards to the wrapped store, counting calls
    int hit_count();
  };

That means I can use the real KeyValueStore as a dependency while still testing the caching behavior. Cool!
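For example, the test could then read roughly like this (sketched in JavaScript to match the other snippets in this thread, with the names translated; the exact wiring is assumed):

  it("only hits the underlying store once per key", function() {
    // exampleStore wraps the real KeyValueStore and counts forwarded calls
    const exampleStore = new ExampleKeyValueStore(realKeyValueStore);
    const cachingStore = new CachingKeyValueStore(exampleStore);

    cachingStore.get("some-key");
    cachingStore.get("some-key");      // second call should be served from the cache

    assert.equal(exampleStore.hitCount(), 1);
  });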


You probably should have some additional tests around the CachingKeyValueStore related to eviction and size. Maybe this doesn't matter very often, but you should at least test the behavior of your caching store when the cache size is only, e.g., 1 item.

You can either do this by passing in a mock KVStore, or by passing in a normal/fake KV store in which you can update the underlying data. So for example:

    data = ...
    kv = new KVStore(data)
    cache = new CachingKVStore(kv, cache_size=1)
    v1 = cache.get(k1)
    data.update(k1, new_value)
    v2 = cache.get(k1)
    assert v2 == v1  // Cached!
    v3 = cache.get(k2)  // evicts k1
    v4 = cache.get(k1)
    assert v4 != v1
    assert v4 == new_value  // Gets the new value

Otherwise, yeah, for some cases like this you can just inherit from the KVStore test class and run all the same tests with a slightly different setup method that uses a caching store instead.
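Roughly like this in JavaScript (using a shared helper rather than inheritance; the constructors follow the pseudocode above, the rest is assumed):

  // Shared contract tests: the same behavioral checks run against any factory
  // that builds a KeyValueStore-like object from seed data.
  function itBehavesLikeAKeyValueStore(createStore) {
    it("returns the stored value for a key", function() {
      const store = createStore(new Map([["k1", 42]]));
      assert.equal(store.get("k1"), 42);
    });
  }

  describe("KVStore", function() {
    itBehavesLikeAKeyValueStore((data) => new KVStore(data));
  });

  describe("CachingKVStore", function() {
    itBehavesLikeAKeyValueStore((data) => new CachingKVStore(new KVStore(data), 1));
  });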


Thanks! Injecting the data itself is an interesting approach!


I would use a mock for the wrapped instance that generates a strong random value on each call to “Get”.

Then I would do some basic unit tests and maybe property-based testing, given that the only way for the same value to appear on subsequent “Get” calls is for caching to have occurred.
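Something along these lines (JavaScript; the store shape and the CachingKeyValueStore constructor are assumptions):

  // Stub whose get() returns a fresh (pseudo-)random value on every call, so
  // repeated identical results can only have come from the cache.
  const randomStore = {
    get(key) {
      return Math.random();
    },
  };

  it("returns the cached value on repeated gets", function() {
    const cache = new CachingKeyValueStore(randomStore);

    const first = cache.get("some-key");
    const second = cache.get("some-key");

    assert.equal(second, first);       // only equal if the second get was cached
  });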


> I could test that with broad tests, but doing that in a large system means that you end up with a combinatorial explosion of tests, or (more likely) hidden gaps in your tests.

Extremely well put. This maps exactly to my experience when moving from unit tests to the layer above. Tests are taught exactly like the “draw the rest of the fucking owl” meme, and I’ve never seen them actually hold up to the nice and tidy “hello world” tests that you see in demos. You’re actually trying to tackle the reality of gritty and ugly software systems, and it’s very refreshing to see, because it may just work. It’s gonna take a while to see if it holds up, but so far I am really excited.


For logic-heavy parts of the system then yes, narrow tests are very useful. Often these sorts of components are amenable to property-based testing, which definitely becomes harder the broader the surface. I have generally been lucky that these bits of code are either rare, or they become infrastructure dependencies quite quickly, in which case you’re broadly testing just categories of responses, not all edge cases.

But I still think a general rule to live by is to maximise the ratio of business value covered by tests to the amount of program text referenced by tests. Too small a ratio and you’re gaining little confidence but exposing yourself to constant toil just to keep up with changing implementations, even when observable system behaviour doesn’t change much.

I’m unsure if top level sociable tests get me where I want to be. It seems like they stop one layer down, so I’m still concerned I don’t capture actual end to end use cases anywhere in the codebase. I’m not personally comfortable with that (from a testing point of view but also because I’m tired of working on codebases that don’t clearly reveal what they do and why), but that outlook’s been formed by my own particular failures and shortcomings over the years so it might be quite subjective.


> Once you’ve replaced each infrastructure dependency with some form of test double (whatever you call it) that captures state, broad tests are no longer slow nor do they fail randomly

In my experience, broad tests are slow and brittle because they involve an entire environment. That requires manipulation or knowledge of many repos, additional maintenance of test data, oftentimes gambling on whether someone changed the environment without telling you, and more. Under the best circumstances, tests have to traverse several layers of code and interact with side effects, all of which usually lack observability.

I love shops that have microservices that own their data stores. Unfortunately, there are too many monoliths and what I call "macroliths" (tightly coupled monoliths built out of services running in k8s). If everything is tightly coupled around a store, I don't think the article solves the problem either. Or makes testing any more rigorous or complete, for that matter.


There's a middle tier: broad tests in the sense of mocking only external (slow) dependencies, not local ones.

Many developers mock every dependency in a SUT, even if they own the dependency. I like the idea of broad tests that run in memory (no I/O interaction).
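A rough sketch of what that can look like (all names invented): wire the service up with its real in-process collaborators and stub only the slow external call.

  it("places an order end to end, in memory", async function() {
    // Only the external payment API is replaced; everything else is real code.
    const paymentClient = { charge: async () => ({ status: "approved" }) };

    const repository = new InMemoryOrderRepository();    // real class, no I/O
    const service = new OrderService(repository, paymentClient);

    await service.placeOrder({ item: "book", amount: 20 });

    assert.equal(repository.findAll().length, 1);
  });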

I think Ian Cooper does a great job explaining it in this talk (he talks about TDD, but I think the main idea applies even if you write tests after code): https://www.youtube.com/watch?v=EZ05e7EMOLM&ab_channel=DevTe...


This is more what I was suggesting, although I’ve not actually found in-proc test doubles talking via localhost to be slow or brittle so I don’t mind mocking on either side of that divide. You’re going to write those mocks and tests anyway to check how you’re interacting with external systems in narrow tests, so you can choose how deep your broad tests go.



