There are actually some good ideas in here. I wish it weren't written in such a buzzword-heavy, consultant-y style. When advice includes so many Novel Proper Nouns, I find myself checking my wallet.

My main quibbles are that these "Nullables" are still test doubles, just implemented in a way with different tradeoffs than typical mocking frameworks. Also, spies are still helpful when you're checking for side effects. I'm not really seeing any alternative offered here for testing "does this code send 0, 1, or 500 emails".




OTOH, I worked with a guy who was so mock-heavy that he actually wrote real mocks for a hash table for each test.

I worked with another guy who mocked out so much external behavior that he couldn't figure out why his stuff never worked; he didn't do the necessary work to ensure that his expectations of external behavior were actually still met. (If you're going to mock out external behavior, you need to add tests that ensure your expectations are valid.)

There is a place for mocks, but I have found the other varieties of test doubles to be more useful.


If I'm writing unit tests for #some_method, I'll mock anything that method calls outside of that method (so I can check how that method will respond to error conditions from calling those other methods).

Unit tests != integration tests, and the two must both exist; otherwise I'll run into the situation where my unit tests pass but the software doesn't work.


But don't you find that the problem is you now end up with tests that assume the existence of #some_method and are tightly coupled to the internal logic and flow of that method?

If you decide to refactor, every part of the system now has one or more tests that break because they mimic or clone each method 1:1 rather than testing input and output.


Perhaps, but if I refactor #some_method I only have to look in my unit tests where I describe #some_method and focus there. The integration tests should not have to be updated, since they only describe expected behavior -- unless the expected behavior must change as well.

The coupling between unit tests and the methods they test is correct, IMO, because really how else are you supposed to test what #some_method would do when it tries to call #another_method but #another_method raises various different exceptions, or returns no results, or returns successfully? If the answer is "I only test happy paths" then good luck - you're not really testing anything.
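For instance, forcing that error path just means handing the method under test a collaborator that throws. A minimal JavaScript sketch with invented names (UserService and its lookup stand in for #some_method and #another_method; assumes Mocha-style it() and Node's assert):

  // Sketch with invented names; assumes Mocha-style it() and Node's assert.
  const assert = require("node:assert");

  class UserService {
    constructor(lookup) { this._lookup = lookup; }

    findUser(id) {                          // plays the role of #some_method
      try { return this._lookup.fetchUser(id); }
      catch (err) { return null; }          // the error-path behavior we want to pin down
    }
  }

  it("returns null when the lookup collaborator throws", () => {
    const lookup = { fetchUser: () => { throw new Error("boom"); } };   // doubled #another_method
    assert.equal(new UserService(lookup).findUser("42"), null);
  });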


Exactly the problem the 2nd guy had


what is the place for mocks?


External systems, unexpected error cases (which by nature of being unexpected can't really be triggered in a "natural" way), very generic code (e.g. instrumentation frameworks), major architectural boundaries, ...

I agree that most people over-mock, though.


External systems dependencies are better handled with stubs and fakes than mocks, in my experience.


That really depends on the case in question and on the particulars of the codebase.

I wouldn't recommend mocking HTTP libraries, database calls, etc. - that usually leads to madness and impenetrable test setups. I would rather recommend either using fake implementations (e.g. an in-memory DB) or only integration-testing the boundaries of your interaction with the external system (which might or might not be hard to achieve).

But I do like the pattern of wrapping calls to external APIs through a layer that I control - so that, for example, I may just write mailer.sendMessage(message) - and then that mailer interface seems like a good candidate for mocking.
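In sketch form (mailer.sendMessage is from the sentence above; every other name is illustrative):

  // A thin seam I control; only this class touches the vendor SDK (the vendor call is schematic).
  class Mailer {
    constructor(vendorClient) { this._vendor = vendorClient; }

    async sendMessage(message) {
      await this._vendor.send(message);   // translate to whatever the real vendor API wants here
    }
  }

  // Under test, the whole external system collapses to one easy-to-double method:
  const sent = [];
  const fakeMailer = { sendMessage: async (msg) => { sent.push(msg); } };
  // ...pass fakeMailer (or a mock of the same interface) to the code under test, then assert on `sent`.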


In my experience, they are the test double of last resort.

I used them to verify behavior, not outcome (as that's what expectations do), e.g., did I really call this method only once?


I feel that the writing style reinforced the well-thought-out consistency of the system. The information of value was all right there, so I didn't feel worried about getting a bill.


Author here. The way to test "does this code send 0, 1, or 500 emails" is the Output Tracking pattern: https://www.jamesshore.com/v2/projects/testing-without-mocks...

Thanks for the wallet!


Thanks for the link and the original article. To me, OutputTracker looks a lot like a spy, except instead of verifying that the unit under test took a particular action you're instead verifying that it logged that it took an action. That would seem to create the risk of missing cases where the events emitted by code don't match its actual behavior.


Output tracking and spies are solving the same problem, but they do it in different ways. Spies record which methods are called. Output tracking records behavior that's otherwise invisible to callers (such as inserting something into a database).

There's no risk of missing cases. The output tracking happens at the same semantic level as the rest of the code and is a binary "tracked / not tracked" type of thing. There's no behavior to match, and the code is tested anyway.

Edit: By "no behavior to match," I mean that the thing doing the behavior is the thing tracking the behavior. The tracker is driven by events you emit when you perform a behavior.


I'm saying that if I have code like this,

   payload = prepare_payload
   if verify_payload?(payload)
     mailer.deliver(payload)
     emitter.emit(:sent_the_email)
   end
there is a risk that a later change to this code will make it so that the `mailer` and `emitter` are not guaranteed to be called together. I have seen this bug in production and I don't see how your approach catches it. I'm also not sure how I'm supposed to test for different desired values in `payload` here.


I mean, sure, if you program the output tracker incorrectly, it won't work. Not sure what else you expect. You're expected to have tests of the output tracking code itself. They catch changes that break the output tracker, just like you have tests to catch any other regressions.

There's an example of that kind of test here: https://github.com/jamesshore/livestream/blob/2020-09-22-end...

Regarding testing different values, I think what you're missing is that you don't just emit an event; you emit an event with data. Typically it's whatever data you're delivering.

  emitter.emit(:sent_the_email, payload)
Then later, you assert on that data.

  assert.equal(delivered, { 'my_expected' => 'data_goes_here' })
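Spelled out end to end in plain Node, just to show the mechanism (this isn't the helper code from the article):

  const { EventEmitter } = require("node:events");
  const assert = require("node:assert");

  const emitter = new EventEmitter();

  // "Tracking" is just a listener filling an array that the test holds a reference to.
  const delivered = [];
  emitter.on("sent_the_email", (payload) => delivered.push(payload));

  // The production code emits the event, with its data, when it performs the behavior.
  emitter.emit("sent_the_email", { my_expected: "data_goes_here" });

  // Later, the test asserts on that data.
  assert.deepEqual(delivered, [{ my_expected: "data_goes_here" }]);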


Looking at your example, I'm still not seeing how this isn't just implementing ad hoc mocking for each component. The reason I'm interested is because the overall approach is very similar to what I've settled on over the years, other than the aversion to using labor-saving DI and mocking frameworks. I'm not sure why I should prefer to write more code (that needs to be tested itself) rather than relying on a well-tested and well-understood library.


I don’t know what else to say, man. Maybe try it for yourself so you can see how it works?

Mocks lead to solitary, interaction-based tests.

My approach leads to sociable, state-based tests.

These are polar opposite testing approaches, with different tradeoffs. I don’t care which approach you use, but saying they’re the same thing means you don’t understand it.

Other people haven’t had the same problem understanding the fundamentals that you are. You’re asking very basic questions, which makes me think you haven’t taken the time to read the article carefully. I’m happy to help, but your dismissive attitude makes me think you’re less interested in understanding the material and more interested in proving that you don’t need to understand it. Your shallow comments about Capitalization and wallets didn’t exactly endear you to me, either.

I’ve provided a lot of material online. An article with tons of details and examples. Links to additional full-fledged examples. Multiple video series. Now it’s on you to take advantage of these resources. Or not; no skin off my nose either way.


I'm sorry to have offended and frustrated you. It wasn't my intention. I hope you have a nice day (sincerely).


That stops working quickly - namely, as soon as you want to test a function A that uses two other functions B and C, both of which have some output that is being used.

For example: a function B that sends an email to a user through a 3rd-party system and returns an indication of whether the request to send the email was successful; a function C that stores in the database that a notification was sent successfully; and now a function A that calls B and, if it fails, retries it a few times, then calls C and, if that fails, retries it a few times, and fails itself if the retries don't succeed.

This "do X, then depending on the ouput do Y or Z and dependin on their or ..." can't be tested in the way you describe.

You WILL end up using a form of "mocking" - for example, passing the functions B and C as arguments to A and then, under test, not actually passing B and C but different functions that allow you to make assertions in the test. That is still mocking.


There's nothing difficult about the scenario you're describing at all. I don't have example code for that specific scenario, but I do have an example of the following scenario:

A calls B, which calls an external service. B returns function D, which can be used to cancel the request. When B fails to return within five seconds, A calls D to cancel the request, then calls E to write an error message to stdout.

The test for this scenario checks that the request was made, the request was cancelled, and the error was written to stdout. You can see that test here:

https://github.com/jamesshore/livestream/blob/2020-09-22-end...


Unfortunately your example situation is not comparable. Try to come up with a test for my example that does not pass any arguments during test that would never be passed during a production run. I guarantee you, that is not possible without mocking. And I'm saying that as someone who really doesn't like mocking.


Okay, I have nothing better to do this Sunday morning. Let's play with your example. We have a function A that uses B to send an email and C to store a notification in a database. We want to test that, when A fails, it calls B a few times, then calls C a few times, then fails.

I'm not going to write a full working program, but I'll flesh out your example a bit and explain how it works. I'll use JavaScript and the patterns described in the article.

I'm going to say "A" in your example is the VerificationEmailController class. It has a postAsync() method that handles POST requests. When it receives a POST request, it sends a "verify your email" email, then writes the result to a database.

"B" in your example is SendGridClient. It has a sendEmail() method that uses SendGrid to send email. It does it by making an HTTP call to the SendGrid service.

"C" in your example is a EmailVerificationAuditTable. It has a insertEmailSent() method that inserts a "success" or "fail" record into a database table.

"Failing" in your example involves writing an alert to the application log file. It uses ApplicationLog, which has a logEmergency() method that writes a structured log with the "FATAL" log level.

To summarize, we are writing and testing VerificationEmailController. It depends on SendGridClient, EmailVerificationAuditTable, and ApplicationLog.

SendGridClient, EmailVerificationAuditTable, and ApplicationLog use the patterns in the article. Specifically, they're Nullable, they're Infrastructure Wrappers, they have Configurable Responses, and they use Output Tracking.
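To make that concrete before we get to the test, here is roughly the shape SendGridClient takes - a simplified sketch of how the patterns fit together, not the code from my published examples:

  const { EventEmitter } = require("node:events");

  class SendGridClient {
    // Nullable: createNull() swaps in an embedded stub instead of a real HTTP client.
    // The stub's response is the Configurable Response; create() (not shown) builds the real client.
    static createNull({ error = null } = {}) {
      const stubbedHttp = { post: async () => ({ error }) };
      return new SendGridClient(stubbedHttp);
    }

    constructor(httpClient) {
      this._httpClient = httpClient;
      this._emitter = new EventEmitter();
    }

    // Output Tracking: hand back a reference to an array that a listener keeps filling.
    trackSends() {
      const data = [];
      this._emitter.on("send", (email) => data.push(email));
      return { data };
    }

    async sendEmail({ to, subject, body }) {
      this._emitter.emit("send", { to, subject, body });    // tracked whether or not the send succeeds
      const response = await this._httpClient.post("/v3/mail/send", { to, subject, body });
      if (response.error) throw new Error(response.error);
    }
  }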

Got it? Okay, let's write the test. This test is really doing too much, and should be broken out into multiple separate tests, but I'm going to follow the example you provided.

  it("fails cleanly by retrying email service and database service, then logging an alert", async () => {
    // First, we set up the dependencies. This is the Nullables and Configurable Responses patterns.
    const sendGrid = SendGridClient.createNull({ error: "my email error" });
    const auditTable = EmailVerificationAuditTable.createNull({ error: "my database error" });
    const log = ApplicationLog.createNull();

    // Then we track their output. This is the OutputTracker pattern.
    const sendGridTracker = sendGrid.trackSends();
    const auditTableTracker = auditTable.trackInserts();
    const logTracker = log.trackOutput();

    // Then we instantiate the code under test. This uses normal dependency injection.
    const controller = new VerificationEmailController(sendGrid, auditTable, log);

    // Then we call postAsync(). I'm going to provide realistic code, but not explain it, 
    // because it's not relevant to this example. Normally this would be hidden behind a
    // helper function. (See the "Signature Shielding" pattern.)
    const request = HttpRequest.createNull({ body: JSON.stringify({ email: "my_email" }) });
    await controller.postAsync(request);

    // Now we assert that the controller did what it was supposed to.

    // First, we'll assert that we tried to send two emails.
    assert.deepEqual(sendGridTracker.data, [{
      to: "my_email",
      subject: EMAIL_SUBJECT,
      body: EMAIL_BODY,
    }, {
      to: "my_email",
      subject: EMAIL_SUBJECT,
      body: EMAIL_BODY,
    }]);

    // Then we'll assert that we tried to insert two audit log entries.
    assert.deepEqual(auditTableTracker.data, [{
      recipient: "my_email",
      result: EmailVerificationAuditTable.STATUS.EMAIL_FAILED,
      emailError: "my email error",
    }, {
      recipient: "my_email",
      result: EmailVerificationAuditTable.STATUS.EMAIL_FAILED,
      emailError: "my email error",
    }]);

    // And finally, we'll assert that we logged an alert.
    assert.deepEqual(logTracker.data, [{
      alert: "FATAL",
      code: "L668",
      message: "Email verification failure",
      recipient: "my_email",
      sendGridError: "my email error",
      auditLogError: "my database error",
    }]);
  });
There ya go. Entirely possible, not difficult, and (if I do say so myself), quite a clean and readable test.


Thank you for taking the time and writing this up! I appreciate it a lot, and that's why I come back to Hacker News! :)

Now, your test works, and I think I have to apologize in that I should have understood your approach better and written my answer accordingly. The relevant part of my previous answer:

> You WILL end up using a form of "mocking" - for example, passing the functions B and C as arguments to A and then, under test, not actually passing B and C but different functions that allow you to make assertions in the test. That is still mocking.

So my point here is that, yes, you are passing functions into the new VerificationEmailController, and the ones you pass in are not the same ones that run in production. This is what I call a mock: you replace a dependency that runs in production with one that runs only in the test.

That's not to say that your way of testing doesn't work. It's just that it comes with the same conceptual issues (but also benefits) that mocks come with.

In particular: 1) if we "misconfigure" the function in our actual production code (i.e. pass the wrong arguments), this won't be covered by the test.

Also, 2) we will reimplement in the tests certain logic that is necessary to check the actions, because different sequences of actions might still be valid - such as [add5, add5] or [add10]: they come to the same result, but your assertions will need to encode that knowledge without checking the state, because the state might live in an external system.

And 3) forcing dependencies to be explicit (i.e. function parameters) is neither good nor bad per se, but sometimes it's nicer to have them encapsulated, and in that case both classical mocking and your approach stop working.

Therefore, as far as I'm concerned, both classical mocks and your approach are conceptually equal, and I would call your approach mocking too. That's what I wanted to say. I hope that gives you some insight - or maybe you disagree with my 3 points above, in which case I'd be curious why.


That `run` function looks to me like setting up test doubles. What makes `stderr` in this code different from a spy?


Not being a spy. :-) It's an array that's populated by an event listener.

CommandLine is the actual production code that writes to stdout and stderr (and reads command-line arguments). CommandLine.createNull() creates an instance of CommandLine that's "turned off" and doesn't actually write to stdout or stderr. CommandLine.trackStderr() returns a reference to an array that is updated whenever something is written to stderr (or not, in the case of a nulled CommandLine).

I'm off to bed, but I'm happy to answer further questions in the morning. For free, even.


Looks like we've reached max depth, but one last response for @ithkuil:

> Another case where it's useful for real production code to have parts that can be turned off is trunk-based development leveraging feature flags.

I've used Nullables to implement "dry run" capability in a command-line tool that did git stuff. Super clean—when I got the --dry-run flag, I just called Repo.createNull() rather than Repo.create().
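In sketch form (Repo.createNull() and create() are the real seam; the other names are illustrative):

  // Sketch only; Repo.createNull()/create() are the real seam, the other names are illustrative.
  const dryRun = process.argv.includes("--dry-run");
  const repo = dryRun ? Repo.createNull() : Repo.create();   // same interface either way
  runGitStuff(repo);   // production logic runs unchanged; the nulled repo just never touches git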


Another case where it's useful for real production code to have parts that can be turned off is trunk-based development leveraging feature flags.



