So the empirical evidence, on a sample size of N=2 (Rust and C++), shows that when you take exceptions out of a language [1], you get a number of mutually incompatible libraries for communicating errors. (Exceptions are controversial in C++, and many projects forbid their use.)
And then there's Go, which seems to be too primitive to even support custom error-handling libraries.
It makes me sad that there's no research (that I'm aware of) on the ergonomics of exceptions. They're easily thrown, but since they're dynamically scoped, it's hard to know what to catch when you actually want to retry an operation instead of just catching a top-level exception type and logging it.
Failing and aborting the request/user action is the general & correct response to an error.
Retrying is relevant only for a specific, small set of callsites dealing with unreliable network or hardware interactions. That fact should be obvious at the callsite itself.
I do recognize a valuable distinction between predictable and unexpected errors. In business applications, I refer to these as "business" and "technical" exceptions. Business exceptions are caused by a violation of business rules in the called layer directly, and are potentially predictable & recoverable. Technical exceptions arise from lower layers, are beyond the bounds of encapsulation, and are not recoverable. It seems the Mistake vs Error concept maps to this. I'm not much convinced about nuance beyond this.
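Sketched in Rust, that split might look something like this (the names are mine, purely illustrative, not from the article):

#[derive(Debug)]
enum BusinessError {
    InsufficientFunds,
    DuplicateOrder,
}

#[derive(Debug)]
enum AppError {
    // Violation of a business rule in the called layer: predictable,
    // potentially recoverable, and meaningful to report to the client.
    Business(BusinessError),
    // Failure from a lower layer, beyond the bounds of encapsulation:
    // not recoverable, so just abort the request and log it.
    Technical(Box<dyn std::error::Error>),
}

fn report(err: AppError) {
    match err {
        AppError::Business(e) => eprintln!("request rejected: {:?}", e),
        AppError::Technical(e) => eprintln!("internal error: {}", e),
    }
}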
To be honest, I think both Rust and Go have fallen into the mistake of forcing every operation's result -- the vast majority being unrecoverable errors -- to be checked. Those errors should just be thrown as exceptions.
Lastly, many of the author's examples sound like bad error handling to me: they lack context about the client request. If something the client could directly understand & fix is broken, the code should report a business exception, aka a "mistake". Otherwise it should report an error with no expectation of recovery.
> Retrying is relevant only for a specific, small set of callsites dealing with unreliable network or hardware interactions. That fact should be obvious at the callsite itself.
Yes. However, libraries, at least in the .NET world, are notoriously bad at documenting which exceptions are thrown in which circumstances. Example: trying to execute an SQL statement against a server when the network connection drops. Do I get SqlException, SocketException, or SqlException containing an inner exception that is (hopefully) SocketException or TimeoutException? Link to docs: https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlc... It mentions network outage, but specifically only for streaming operations, while most queries are decidedly not in that category.
So what exception should the program catch to handle a network outage so that it can sleep and retry? It's also extremely difficult to test an outage of an intermediate router/switch, because unplugging the network cable tells the OS that the media is gone, which immediately signals an error to programs depending on that interface. (Though it's possible to mock with firewall rules...)
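Rust has the same shape of problem, by the way: the equivalent of inspecting inner exceptions is walking the source() chain. A sketch of what a caller ends up writing (the set of "retryable" kinds below is my own guess, which is exactly the information the docs fail to provide):

use std::error::Error;
use std::io::ErrorKind;

// Walk the source() chain looking for an io::Error whose kind we choose
// to treat as transient. The choice of kinds is an assumption, not a spec.
fn is_retryable(err: &(dyn Error + 'static)) -> bool {
    let mut cur = Some(err);
    while let Some(e) = cur {
        if let Some(io_err) = e.downcast_ref::<std::io::Error>() {
            if matches!(
                io_err.kind(),
                ErrorKind::TimedOut | ErrorKind::ConnectionReset | ErrorKind::ConnectionAborted
            ) {
                return true;
            }
        }
        cur = e.source();
    }
    false
}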
I’m unconvinced. Just superficially, when trying to lock a mutex, WouldBlock isn’t a “mistake” at all. It’s simply the outcome of the operation.
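std::sync::Mutex makes that concrete: try_lock reports WouldBlock as an ordinary outcome, distinct from poisoning, which really does mean something went wrong:

use std::sync::{Mutex, TryLockError};

fn main() {
    let m = Mutex::new(0);
    let _guard = m.lock().unwrap();
    match m.try_lock() {
        Ok(_) => println!("locked"),
        // Not a mistake, not a failure: the lock just happens to be held.
        Err(TryLockError::WouldBlock) => println!("busy right now, try later"),
        // Poisoning is different: another thread panicked while holding it.
        Err(TryLockError::Poisoned(_)) => println!("lock poisoned"),
    }
}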
More generally, this post suggests that nuance is important, but nuance can be a lot more complex than this success/mistake/failure tristate. For example, consider an operation to INSERT into a database table with a unique constraint. A constraint violation might be a severe error indicating corruption in one context, but it might be a perfectly valid result indicating that an alternative path is needed in another context. An attempt to open a file that does not exist could be a serious error or just fine, but it’s not usually a situation where retrying makes sense.
There are certainly outcomes that are unambiguously hard errors: if the library itself is corrupted, its functions can fail hard. Or if the caller violated a precondition, a hard error is warranted. But interacting with anything external is fundamentally messy, and expressing the outcomes is hard.
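To make the file case concrete, the very same ErrorKind reads completely differently depending on the caller's intent (the config-file scenario here is just an example):

use std::fs::File;
use std::io::ErrorKind;

fn main() {
    match File::open("app.toml") {
        Ok(_f) => println!("using config file"),
        // In this context, absence is a perfectly valid outcome:
        // fall back to the defaults.
        Err(e) if e.kind() == ErrorKind::NotFound => println!("no config, using defaults"),
        // Anything else (permissions, I/O) is a real error, and
        // retrying won't help.
        Err(e) => panic!("could not read config: {}", e),
    }
}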
My initial reaction was similar, and I totally agree that there's a lot of essential complexity in this area. At the same time, there might be room for an ergonomic improvement too. It seems like the `?` operator is maybe a little bit too nice - I find that, in practice, the step up in effort between:
let x = some_function()?;
versus
let x = match some_function() {
    Ok(value) => value,
    Err(Error::ThatOneRetryableError) => {
        // Retry; note this arm still has to produce a value
    }
    Err(e) => return Err(e),
};
...is sometimes just a little too much to bother with. Is there a pattern/macro/??? that makes this sort of thing a little nicer?
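For what it's worth, the closest I've found is an explicit loop (reusing the names from the snippet above), though it's still heavier than `?`:

let x = loop {
    match some_function() {
        Ok(value) => break value,
        // Only this one case is worth retrying; real code would
        // probably sleep or cap the number of attempts here.
        Err(Error::ThatOneRetryableError) => continue,
        Err(e) => return Err(e),
    }
};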
There are no one-size-fits-all abstractions for these things. Where to handle an error and deciding what to do under which circumstances is uncomfortable detail work. But it needs to be done if you want reliable software.
If it's fair to say that the abstraction we're talking about is a semantic thing, rather than a syntactical thing, then I think we're in total agreement.
Rust (as far as I know - I'm totally not an expert, nor a programming-language person) makes handling the black-and-white cases really nice, but it would be great if there were less friction for the in-between cases. I'm not sold on the solution being a three-variant `Outcome` instead of the two-variant `Result`, but I also feel that things could be better.