Views on Error Handling (dannas.name)
96 points by dannas on July 18, 2020 | 91 comments



The very concept of "error handling" is absurd.

There are no errors, just unnecessary abstractions and control flow hacks. You try to open a file; either you can or you cannot, and both possibilities are equally likely and must be handled by normal control flow in your program. Forcing an artificial asymmetry in the treatment of both cases (as championed by the error handling people) adds ugly complexity to any language that tries to do so.


The problem of error handling arises due to type systems and function call semantics.

  data = open("/var/foo")
Most programmers are going to expect 'data', the result returned by 'open', to be a type that allows reading/writing the file. The programmer expects /var/foo to exist, or they might have checked before calling 'open', but even that's not foolproof.

Historically, a failure might just set 'data' to an invalid value (like 0 or null) but that ended up being a bad idea. And we needed some way to return more information about the error. So we started doing this:

  error = open(data, "/var/foo")
But this mainly just complicated things. Is 'data' input or output? The function doesn't return its actual output. And it's still possible to ignore 'error', so 'data' is still potentially undefined.

Then exceptions were invented so we could use proper function call styles again, and the program wouldn't go into an undefined state. Instead, the error could be handled with separate logic, or the program would halt if it was ignored. This was far from a perfect solution, though.

Then sum types entered the mainstream, so 'data' had well-defined ways of returning something other than the expected result. But that resulted in a lot of competing conventions and stylistic decisions for what to do when 'data' is an error type, that haven't quite been settled yet.
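
To make the sum-type idea concrete, here's a minimal C++ sketch (my own illustration with made-up names like open_file and OpenError, not from the article): the single return value carries either the data or the reason for failure, and the caller has to pick one apart.

  #include <iostream>
  #include <string>
  #include <variant>

  struct File { std::string path; };                 // stand-in for a real handle
  enum class OpenError { NotFound, NoPermission };   // the "other" alternative

  // Returns either a File or an OpenError; the type makes both outcomes explicit.
  std::variant<File, OpenError> open_file(const std::string& path) {
      if (path == "/var/foo") return File{path};     // pretend this file exists
      return OpenError::NotFound;
  }

  int main() {
      auto result = open_file("/var/foo");
      if (auto* f = std::get_if<File>(&result)) {
          std::cout << "opened " << f->path << "\n";
      } else {
          std::cout << "open failed\n";
      }
  }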


Again referring to Go's imperfect but interesting handling of this problem, the style in Go would be:

  data, err := open("/var/foo")
  if err != nil {
      // handle or return the error
  }
In fact, the compiler will make you deal with both values after assignment unless you explicitly ignore one of the return values.


This way lies a combinatorial explosion.

Each library/system call you do can result in a set of possible consequences. We usually don't care about them equally, though: in fact, in the file example, 99% of the time we care about whether the file was opened or not - and in the latter case, we don't need to know why. So the asymmetry is already introduced by the intent of the program - there's usually only one path of execution we want; the other ones are distractions. Error handling exists to express that asymmetry of caring at the tool level.


I completely disagree. A program or function is designed to perform an operation. If that operation requires the contents of the file, then the program cannot continue unless it successfully reads the contents of the file. There is already a natural asymmetry. If you cannot open the file, there isn't any more to do.


An "operation" is not something inherent to the code. If we look at a function that may get what we call an error, we'll see that in either case it completes and returns control to the caller. We label one such result 'success' and another 'failure' because we also have an idea of purpose of the function, but the purpose does not exist at the code level. Maybe this is why we struggle with errors.


Once you give a function a name, it has an operation that's inherent to it.

Say you have a function called `Add` that takes two parameters and returns the sum of those two numbers. That's the operation. If the web service that you call to perform the sum is down, it cannot return that sum, so that's a failure.

The code that calls this function needs that sum to continue its operation. If it cannot get that result it cannot produce its own result, and that error needs to be propagated. Maybe the entire program has a purpose that can no longer be completed and should be aborted.


If you give a name to a mountain, it doesn't affect the mountain, it affects you :)

Joking aside, consider the following case. A function needs to print a series of reports and opens Print Setup. The user changes his mind and presses "Cancel". Now the function cannot continue and needs to stop the planned operations, undo what it has done so far, and return control to the user.

The internal mechanism you're going to use for this will most likely be the same mechanism you use to handle errors. But what you're handling is not an error, at least in the common usage of the word, because nothing erroneous is happening! Everything goes exactly as it should and according to the general purpose of the application.

My point is that the current dichotomy of success and failure does not help us to solve the seemingly simple problem of errors.


> The internal mechanism you're going to use for this will be most likely the same mechanism you use to handle errors.

Which is exactly how exceptions work. If you're using RAII or a similar pattern in your language, there is no separate cleanup path for errors distinct from the normal cleanup. That's the point, actually. If the user presses cancel or an exception is raised, the eventual stack unwinding will undo everything.

If you have a bunch of conditional statements for every possible error, you're actually creating more code paths that are unique to errors. You have all these paths to test. With exceptions there is only the happy path, both in the operation and in the cleanup.
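
A minimal C++ sketch of that point (my own illustration, with a hypothetical SpoolFile type): the same destructor runs whether the function returns normally, the user cancels, or a real error is thrown, so there is only one cleanup path to get right.

  #include <cstdio>
  #include <stdexcept>

  struct SpoolFile {                       // owns a hypothetical spool file
      std::FILE* f;
      explicit SpoolFile(const char* path) : f(std::fopen(path, "w")) {
          if (!f) throw std::runtime_error("cannot open spool file");
      }
      ~SpoolFile() { if (f) std::fclose(f); }   // runs on every exit path
  };

  void print_reports(bool user_cancelled) {
      SpoolFile spool("/tmp/reports.spool");
      if (user_cancelled)
          throw std::runtime_error("cancelled");   // same unwinding as a real error
      std::fputs("report body...\n", spool.f);
  }   // spool is closed here no matter how we left the function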


That's a neat observation. If you're writing a function as "How do I get from A to B?", that's more error-prone than "What are all the possible outcomes of trying to get from A to B?"


I disagree; programming languages and code are built explicitly and exclusively to perform a function. Operations and purpose are more inherent to the code than their mathematical/logical nature.


Yes, everything is built for a purpose. But the raw materials used to make a thing do not change because of the purpose. They follow their own laws; it's us who ascribe meaning to the resulting combination. Laws are something we cannot change, only use. But purpose is a concept. We do need concepts to create and operate things, but we shouldn't mistake them for laws. Concepts are something much more pliable :)

It's all rather abstract, I guess, but yesterday I found an example while playing a game, "Opus Magnum". You are to solve puzzles by arranging mechanical arms to move marbles. Initially I mentally referred to actions of the arms as "taking a marble" and "placing a marble" and solved quite a few puzzles this way. In all solutions all my arms did exactly that: they took and placed marbles without doing any purposeless movements.

Yet yesterday the next puzzle looked too tiresome to build. I had a eureka moment once I realized that the arms do not take and place marbles, they merely close and open their grips, and if there's a marble, it gets caught; the arms themselves have no idea whether they're moving a marble or not. And once I allowed myself to let them do "purposeless" movements without marbles, I found a simple and elegant solution :)


> I completely disagree.

Disagreeing is alright, but here you don't really disagree, do you? I can translate the paragraph you have written into pseudocode:

> If that operation requires the contents of the file, then the program cannot continue unless it successfully reads the contents of the file. (...) If you cannot open the file, there isn't any more to do.

    if (file opening fails):
        stop doing things
    else:
        continue with your operations
This is just a regular "if-else" that can be done with any programming language. The behavior of your program when the file cannot be opened is part of the specification; just as its behavior when it can be opened. I agree with you on that, and I add that the desired behavior can always be implemented using regular control flow constructions. You do not need a specific language construct for "errors", as you have proven by the algorithm that you have described in your text.


> stop doing things

This is what we call raising an exception.

> I add that the desired behavior can always be implemented using regular control flow constructions.

I agree. But that's not a very interesting observation. We added language-specific constructs for errors not for the computer but for the human. These constructs make reading and writing code easier and safer.

Code with if-else constructs for every possible error condition is really hard to follow and very brittle. If the result of every error condition is the same (stop doing things), we have developed constructs to make that path easy.

The problem with errors is that understanding and resolving them is often non-local. If a network call fails, the code where that failure happens doesn't have enough information to resolve it. If the solution is to retry the entire operation 3 times and then give up, the handling of this error must happen back where the operation starts, not in some random place in the middle where it actually occurred. Maintaining all the if/else necessary to move that information up through potentially dozens if not hundreds of calls is extremely difficult. And, it turns out, completely unnecessary and easily automated.
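
As a sketch of that last point (illustrative C++ with hypothetical names fetch_page and load_report): the retry policy lives where the operation starts, and the functions in between need no error plumbing because the exception propagates through them.

  #include <iostream>
  #include <stdexcept>
  #include <string>

  std::string fetch_page() {                       // deep inside the call stack
      throw std::runtime_error("network unreachable");
  }

  std::string load_report() { return fetch_page(); }   // no if/err plumbing here

  int main() {
      for (int attempt = 1; attempt <= 3; ++attempt) {  // policy lives at the top
          try {
              std::cout << load_report();
              return 0;
          } catch (const std::exception& e) {
              std::cerr << "attempt " << attempt << " failed: " << e.what() << "\n";
          }
      }
      return 1;                                     // gave up after three attempts
  }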


Ok, then replace all additions in your program with a function returning either an error or the result. Same with logging statements. And don't ignore the errors. Is your program still readable?


I agree. You don’t open a file, you try to open a file. When you get a handle back, that’s your library skipping steps.


Can you name a language/ecosystem that gets it right?


Rust seems pretty close to this (with Result<T, E>), though there is also the panic system.


It's trivial in any language to define and use a Result<T, E> type.


No, not really. Languages that don’t really embrace such a type can usually never make its use ergonomic. Adding such a construct to C or Java would be “trivial” but its use would be exceedingly painful.


Here's an example of a result type in Kotlin implemented as a library monad. (The language is getting an official one, but it is currently less featureful than this library.)

You can use exceptions or Result depending on which makes the most sense, and this is the best of both worlds. https://github.com/kittinunf/Result


Swift had a bunch for a while too, but it’s getting an official one as well. Again, that’s only because the language provides the right tools for it to exist, in languages where you don’t have this you’re going to have a very hard time trying to add these things in.


That depends a lot on what you mean by "use". Pretty much every commonly used language would be able to define something that can hold one of two things and provide ways to determine which one is present and retrieve one of the two options, but comparatively few are going to be able to provide the same guards against accidental or intentional misuse that pattern matching gets you.


Though I don't think it's perfect, languages like Go, which treat errors as simply another type that you check as a return value, do get closer to treating errors and success symmetrically than the exception throwing and typing systems of some other languages.


Building on that, the ADT languages with full option types are fantastic here, because the shape of the data in each case can be taught to the compiler.


Erlang / OTP.


There seems to be no mention of the Common Lisp condition system, which allows for handling of exceptional situations actually done right. Is this omission purposeful?

See https://news.ycombinator.com/item?id=23843525 for a recent long discussion about my upcoming book on the topic. (Disclosure: this is a shameless plug.)


Conditions are certainly technically fascinating. I was introduced to them back when Rust used them for I/O error handling. But Rust ~0.8 dropped conditions, because people found them much more confusing than Result<T, E>-based error handling, and almost no one was actually using any of the power of conditions.

Broadly speaking, conditions can be implemented as a library feature, so you can reintroduce such things in cases where the extra functionality is compelling (though now users won’t be familiar with it, so it’ll be much harder to justify).

Other programming languages have been tending in the direction of implementing generators and async/await, which can be used to more smoothly implement some of the key concepts of conditions. (They’re not the same by any means, but related.)


I've collected references to error handling but - I have to shamefully admit - have never encountered Common Lisp's condition system.

I'll take the time to read up on it properly, but from a quick glance it seems to me to be in the category of non-local transfer of control with a co-routine flavour.

It looks powerful, but I get the sense that a lot of language designers are deliberately trying to restrict the powers of error handling. So returning sum types or error codes is simpler than throwing exceptions, which - again, it looks to me - is simpler than allowing transfer of control to be decided at run time as in the condition system.

Again, very interesting. And thank you for making me aware of its existence.


> with a co-routine flavour.

Kind-of-but-not-exactly. There are no coroutines whatsoever; the main technical defining point is that the stack is not unwound when the error happens, but it is wound further. Some code is executed that then searches the dynamic environment for matching error handlers, which are executed sequentially; these are then capable of executing arbitrary code provided from earlier in the stack in form of invoking so-called restarts; both handlers and restarts are also capable of performing transfers of control to any point on the stack that was properly annotated as being available for such.


I thought the same thing. If you're surveying error handling approaches, you've got to include Common Lisp's condition system with its out-of-band signals and restarts and so on.


There is now a book about them.

https://www.apress.com/gp/book/9781484261330


The HN link discussing that book is literally what I linked above!

Signed, the author of that book. :)


Ah! Sorry I missed it.


> but without the downsides of the costly C++ memory deallocation on stack unwinding.

I.e. I don't care about restoring the program to a known state when handling an error (memory deallocation is just one case of processing unwind blocks; locks need releasing, file handles returned to the kernel, etc). This really only makes sense when your error "handling" is merely printing a user friendly error message and exiting.


(I'm the person he was quoting in the article.)

When I use setjmp/longjmp error handling I almost always want abort semantics but at the library level rather than at the OS process level. [1] Where applicable it's the simplest, most robust model I know. You have a context object that owns all your resources (memory blocks, file handles, etc) which is what lets you do simple and unified clean-up rather than fine-grained scoped clean-up in the manner of RAII or defer. You can see an example in tcc here:

https://github.com/LuaDist/tcc/blob/255ba0e8e34f999ee840407c...

https://github.com/LuaDist/tcc/blob/255ba0e8e34f999ee840407c...

[1] It goes without saying that a well-written library intended for general use is never allowed to kill the process. This presents a conundrum in writing systems-level C libraries. What do you do if something like malloc fails in a deep call stack within the library? Systems-level libraries need to support user-provided allocation functions which often work out of fixed-size buffers so failure isn't a fatal error from the application's point of view. You'd also want to use this kind of thing for non-debug assert failures for your library's internal invariants.

This style of setjmp/longjmp error handling works well for such cases since you can basically write the equivalent of xmalloc but scoped to the library boundary; you don't have to add hand-written error propagation to all your library functions just because a downstream function might have such a failure. I'm not doing this as a work-around for a lack of finally blocks, RAII or defer statements. It's fundamentally about solving the problem at a different granularity by erecting a process-like boundary around a library.
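
For readers unfamiliar with the pattern, here's a minimal sketch of its shape (my own illustration with hypothetical names like lib_context and lib_entry_point, not the tcc code linked above): the context object owns the resources, a failing allocation deep inside the library jumps back to the boundary, and cleanup happens once.

  #include <csetjmp>
  #include <cstdlib>

  struct lib_context {
      std::jmp_buf on_error;     // where failures inside the library jump back to
      void* buffer;              // pretend the context owns every resource
  };

  // The library-scoped equivalent of xmalloc: no hand-written propagation upward.
  static void* lib_alloc(lib_context* ctx, std::size_t n) {
      void* p = std::malloc(n);
      if (!p) std::longjmp(ctx->on_error, 1);   // unwind to the boundary, not exit()
      return p;
  }

  static void deep_helper(lib_context* ctx) {
      ctx->buffer = lib_alloc(ctx, 1u << 20);   // may fail far down the call stack
  }

  int lib_entry_point(lib_context* ctx) {
      ctx->buffer = nullptr;
      if (setjmp(ctx->on_error)) {   // we land here if anything below fails
          std::free(ctx->buffer);    // unified cleanup: the context owns it all
          return -1;                 // an error code crosses the library boundary
      }
      deep_helper(ctx);
      return 0;                      // success; the caller frees via the context later
  }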


See my response to a parallel comment from dannas.

I can see some minor corner cases where it could be worthwhile but the mental overhead isn't worth it.

I've written plenty of realtime code but spending a lot of time on the code running in the interrupt handlers is mentally exhausting and error prone; I do that when I have no choice. Likewise I've written a lot of assembly code but it's been decades since I wrote a whole program that way -- I don't have enough fingers to keep track of all the labels and call paths.

E.g. just because C++ has pointers doesn't mean I use them very often. >90% of the cases can be references instead.


More context to that quote:

> Per Vognsen discusses how to do coarse-grained error handling in C using setjmp/longjmp. The use cases there were arena allocations and deeply nested recursive parsers. It’s very similar to how C++ does exception handling, but without the downsides of the costly C++ memory deallocation on stack unwinding.

I have never used setjmp/longjmp myself. And I agree with you that my first instinct would be to use it in a similar manner to many GUI programs: they have a catch statement in the message loop that shows a dialog box for the thrown exception. You just jump to a point where you print a user friendly error message and exit.

But I still can imagine use cases where you've isolated all other side effects (locks, shared memory, open file handles) and are just dealing with a buffer that you parse. Has anyone used setjmp/longjmp for that around here?

Given your many years in the field and Cygnus background I guess you've used it a few times? Do you happen to have any horror stories related to it? :-)


I hate setjmp/longjmp and have never needed it in production code.

Think about how it works: it copies the CPU state (basically the registers: program counter, stack pointer, etc). When you longjmp back the CPU is set back to the call state, but any side effects in memory etc are unchanged. You go back in time yet the consequences of prior execution are still lying around and need to be cleaned up. It's as if you woke up, drove to work, then longjmped yourself back home -- but your car was still at work, your laptop open etc.

Sure, if you're super careful you can make sure you handle the side effects of what happened while the code was running, but if you forget one you have a problem. Why not use the language features designed to take care of those problems for you?

This sort of works in a pool-based memory allocator.

The failures happen three ways: one is you forget something and so you have a leak. The second is that you haven't registered usage properly so you have a dangling pointer. Third is that by going back in time you lose access to, and the value of, prior and/or partial computation.

If you use this for a library, and everything between the setjmp and longjmp happens within a single invocation, you can sometimes get away with it. But in a thing like a memory allocator where the user makes successive calls, unless you force the user to do extra work you can't be sure what dependencies on the memory might exist. If your library uses callbacks you can be in a world of hurt.

Trying to keep track of all those fiddly details is hard. C++ does it automatically, at the cost of potentially being more careful (e.g. deallocating two blocks individually rather than in one swoop -- oh, but that language has an allocator mechanism precisely to avoid this problem). The point is the programmer doesn't have to remember anything to make it work.


> Composing Errors Codes ... Instead of sprinkling if statements, the error handling can be integrated into the type ... The check for errors is only done once.

That is only a superficial level of composition, if one can call it that at all, that doesn't account for actual composition of errors of different types. The example provided is just encapsulation and therefore orthogonal to the issue of error handling approaches. i.e. in the example, the error handling code is only centralized, not composed.


Can you clarify the distinction between "centralization" vs "composition" of errors?

Do you mean the fact that there must be some if-statement within the API that reacts to the different errors and sets a flag used by the Err() method?

Is it your opinion that "composition of errors" always requires special syntactic elements such as the match statement?

The code from the blog section:

  scanner := bufio.NewScanner(input)
  for scanner.Scan() {
      token := scanner.Text()
      // process token
  }
  if err := scanner.Err(); err != nil {
      // process the error
  }


The only downside of error codes via sum types (Rust), according to the article, seems to be performance. It then claims that checked exceptions are the solution (at least according to Joe Duffy).

Maybe I'm naive to how exceptions are actually implemented, but it seems to me that both a checked exception and Sum Type would incur the same overhead, a single branch to make sure things haven't exploded.


If you want to treat your error result as a first class value, and transport it around, then your sum type can't use the same implementation as exceptions, which can use data-driven stack unwinding to have zero cost in the success case, the data being generated by the compiler and consumed by a stack unwinder after it has been invoked by an exception raise.


As exceptions are an abstraction you can implement them in many ways; one of those is "the same secondary return code error check as you would do manually", but if you assume "errors are extremely rare" (which I assert is fair: people who disagree generally point to a tiny class of things that I would argue aren't errors in the first place, such as "key not found in map" and "file not found on disk") you can use implementations that have literally "zero cost" for success but instead compile all of the exception unwind logic (catch statements and destructors) as continuations of the original functions (causing some modest binary code bloat, though the compiler can sometimes avoid this being noticeable) and then do table lookups (which are slow, but not necessarily ridiculous) to find the right on-error unwind target given an on-success function return pointer.

Essentially, I would argue that error signaling is important enough and common enough that it deserves attention by the compiler in the same sense that many of the other things we provide syntax for (such as traits or inheritance) are things which developers can type naive manual implementations of with basic tools (such as switch statements or dictionaries or dragging around lots of function pointers), but if you can abstract it in a way such that the semantics are available to the compiler you can come up with much better ways to handle the problem (such as vtables or polymorphic dispatch caches) for a given set of tradeoffs (such as low memory usage, low construction cost, consistent latency, etc.). If everyone is implementing the feature themselves manually in the code then you have lost any real ability to make great optimizations.

(Note that you don't necessarily have to have it be syntax to do this: you can also have a language such as Haskell--where notably these Either-style errors are usually cited as being from--where they do it in the language but abstracted everything an additional level higher, letting you define a lot of these flow control concepts in terms of a monad, so then downstream users use "do" notation to feel like custom syntax and the monad's bind operator provides a central chokepoint on what was otherwise a bunch of boilerplate. You sometimes--not always--can then do optimizations across the entire program of that shared abstraction. The way languages like Rust and Go are handling this, without support for monads, simply precludes anything other than attempts at reverse engineering semantics from the code, which is ridiculous.)


The obvious solution (in C++) is not to use exceptions at all, but make your own `error` and `expected<T>` class, and just add [[nodiscard]] to them. All the benefits of Go-style errors, you’ll never forget to check the error, and there is very little runtime overhead. If you pass the error as an out parameter then there is zero runtime overhead on success.
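
A minimal sketch of that approach (the names error, expected and parse_port are illustrative, not a specific library): marking the type [[nodiscard]] means the compiler warns whenever a caller silently drops the result.

  #include <cstdlib>
  #include <string>
  #include <variant>

  struct error { std::string message; };

  template <typename T>
  struct [[nodiscard]] expected {          // warn if a caller drops this value
      std::variant<T, error> value;
      bool ok() const { return value.index() == 0; }
      const T& operator*() const { return std::get<T>(value); }
      const error& err() const { return std::get<error>(value); }
  };

  expected<int> parse_port(const std::string& s) {
      if (s.empty()) return { error{"empty input"} };
      return { std::atoi(s.c_str()) };     // kept trivial for the sketch
  }

  // parse_port("8080");                   // compiler warning: result discarded
  // auto r = parse_port("8080");          // fine: the caller must look at it
  //                                       // (or explicitly ignore it)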


Speaking of C++ exceptions: Andrei Alexandruesco has investigated the performance impact of replacing exceptions with error codes. Dave Cheney made a summary of Andreis points in https://dave.cheney.net/2012/12/11/andrei-alexandrescu-on-ex...

* The exceptional path is slow (00:10:23). Facebook was using exceptions to signal parsing errors, which turned out to be too slow when dealing with loosely formatted input. Facebook found that using exceptions in this way increased the cost of parsing a file by 50x (00:10:42). No real surprise here, this is also a common pattern in the Java world and clearly the wrong way to do it. Exceptions are for the exceptional.

* Exceptions require immediate and exclusive attention (00:11:28). To me, this is a killer argument for errors over exceptions. With exceptions, you can be in your normal control flow, or the exceptional control flow, not both. You have to deal with the exception at the point it occurs, even if that exception is truly exceptional. You cannot easily stash the first exception and do some cleanup if that may itself throw an exception.


> You cannot easily stash the first exception and do some cleanup if that may itself throw an exception.

You can stash/rethrow exceptions since C++11 with an exception pointer if you really need to.

https://en.cppreference.com/w/cpp/error/exception_ptr


There is still runtime overhead in that you have to check whether you succeeded. The best possible scenario is if the error source knows exactly what code to jump to in the error case, and the calling code can assume that no error occurred if it is running. So in that sense it can be done better. But I'm not sure how material this difference is in light of correct branch prediction in the success path.


> Swift does not AFAICT provide mechanisms for enforcing checks of return types

Swift does this by default! You have to annotate (via @discardableResult) those functions which should not warn.

But of course try/catch is used in Swift more often.


Oh, that was sloppy of me. I should have read up more on Swift (I've never used it myself).

While I have your attention: A big thank you for Fish shell!

And related to the current subject: How does fish handle errors? A quick skim found some constants that are returned upon failure, such as this case for disown: https://github.com/fish-shell/fish-shell/blob/master/src/bui...

What trade-offs did you face when designing error handling for your shell?


Thank you for the great article! You ask a good question.

Shells are rarely CPU bound, so some perf overhead is acceptable. But shells may be used to recover badly broken systems. If fork or pipe fails, most programs are OK to abort, but a shell may be the user's last hope, so has to keep going.

For example, if pipe() fails, it's probably due to fd exhaustion. If your system is in that state, the best thing to do is immediately unwind whatever is executing, and put the user back at the prompt. fish uses ad-hoc error codes (reflecting its C legacy) instead of exceptions, though it uses RAII for cleanup. Your question made me realize that fish needs a better abstraction here; at least use `nodiscard`.

The story is different for script errors [1]. If the user forgets to (say) close a quote in a config file, fish will print the line, a caret, and a backtrace to the executing script. A lot of effort has gone into providing good error messages with many special cases detected. The parser also knows how to recover and keep going; I think Fabien would approve.

1: https://github.com/fish-shell/fish-shell/blob/225470493b3cd1...


Yes, providing good error messages is a hard problem. Writing a parser that just bails out when it encounters an error is easy. Writing one that provides error messages that are useful to the user is a _lot_ more work. That's a dimension I didn't touch on in the article.

It would be interesting to do a follow-up to this post where I compare the error handling of some common libraries/programs and ask the authors what trade-offs they faced when designing error handling. It's a subject, I believe, that is often overlooked.


> But of course try/catch is used in Swift more often.

FWIW I find actual exception usage rare aside from automatic error out parameter to exception conversion by the Clang importer when bridging to Objective-C code.


While we're on the topic - C++ doesn't do that by default, but since C++17 you can enable enforcing it on a case-by-case basis - you can mark functions, or even enums and structures, as [[nodiscard]], and then the compiler will issue a warning if you don't use the return value of that function (or whatever function that returns a class or enum marked as [[nodiscard]]).


There are 3 separate things that each require their different approach.

- errors, i.e. bugs made by the programmer

- logical "error" conditions that the program is expected to handle, for example a network connection failed or user input failed

- unexpected error conditions that typically boil down to resource allocation errors: a socket could not be allocated, memory allocation failed, etc.

In my experience all of these are best handled with a different tool.

For bugs use an assert that dumps the core and leaves a clear stack trace. For conditions that the program needs to handle use error codes. And finally, for the truly unexpected case, use exceptions.
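
A rough C++ sketch of that split (illustrative only; the functions are hypothetical):

  #include <cassert>
  #include <cstdlib>
  #include <new>
  #include <optional>
  #include <string>

  // 1. Programmer error: a violated precondition is a bug, so assert (and dump core).
  int checked_div(int a, int b) {
      assert(b != 0 && "caller must never pass zero");
      return a / b;
  }

  // 2. Expected condition: bad input is normal, so report it through the return type.
  std::optional<int> parse_page_number(const std::string& s) {
      if (s.empty() || s.find_first_not_of("0123456789") != std::string::npos)
          return std::nullopt;            // invalid input is handled, not thrown
      return std::atoi(s.c_str());
  }

  // 3. Truly unexpected resource failure: throw and let a top-level handler decide.
  void* allocate_buffer(std::size_t n) {
      void* p = std::malloc(n);
      if (!p) throw std::bad_alloc{};
      return p;
  }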


Why dump core when you can log the bug and continue? Sure, in development we want things to fail fast and loud, but when deployed with a customer, I don't want my whole program to crash because there is one obscure code path that has a problem.

And even for conditions that the program is expected to handle, 99.9% of the time all it can do is notify the user and ask for guidance, which means that the error must be bubbled up from a networking or storage layer all the way to the presentation layer - a perfect task for exceptions or something like an error monad.

The only problem with exceptions or error monads is that they get tricky in the presence of resources that need to be released, and even that is well handled with patterns like RAII.


> Why dump core when you can log the bug and continue?

I see from your replies what you're trying to say. If an error occurs, most likely you want the entire operation to abort -- that doesn't necessarily mean the whole program, depending on the program.

For example, if I have a GUI app and the "save" operation fails, I typically roll that back right to the event loop of the application; the user gets an error and they can retry the save.

For other types of applications, killing the whole process is ending the operation.


Yes, exactly!

And on the other end of spectrum, there are even systems where it makes sense to go further than killing a single process, and kill the whole container or even VM where a buggy condition was encountered.


> I don't want my whole program to crash because there is one obscure code path that has a problem.

If that one obscure code path corrupted my state, I want to limit the incorrect actions that the software takes based on that state.

This "want" of mine is to be balanced with all the other things I want out of the program, and the relative weights will vary by context... but it is often the case that continuing erroneously risks more harm than simply falling over.


That is indeed a very common need in a non-memory-safe language.

Still, not all bugs have unbounded impact. As long as memory is not corrupted, things like off-by-one errors and null pointer exceptions can often be safely recovered from by simply reverting the operation that hit the error until some kind of safe point (such as the last user interaction, or the last thread start).

Edit: spelling.


"dime kind of safe point"?

That's the whole idea of exceptions. Give up on some chunk of what you were doing, and recover to a previous state.

This works best when you have some notion of a transaction, and can get back to the state before the transaction started. This is what "ROLLBACK" in SQL does.


Oops, getting late and typing on mobile... Corrected now, should have been 'some kind of safe point'.

Yes, that is exactly my point in favor of exceptions, or at least against stopping the whole program with a core dump.


So you're suggesting basically writing logic to deal with bugs, i.e. for cases when the program has violated its own logical constraints? For me a bug is like a division by zero. The program violates its own logic and the only logical conclusion is termination. I find fixing is often much simpler and faster with a loud bang than with some obscure, unwanted, incorrect result.


It often depends on what software you are writing. Writing a web service? Sure, you can kill the process. You only affect that 1 request and the user can retry and other users are unaffected. Writing a server that maintains state for multiple realtime clients? Or a user facing program? If an error can be safely constrained, it can be preferable to log the error but keep the application running.


Well, let me ask you this: why not stop the computer when a program encounters a bug? Why not the whole cluster? Theoretically, an attempt to write to a null pointer could happen because you have corrupted a database or file and the entire system is now in an unreliable state.

The answer is that just as a process has some degree of isolation from other processes on the same system, and from the kernel, similarly components of a process can be well enough isolated that only the specific component that encountered the bug needs to be stopped. This is never 100% safe, but how safe it is depends greatly on the technology and architecture.


Because correctness is important to me. I don't want my programs to silently go about in a buggy state producing incorrect results in a corrupted state.


Not all bugs put the whole program in a corrupted state. Especially if your program is written to be mostly stateless. A common pattern is that a state change is tried, it fails because of a bug or some other error, it is reverted, and an error is shown to the user. I would call this a robust program. Of course, not every error can be recovered this way, but it is very often possible (assuming we are talking about memory-safe languages; otherwise, the balance of probabilities is entirely the other way around).


In practice you always have mutated state in a complex system and not everything moves in transactional steps.

People seem scared to dump core but I find that doing it makes my programs much more robust and also simpler. I have no muddled logic to "deal with" bugs in the program itself. They simply abort the program.


I replied the same in another thread, but why not dump the core of all the processes on the system, instead of just the one that encountered the error? And why stop on this system, and not dump core on other systems that were communicating with this one over a network?

The process is one boundary of isolation, and you are making a bet that the corruption has not crossed this boundary. You can take the same bet with subcomponents of the process, just as in larger systems you may actually prefer to reboot the whole machine or even kill it and spin up another.

This all depends on the architecture and technology you are working with. If a user has input some bad data that I didn't think to validate (user inputs 11 in a page number field, when there are 5 pages in total), an operation is initiated on that data (user presses the 'go' button), and that operation is known to be stateless, when it encounters an error (ArrayIndexOutOfBounds), we can safely abort the operation, log the stack trace, and signal an 'internal server error' to the user without having to kill the whole process.

Not to mention, in a program with many non-transactional state changes, dumping core could be a source of persistent corruption in itself, if another thread was doing something as simple as, for example, writing a JSON file.


Yes, if the user entered some bad data, a crash dump is not a good option - but OP did separate that into a different case, and crash dumps are only suggested for "programmer errors", i.e. cases that were not expected during development. And I agree that it's better to crash for such cases, as I cannot write a good error handler for cases that I did not envision during development anyway.


In my example, we have forgotten to validate user data. We are proceeding with the invalid data as if it's trusted, and we get an ArrayIndexOutOfBounds - that is a bug, not user error; we should have told the customer that 11 is not valid, instead we will have to tell them that an unspecified error occurred. If this were C++, it may well lead to memory corruption, so dumping core would almost certainly be the safest thing to do. But if it's Java, we can safely continue with program execution, in this case.

In other cases, even in Java, this may lead to corruption. For example, if we wrote that 11 into some kind of configuration file, we may have just corrupted the system persistently, even if we handle the error. So I'm not claiming that his is always a safe thing to do.

However, dumping core is also not safe a lot of the time. In fact, in multi-threaded processes, it can be argued that it is most likely unsafe to crash [as long as there is any kind of persistent state, such as a file] for the same reason interrupting a thread is inherently unsafe - there is absolutely no guarantee possible for what will happen if a thread is interrupted at an arbitrary point.


Do not quite agree with "safely continue" - at the moment you got ArrayIndexOutOfBounds, that unvalidated value might already be stored somewhere / used for changing the state, and it may not be possible to revert those changes easily (or at all).

Danger of a core dump in the MT case is a valid point, though. However, one can argue that "unsafe to crash" is an indication of a poor architecture decision - e.g. saving persistent state to a file should be done by writing to a new file first, and only replacing the old file after that is completed (but I kind of agree, this may be hard to make 100% reliable).

P.S. On the other hand, looking at your other comments in this thread, I do agree with your point - yes, there are indeed cases when you have an assurance that the error could not corrupt the state / propagate to other parts of the program; then logging the error and not crashing is a better option. That rarely happens in my projects, but this is specific to what I'm doing, and does not invalidate the point.


For the third case it’s better to just abort. Tell the user to get more RAM or something. What are you supposed to do when you’re out of memory? Catch the exception? Then what?

Related, I always find it funny when C programmers write `if (malloced == NULL) return NULL;` Either you’re going to forget that this can happen and dereference null (in which case it’s just better to abort the program immediately) or the caller will check this and then close the program. If it doesn’t, the next malloc will be null anyways, and the problem repeats. Just call abort().
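
A sketch of the xmalloc-style alternative being suggested (illustrative only):

  #include <cstdio>
  #include <cstdlib>

  // Either the allocation succeeds or the program stops; callers never see NULL.
  void* xmalloc(std::size_t n) {
      void* p = std::malloc(n);
      if (!p) {
          std::fputs("fatal: out of memory\n", stderr);
          std::abort();
      }
      return p;
  }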


Well, memory failure checking is usually put in the "can't do shit" category, which isn't necessarily true. Both in C and in C++, bad_alloc or null from malloc indicate that the memory manager could not find the memory. This may or may not mean that your application has overcommitted memory at the OS level. It completely depends on the actual memory manager. So the failure, to me, is just a general resource allocation failure. Would you dump core if your program failed to allocate a socket? Or a mutex?



Rust shines when doing error handling. No way to ignore errors, but properly handling them often adds just a single question mark to your code. Everything stays readable and lightweight.

Of course the error handling story is still perfectible but so far it's already one of the best I know.


The trouble I’ve had as a beginner is crafting error types for those “Union of multiple existing error types”. E.g., myfunc() can return an IO error or a Parse error. The boilerplate for creating a new error type (absent macros) is significant, and while I’m aware that there are macro crates available which automate this, it’s not clear to me which of these if any are “most standard” or otherwise how to distinguish between them.


There are many ways to do it, like you said. Over time, the most popular options have shifted as new support from the standard library arrived. How you handle the errors can boil down to whether or not you really care about what kind of error it is, or just if an error occurred.

Two popular crates for handling these situations are thiserror [1] and anyhow [2], for handling errors by type and handling all errors, respectively.

There are additional ways, like just returning a Box wrapper around the stdlib error type [3], or just unwrapping everything. It depends on what your program needs.

[1] https://crates.io/crates/thiserror

[2] https://crates.io/crates/anyhow

[3] https://play.rust-lang.org/?version=stable&mode=debug&editio...


As a beginner to Rust, this blog post has been excellent, and it has really helped me understand the idiomatic way to handle errors.

https://blog.burntsushi.net/rust-error-handling/


You can "ignore" error is Rust using _ like in Go.


Not really. In Go you can `val, _ := func()` and use the value even if there is an error. AFAIK there is no equivalent in Rust (for Option) outside of unsafe shenaniganry. You can choose to panic / return err / etc, but you can't choose to use the value regardless of the presence of an error.


Yep. I'm pretty sure that even with unsafe shenanigans, you can't access the value without being very explicit about it. You'd need something like:

    let value = unsafe {
        match result {
            Ok(value) => value,
            _ => hint::unreachable_unchecked()
        }
    };
At this point, the fact that you've skipped an error check should be abundantly clear to anyone reading your code.


You can return tuples from Rust fns just like you would in Go, if that's your thing - no unsafe necessary:

  fn foo() -> (usize, Result<(),()>) { (0, Ok(())) }
  let (a, err) = foo(); err?; // propagate error
  let (b, _) = foo(); // discard error
Or more typically, you might use one of the many fns such as unwrap_or, unwrap_or_else, unwrap_or_default, etc. to provide your own appropriate default value. I usually find that useful default values are often caller-specific anyway (and this doesn't require remembering which fns return which default values on error):

  fn foo() -> Result<usize, ()> { Ok(1) }
  fn bar() -> Option<usize> { None }
  let val = foo().unwrap_or(3); // val == 1
  let val = bar().unwrap_or(4); // val == 4
Alternatively you can use out parameters, which occasionally crops up in Rust's own stdlib:

  let mut line = String::new();
  let _ = buf_read.read_line(&mut line);
  // ...use line, even if there was an error...
Also, the "error" type might contain values itself, although you're certainly not ignoring the error if you use it:

  // https://doc.rust-lang.org/std/ffi/struct.OsString.html#method.into_string
  match os_string.into_string() {
      Ok(string) => println!("Valid UTF8 string: {}", string),
      Err(os_string) => println!("Invalid UTF8: {:?}", os_string),
  }


Sure, it's always possible to design a system that avoids the type system of a language. In the worst case you can just resort to creating an un-typed lambda calculus and re-implementing your logic there.

Community habits and language frictions matter a lot. In Go, doing the equivalent of `Result` requires custom private types for every combination, with private fields and accessor methods that protect against misuse. And you still only gain runtime safety. And they can still make a zero-valued var and use it without "initialization" (unless you use private types, which are an even bigger pain for a variety of reasons). Any other approach makes it trivial to bypass - public fields can just be accessed, multiple returns can be ignored, etc. In Rust, the community and language make `Result` common, and then you gain compile-time safety in most cases, and warnings in all others (AFAIK, as Result is annotated as "must use", so you get a warning or a `_ =` as a visible marker that you're ignoring something).

---

tl;dr multiple returns of course bypass this, but in practice you won't see that in Rust unless it's intended to allow ignoring the error. Underscores on out-param funcs are a good point, but they're also fairly rare / serve as a visual warning of shenaniganry.


If you're not going to use the success value, you can ignore errors in Rust easily:

    let _ = something_returning_Result();
This does not even give a warning.


> This does not even give a warning.

Just to clarify: the "let _ = ..." construct is the explicit way of suppressing the warning in Rust. You acknowledge that there is indeed a return value but you choose to ignore it. Just calling the function without explicitly discarding the Result will give you a warning.


That's still very explicit. If you don't bind the returned Result to something (`let _ = ...`), the compiler bitches at you.


Go lets you ignore errors by just not binding them at all.


No discussion is complete without mention of Erlang’s view on this

https://erlang.org/download/armstrong_thesis_2003.pdf


Erlang's error handling is mentioned in the article. Maybe read it before posting?


If it helps, the second section of the article is called

> The Erlang Approach - Let it Crash



