Error handling in Upspin

anameaname · on Dec 7, 2017

Pike is totally wrong here, and trying to explain his way out of a poorly thought out error type.

> In contrast, a stack trace-like error is worse in both respects. The user does not have the context to understand the stack trace, and an implementer shown a stack trace is denied the information that could be presented if the server-side error was passed to the client.

Absolutely not. The stack is useful to the implementer, the error message is useful to the user. You include both. Want it on both sides? Send it to both sides. Turning a stack trace into a string and propagating it around is not so expensive, and incredibly useful.
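As a rough illustration of "include both", the sketch below keeps a short message for the user and a stringified stack for the implementer. The `tracedError` type and `newTraced` helper are hypothetical names, not part of Upspin or the standard library; only `runtime/debug.Stack` is a real call.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// tracedError is a hypothetical type pairing a user-facing message
// with the stack captured when the error was created.
type tracedError struct {
	msg   string
	stack string
}

// Error returns only the short message, which is what a user would see.
func (e *tracedError) Error() string { return e.msg }

// Stack returns the captured trace, intended for logs or bug reports.
func (e *tracedError) Stack() string { return e.stack }

// newTraced records the current goroutine's stack as a plain string,
// so it can be logged or sent over the wire like any other field.
func newTraced(msg string) *tracedError {
	return &tracedError{msg: msg, stack: string(debug.Stack())}
}

func main() {
	err := newTraced("cannot open config file")
	fmt.Println("user sees:", err)
	fmt.Println("log gets:\n" + err.Stack())
}
```

Capturing the stack on every error has a cost, and whether it is acceptable depends on how often errors are created on hot paths.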

Here's a common error I see in Go: "os: file does not exist". What file doesn't exist? Upspin fixes that oversight, but the very next thing to ask is who wants that file? Without the stack trace you'll never know why /tmp/dfgsdfg was needed and who couldn't find it.

> For those cases where stack traces would be helpful, we allow the errors package to be built with the "debug" tag

Ever have those times where a binary is acting wonky, and once in a blue moon raises an unusual error? You can't just rebuild the binary and deploy it to 10,000 machines to debug it. It needs to be on all the time.

It is unbelievable the mental hoops the Go implementers jump through to explain how their anemic error type is actually better.

heavenlyhash · on Dec 7, 2017

I've come to disagree with the notion of stacks being attached to every error.

And I say that having previously authored a library for attaching stacks to every error in golang, and having used it for several months in several projects.

It's very useful for errors to say something about where they're from. It's possible to do that in more ways than simply having a line number -- I think we've all become a little too quick to be reliant on line numbers as the best, or even only way to tell that story; it just ain't so. Line numbers are great when you're in a development cycle on your own personal laptop, and the source checkout and one binary you're running are in lockstep. Line numbers become completely insufficient when you're debugging a large system, particularly if it has more than one version of a particular process. Quick, what's "Line 169" mean when it shows up in your logging bucket during a rolling blue/green deploy of your code to a >100 node cluster?

Stack traces are just a slice of line numbers. They're not magical, and since it's not a given that line numbers provide all context necessary to trace something, it follows that stack traces don't necessarily provide all context necessary to trace something either.

sciurus · on Dec 7, 2017

Stack traces not providing enough context to trace something is an argument for attaching more information to errors, not less.

Checking entries in my https://sentry.io installation, for example, I see the exception, the stack trace, the server the error occurred on, the release it was running, the URL being accessed, the values of local variables, etc. Having all this is invaluable for understanding what went wrong.

ereyes01 · on Dec 8, 2017

I agree, there's too much uncertainty with line numbers, especially in a changing codebase.

A system I used to work on (in C) had the build system generate unique error codes for every unique error point. The codes were 64 bits and consisted of an eye-catcher prefix for easy identification in a hex dump (think it was 0xEEE7), 24 bits identifying the file, and the rest was the error code. Every build generated a map file for you to look up error codes quickly. To generate one of these errors, you called a macro with ugly magic code that generated the unique part of the code, and returned that, logged it in the trace, or whatever.

The unique codes served us quite well in precisely identifying where they came from. It would be nice if Go and other languages somehow baked that in for us so you would have a user-friendly error message accompanied with the unique code for developers (which users can just easily ignore).
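The layout described above could be sketched in Go roughly as follows. The field order (16-bit eye-catcher, then 24-bit file ID, then 24-bit error number) is an assumption based on the description, and `pack` and `unpack` are hypothetical helpers; the original system generated its codes at build time with C macros.

```go
package main

import "fmt"

// eyeCatcher is the 16-bit prefix recalled in the comment above.
const eyeCatcher uint64 = 0xEEE7

// pack builds a 64-bit code: 16-bit eye-catcher, 24-bit file ID,
// 24-bit error number. Inputs are masked to 24 bits.
func pack(fileID, errNum uint32) uint64 {
	return eyeCatcher<<48 | uint64(fileID&0xFFFFFF)<<24 | uint64(errNum&0xFFFFFF)
}

// unpack splits a code into its parts; ok reports whether the
// eye-catcher prefix was present.
func unpack(code uint64) (fileID, errNum uint32, ok bool) {
	ok = code>>48 == eyeCatcher
	fileID = uint32(code>>24) & 0xFFFFFF
	errNum = uint32(code) & 0xFFFFFF
	return fileID, errNum, ok
}

func main() {
	code := pack(0x2A, 7)
	fmt.Printf("%016X\n", code) // prints "EEE700002A000007"

	fileID, errNum, ok := unpack(code)
	fmt.Println(fileID, errNum, ok) // prints "42 7 true"
}
```

A generated map from file ID and error number back to a source location would play the role of the per-build map file.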

heavenlyhash · on Dec 8, 2017

Yes, similarly, I rather admire the way mysql for example has numeric error codes from a large and no-reuse-allowed table. The human readable string hints what's going on; the numeric code is pretty much guaranteed to find you a relevant hit in $search_engine_of_choice, and will be relevant to your version of the software even if a similar human-readable message is used in different ways in other versions.

It seems to be a pretty large up-front organizational commitment, though, and also requires significant discipline to stick to; possible in mature products, but tricky in young rapid-iteration areas.

aleksi · on Dec 7, 2017

> Here's a common error I see in Go: "os: file does not exist". What file doesn't exist?

That error contains the file name: https://play.golang.org/p/Ea8FKF82c9 The package documentation is clear on that: https://golang.org/pkg/os/
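A minimal sketch of the point being made: errors returned by `os.Open` are of type `*os.PathError`, which carries the operation and the path. The helpers `tryOpen` and `pathOf` are hypothetical names, and `errors.As` requires Go 1.13 or later, which postdates this thread; a plain type assertion works on older versions.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// tryOpen attempts to open a file and returns only the error, if any.
func tryOpen(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	return f.Close()
}

// pathOf extracts the path recorded in a *os.PathError, if err is one.
func pathOf(err error) (string, bool) {
	var pe *os.PathError
	if errors.As(err, &pe) {
		return pe.Path, true
	}
	return "", false
}

func main() {
	// The path is the one from the comment above and is assumed not to exist.
	err := tryOpen("/tmp/dfgsdfg")
	if err != nil {
		fmt.Println(err) // the message includes the operation and the path
		if p, ok := pathOf(err); ok {
			fmt.Println("missing path:", p)
		}
	}
}
```

This answers "what file?" but not "who wanted it?", which is the part the original complaint is about.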

oblio · on Dec 7, 2017

Oftentimes the stacktrace is useful even to the user.

"Foo calling Bar, which fails. Let's check if Foo is configured correctly".

Even the abominations that are Spring stacktraces are useful. I think every language that can include them should include them.

natefinch · on Dec 8, 2017

Hell no. Rule #1 - users should never see stack traces. Every time some python tool fails and dumps a stack trace, I want to throw my computer across the room. As a user, I never want to see a stack trace. I have zero interest in debugging some random tool. Show me a decent error message with tips on how to get more info. If you want a stack trace that I can give in an error report, put it in a log file somewhere.

xiaq · on Dec 7, 2017

Go's rejection of exceptions comes from the philosophical ground that errors should be expected no matter how unlikely they are, and the programmer must make a conscious effort to think about how to handle errors - either by attempting to recover from them (probably with some logging) or passing them to the caller (after annotating and possibly reclassifying the error). Pike argues that different applications require different strategies, and there is no single solution that fits all. No strategy is biased by the language; the choice must be made consciously. Hence no auto propagation of exceptions.

Exceptions optimize for the use case of passing the error to the caller because that is indeed more common, especially when writing libraries. However, that makes recovering unnecessarily verbose. In the worst case, you may need to nest the call to every exception-throwing function in a try block. Exceptions also carry the danger of spoiling the developer: they encourage developers to think less about errors because they are "exceptions". When errors do happen, the exception being thrown often carries little information about what is really wrong.

This problem is mitigated by stack traces, but stack traces do not always have the entire context. If an AccessDeniedException is thrown during a multi-step operation, do we know which user is denied access and which step denies access? A carefully designed library will populate the exception with user information when throwing it, and likely catch and rethrow it to add more information. Does this sound familiar? This is exactly what Pike is advocating in this article. It requires you to think about the execution flow of your program not in terms of function calls, but in terms of more high-level operation steps that the user can also understand to some extent.

At the end of the day, even in a language with exceptions, error handling code can still benefit from the approach Pike advocates. If executed well, it makes the stack trace unnecessary, because the "operation traces" should now contain enough information.
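The "operation trace" idea can be sketched with plain error wrapping: each logical step adds who and what was involved before passing the error up. The functions `checkACL` and `publish` and the sentinel `errDenied` are hypothetical, and the `%w` verb requires Go 1.13 or later, which postdates this thread. This is not how Upspin's errors package does it; it only illustrates the annotate-and-pass-on pattern.

```go
package main

import (
	"errors"
	"fmt"
)

// errDenied is a hypothetical sentinel for the underlying condition.
var errDenied = errors.New("access denied")

// checkACL stands in for the lowest-level step that can fail.
// It records which user and which path were involved.
func checkACL(user, path string) error {
	if user != "owner" {
		return fmt.Errorf("check ACL for user %q on %q: %w", user, path, errDenied)
	}
	return nil
}

// publish is a multi-step operation; it adds its own step name
// to whatever error comes back from below.
func publish(user, path string) error {
	if err := checkACL(user, path); err != nil {
		return fmt.Errorf("publish %q: %w", path, err)
	}
	return nil
}

func main() {
	err := publish("guest", "/docs/report")
	fmt.Println(err)
	// prints: publish "/docs/report": check ACL for user "guest" on "/docs/report": access denied
	fmt.Println(errors.Is(err, errDenied)) // prints "true"
}
```

The resulting message names the operations and the user without any line numbers, which is the trade-off the comment describes.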

However, this approach requires a lot of discipline, and the programmer must have a clear picture of all logical operations involved. Unfortunately that is not always possible; programs can easily become too large for one person to completely grasp how they operate in a limited amount of time.

To conclude, in the perfect world where programmers always have a clear understanding of all the operation steps involved in a program, Pike's approach is flawless. But in the world we live in where no one ever knows entirely what is going on, automatic stack traces are (still) a sensible fallback. Yet, I take this as a protest against the status quo rather than an oversight: programs should not grow too large for a programmer to grok, and programmers should strive continuously to simplify and understand the programs they maintain.

sagichmal · on Dec 7, 2017

What an excellent comment. Thank you for writing it.

pjmlp · on Dec 7, 2017

You see this pattern in every feature that isn't quite right in Go.

ynezz · on Dec 7, 2017

I don't agree, but if you think so, you can try making go2 better via https://github.com/golang/go/wiki/ExperienceReports

pjmlp · on Dec 7, 2017

I don't believe Go 2.0 will ever happen.

So far that page looks like when our government puts out a "work commission" for any ongoing problem people complain about.

Go mailing lists have almost zero meaningful activity regarding Go 2.0.

ynezz · on Dec 7, 2017

Source: https://blog.golang.org/toward-go2

> I don't believe Go 2.0 will ever happen.

"Once all the backwards-compatible work is done, say in Go 1.20, then we can make the backwards-incompatible changes in Go 2.0. If there turn out to be no backwards-incompatible changes, maybe we just declare that Go 1.20 is Go 2.0."

So yeah, Go 2.0 might not even happen.

> So far that page looks like when our government puts out "work commission" for any ongoing problem people complain about.

I would love it if my government would take some inspiration from such projects, namely from this:

"We did what we always do when there's a problem without a clear solution: we waited."

pjmlp · on Dec 7, 2017

If Cloudflare hadn't exploded as it did, they would not have done any fix.

So let's see what actually happens to those issues being reported.

andreareina · on Dec 7, 2017

Is there a name for this technique/pattern of using an argument's type to figure out which parameter it is, rather than using its position or a keyword? I've just started doing the same recently and it's quite ergonomic.

Are any (currently existing) type systems able to encode this sort of thing such that it can be statically checked? IIUC from here[1] the compiler allows any type to be passed in. Sum types catch the error of passing the wrong type in, but don't do anything about passing the same type in multiple times. It's unclear whether the Upspin solution of last-write-wins is simply an easy default that falls out of how they process the arguments, or whether they actually use that property somewhere. I've made the equivalent case an error in my code since it seems much more likely to be indicative of a logic error; I'd love to hear arguments for doing it the other way.

[1] https://upspin.googlesource.com/upspin/+/master/errors/error...
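The technique under discussion, assigning each argument to a field by its dynamic type, can be sketched as below. This is not Upspin's code: `build`, the `strict` flag, and the `Op`, `Kind`, and `Error` definitions here are hypothetical and exist only to contrast last-write-wins with treating a repeated type as an error.

```go
package main

import "fmt"

// Hypothetical field types for illustration.
type Op string
type Kind int

const (
	Other Kind = iota
	Permission
)

// Error holds the fields that arguments are sorted into.
type Error struct {
	Op   Op
	Kind Kind
	Msg  string
}

func (e *Error) Error() string {
	return fmt.Sprintf("%s: kind %d: %s", e.Op, e.Kind, e.Msg)
}

// build assigns each argument to a field based on its dynamic type.
// With strict=false a repeated type overwrites the earlier value
// (last write wins); with strict=true a repeated type is rejected.
// Unsupported types are rejected at run time in both modes, since
// the compiler accepts anything for interface{}.
func build(strict bool, args ...interface{}) (*Error, error) {
	e := &Error{}
	seen := map[string]bool{}
	for _, a := range args {
		name := fmt.Sprintf("%T", a)
		if strict && seen[name] {
			return nil, fmt.Errorf("duplicate argument of type %s", name)
		}
		seen[name] = true
		switch v := a.(type) {
		case Op:
			e.Op = v
		case Kind:
			e.Kind = v
		case string:
			e.Msg = v
		default:
			return nil, fmt.Errorf("unsupported argument type %T", a)
		}
	}
	return e, nil
}

func main() {
	e, _ := build(false, Op("read"), Op("write"), Permission, "denied")
	fmt.Println(e.Op) // prints "write": the later Op replaced the earlier one

	_, err := build(true, Op("read"), Op("write"))
	fmt.Println(err) // prints "duplicate argument of type main.Op"
}
```

Neither mode can be checked statically; the choice only changes which mistakes surface at run time.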

junke · on Dec 7, 2017

So if it is a string and contains "/", it is categorized as a path?

https://upspin.googlesource.com/upspin/+/master/errors/error...

User names can contain slashes apparently:

https://godoc.org/upspin.io/user#Parse

andreareina · on Dec 7, 2017

Yeah, that part seems fragile. Which is why I'm against magic do-what-I-mean logic unless there's only one reasonable way to make semantic sense of it. And preferably only one way to do so, reasonable or unreasonable.

codemac · on Dec 7, 2017

It's good to see the Go authors spending more of their time implementing a system using Go.

It should inform any deeper changes with Go 2, hopefully.

breakingcups · on Dec 7, 2017

The error constructor errors.E is without a doubt the worst design for this type of code.

It seems Rob Pike is looking for the simplicity of method overloading here, which Go (which Pike co-designed) explicitly doesn't provide [1]: "Method dispatch is simplified if it doesn't need to do type matching as well. Experience with other languages told us that having a variety of methods with the same name but different signatures was occasionally useful but that it could also be confusing and fragile in practice. Matching only by name and requiring consistency in the types was a major simplifying decision in Go's type system."

So Pike turned to the catch-all interface{} type and allows you to specify anything and everything as a parameter. A few quick questions. Without reaching for the source of the implementation of errors.E, what happens when: 1. You supply two or more Kinds as arguments? 2. Same as above for all other types (Op, Err, PathName, UserName)? 3. You supply a completely random object, string or integer as argument?

Couple this with the fact that error-paths are often the least-tested code and you have a recipe for disaster.

This is a hack. It might be quick to write but it is certainly not well-designed or elegant. I say that as a proponent of Go.

[1] https://golang.org/doc/faq#overloading

xiaq · on Dec 7, 2017

I wonder why he didn't use the Builder-like approach used in the context package (https://golang.org/pkg/context/). It is more verbose to implement, but since this error package is used throughout upspin it sounds like a net win.

Edit: the With* functions in context are not exactly Builder-like since some of them return multiple values and cannot be chained. However, the error type doesn't suffer from this. You can implement With* methods on the Error type and make the following possible:

errors.New("message").WithKind(errors.Permission).WithOp("create")

Which is now completely type-safe, and only slightly more verbose than

errors.E("message", errors.Permission, errors.Op("create"))
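A minimal sketch of the chained With* approach proposed above, written as a single `package main` so it runs on its own. Everything here (`New`, `WithKind`, `WithOp`, the `Kind` values and their strings) is hypothetical; Upspin's errors package does not provide these methods.

```go
package main

import "fmt"

// Kind is a hypothetical error classification.
type Kind int

const (
	Other Kind = iota
	Permission
)

// String gives each Kind a human-readable name.
func (k Kind) String() string {
	if k == Permission {
		return "permission denied"
	}
	return "other error"
}

// Error is a hypothetical structured error built up by chained calls.
type Error struct {
	msg  string
	kind Kind
	op   string
}

// New starts an error with just a message.
func New(msg string) *Error { return &Error{msg: msg} }

// WithKind sets the kind and returns the same error for chaining.
func (e *Error) WithKind(k Kind) *Error { e.kind = k; return e }

// WithOp sets the operation name and returns the same error for chaining.
func (e *Error) WithOp(op string) *Error { e.op = op; return e }

func (e *Error) Error() string {
	return e.op + ": " + e.kind.String() + ": " + e.msg
}

func main() {
	err := New("message").WithKind(Permission).WithOp("create")
	fmt.Println(err) // prints "create: permission denied: message"
}
```

Because each method takes exactly one typed parameter, passing a value of the wrong type fails at compile time. Calling the same method twice still compiles, so the duplicate question remains a matter of convention.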

andreareina · on Dec 7, 2017

The docs[1] do say that the last of a particular type wins, though as I say in [2] it's not clear to me whether that's actually desirable, or if it's essentially an implementation detail (process the arguments in order and mutate the object) that got blessed because it's easier that way. I'd love to hear arguments for last-write-wins over immutable assignment.

It seems you can't implement custom interfaces on the core data types[3], so a do-nothing interface isn't an option to emulate sum types and enforce type-safety (E switches on string to set the error message). At least the code[4] gives you a run-time error instead of silently ignoring unsupported types.

[1] https://godoc.org/upspin.io/errors#E

[2] https://news.ycombinator.com/item?id=15870334

[3] http://www.jerf.org/iri/post/2917

[4] https://upspin.googlesource.com/upspin/+/master/errors/error...

heavenlyhash · on Dec 7, 2017

This blog is worth a read even if you don't care about Upspin or Go. Some details are certainly specific to Upspin (and they're marked as such), but there are also patterns worth thinking about for programmers of any background.

My favorite highlights:

> it is critical that communications [between Upspin servers] preserve the structure of errors

YES. This is something that's wildly underconsidered in most projects and indeed most programming languages. Errors need to be serializable... and deserializable, losslessly. Having errors which can serialize round-trip is a superpower. It unlocks the ability to have -- well, the blog gets to it later:

> an operational trace, showing the path through the elements of the system, rather than as an execution trace, showing the path through the code. The distinction is vital.

YESSS. From personal anecdote: I've been writing a collection of software recently which is broken up into several different executables. This is necessary and good in the system's design, because they have different failure domains (it's nice to send SIGINT to only one of them), and they also need to do some linux voodoo that's process-level, yada yada. The salient detail is that in this system, there are typically three processes deep in a tree for any user action. That means I need to have errors from the third level down be reported clearly... across process boundaries.

An operational trace is the correct model for this kind of system. Execution traces from any single process are relatively misguided.

> The Kind field classifies the error as one of a set of standard conditions

This is something I've seen emerge from many programmers lately, independently (and from a variety of languages)! There must be something good at the bottom of this idea if it's so emergent.

The "Kind" idea is particularly useful and cross-cutting. They're serializable -- trivially, and non-recursively, because they're a simple primitive type. And, per the guidelines of what would be useful in a polyglot world, they're also virtuous in that they're pretty much an enum, which can be represented in any language, with or without typesystem-level help.

(I also have a library which builds on this concept; more about that later.)

---

Now, some things I disagree with:

Right after describing the importance of serializable errors, Pike goes on to mention:

> we made Upspin's RPCs aware of these error types

This is a colossal mistake. Or rather, it's a perfectly reasonable thing to do when developing a single project; but it severely limits the reusability of the technique, and also limits what we can learn from it.

I'm a polyglot. I'd like to see a community -- with representatives from all languages -- focus on a simple definition of errors which can be reliably parsed and round-tripped to the wire and back in any language. For gophers, a common example should be Go and Javascript: I would like my web front-end to understand my errors completely!

I think this is eminently doable. Imagine a world where we could have the execution flow trace bounce between a whole series of microservices, losslessly! It's worth noting, however, that it would require an investment in keeping things simple; and in particular, controversially, not leaning on the language's type system too much, since it will not translate well if your program would like to interact with programs in other languages. Simplicity would be key here. Maps, arrays, and strings.
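To make "maps, arrays, and strings" concrete, here is a sketch of an error that survives a JSON round trip, including a nested cause from another process. The `WireError` schema and its field names are assumptions for illustration; they are not Upspin's wire format and not the schema of any existing library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WireError is a hypothetical schema using only strings, a string map,
// and nesting, so any language with a JSON parser can read it.
type WireError struct {
	Kind    string            `json:"kind"`
	Message string            `json:"message"`
	Details map[string]string `json:"details,omitempty"`
	Cause   *WireError        `json:"cause,omitempty"`
}

// Error renders the chain of kinds and messages, outermost first.
func (e *WireError) Error() string {
	s := e.Kind + ": " + e.Message
	if e.Cause != nil {
		s += ": " + e.Cause.Error()
	}
	return s
}

// roundTrip encodes an error to JSON and decodes it again, as would
// happen when it crosses a process or service boundary.
func roundTrip(e *WireError) (*WireError, error) {
	data, err := json.Marshal(e)
	if err != nil {
		return nil, err
	}
	var out WireError
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return &out, nil
}

func main() {
	inner := &WireError{
		Kind:    "io",
		Message: "disk full",
		Details: map[string]string{"host": "store-3"},
	}
	outer := &WireError{Kind: "unavailable", Message: "put failed", Cause: inner}

	got, err := roundTrip(outer)
	if err != nil {
		panic(err)
	}
	fmt.Println(got)                       // prints "unavailable: put failed: io: disk full"
	fmt.Println(got.Cause.Details["host"]) // prints "store-3"
}
```

Keeping the kind as a plain string means the receiving program sees exactly what the sending program had, at the cost of giving up compile-time checks on kind values.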

---

Lastly, I said earlier I have a library. It's true. (I have two libraries, actually; one in which I tried to use the Go type system more heavily, and included stack traces -- both of which I've come to regard as Mistakes over time and practice; I'll not speak of that one.)

Gophers might be interested in this, but also I'd love commentary from folk of other language backgrounds:

https://godoc.org/github.com/polydawn/go-errcat

This library is an experiment, but using it in several projects has been very pleasing so far. It has many of the same ideas as in this blog, particularly "categories" (which are a clear equivalent of "kinds" -- what's in a name?). Some ideas are taken slightly farther: namely, there is one schema for serializing these errors, and it very clearly coerces things into string types early in the program: the intention of this is to make sure your program logic has exactly as much information available to it as another program will if deserializing this error later; any more information being available in the original process would be a form of moral hazard, tempting you to write logic which would be impossible to replicate in an external process.

The main experiment I have in mind with this library is what's the appropriate scope for a kind/category? Does it map cleanly to packages? Whole programs? Something else entirely?

Shoothe · on Dec 7, 2017

Oh, it looks more and more like regular exceptions (nested errors, details, kinds).

dullgiulio · on Dec 7, 2017

But this has never been where Go tries to be different from exceptions.

The only difference is explicit vs implicit (or at most, annotated methods in a class, but that doesn't say where each exception could happen.)

Annotating Go errors has always been suggested. Others have done a very good job for nested errors (Dave Cheney in particular.)

As Go doesn't have inheritance and errors are just values, you could always do what is suggested here. Just make sure to implement the one function Error interface.

As the article says, it takes one afternoon of work and you have error handling tailor made for your project.

perfmode · on Dec 7, 2017

Except for control flow, which is my biggest gripe with exceptions.