Zero overhead deterministic failure: A unified mechanism for C and C++ [pdf] (open-std.org)
86 points by Sindisil on Sept 6, 2018 | 23 comments



I opened the document and immediately searched for "Sutter" to see how this interacts with Herbceptions. It turns out, nicely. Well done!

That said, the rationale seems a bit bogus to me:

> The C calling convention has the caller allocate the space for the returned value before calling a function. union { T value; E error; } takes the size of whichever the bigger of types T or E is, which is as optimal as it can be. However the additional discriminant in struct { union { T value; E error; }; _Bool failed; }; could take up to eight additional bytes more than that, or worse, depending on packing. This may not seem like much, but it is more than optimal.

> For fails(E) returns, it is proposed for at least AArch64, ARM, x86 and x64, that the discriminant be returned via the CPU's carry flag. This is because compilers can often fold the setting or clearing of the CPU's carry flag into the ordering of other operations, thus making this a zero runtime overhead choice of discriminant.

If the ABI for returning discriminated unions is inefficient, then change the ABI. Don't add a whole new language feature and then only give that an efficient ABI.

As for the errno delaying stuff, how about adding a new function consume_errno(), which (up to "as if") reads errno and then sets it to an undefined value? Compilers would then be free not to set the real errno at all if they could prove that a setting of errno was always followed by a call to consume_errno.
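
Very roughly, a sketch of the usage I have in mind (consume_errno() is hypothetical, it doesn't exist in any libc; the "strawman implementation" below just spells out the intended "as if" semantics):

    #include <cerrno>
    #include <cstdlib>

    int consume_errno(void) {              // strawman implementation
        int e = errno;
        // from here on, errno is allowed to hold any value
        return e;
    }

    double parse(const char* s) {
        errno = 0;
        double v = std::strtod(s, nullptr);
        if (consume_errno() == ERANGE)
            return 0.0;                    // range error handled here
        return v;
    }

With that contract, a compiler that can see the matching consume_errno() call would be free to keep the pending error in a register and never touch the real thread-local errno at all.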

I chuckled at this:

> There are two specific subcategories of C++ types which C can transport safely:

... followed by descriptions of Rust types which are Copy, and Rust types which aren't!

The concept of being "insufficiently zero overhead" is also a good one, a bit like being a little bit pregnant.


> If the ABI for returning discriminated unions is inefficient, then change the ABI. Don't add a whole new language feature and then only give that an efficient ABI.

That's exactly what they did; there's nothing preventing the compiler from reusing that signalling mechanism for generic (binary) discriminated unions.


I'm not sure that's true. For ZOE, the compiler knows whether it's returning a failure or not, and thus whether to set or clear the carry flag, and the caller handles it immediately, so there's no need to store the flag anywhere. For a generic user-defined union, neither of those holds: the compiler can't deduce which field is active in all situations, and it can't preserve the carry flag (even if it weren't clobbered by other instructions, how would you handle two discriminated-union variables at once?), so the flag needs to be stored in the union. The compiler could automatically add a flag member and keep it updated, but then again, that can be done today in C++ ... call it std::variant.
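
To put the std::variant point another way, here's a minimal sketch (my own example, not the paper's): a library-level discriminated union has to keep its discriminant in memory and re-read it, which is exactly what the carry-flag trick avoids for fails(E) returns.

    #include <cerrno>
    #include <cstdlib>
    #include <system_error>
    #include <variant>

    std::variant<long, std::error_code> parse_long(const char* s) {
        char* end = nullptr;
        errno = 0;
        long v = std::strtol(s, &end, 10);
        if (end == s || errno == ERANGE)
            return std::make_error_code(std::errc::invalid_argument);
        return v;
    }

    long use(const char* s) {
        auto r = parse_long(s);
        if (auto* p = std::get_if<long>(&r))   // discriminant comes back from
            return *p;                         // memory, not from a CPU flag
        return 0;
    }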


Even that is too much ABI change. It would take another 20 years for the real world to embrace the change.

For the record, it took 20 years for GCC to change the std::string implementation from COW-based to SSO-based.


> If the ABI for returning discriminated unions is inefficient, then change the ABI.

The problem is that the C (and C++) type system doesn't have the notion of a discriminated union. It gives you the ability to overlap fields, and whatever arrangements you make on top of that are your own. But it also means that there's no way for the ABI to know which members of the union are valid and which aren't.

So their choices were either to add full-fledged variant records to the language (which would complicate matters a great deal, since full-fledged variant records mean that types no longer have a static size, which is a rather fundamental concept in C++), or to special-case it like they did to solve the problem at hand.
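
A tiny sketch of what I mean (my example, not from the paper): the active member of a plain union travels out-of-band, by convention only, so the ABI just sees a blob the size of the larger member.

    union number {
        long   as_long;
        double as_double;
    };

    // Nothing in the type says which member is valid; caller and callee
    // have to agree on a side channel, here the is_long flag.
    long read_back(union number n, bool is_long) {
        return is_long ? n.as_long : static_cast<long>(n.as_double);
    }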


> So their choices were either to add full-fledged variant records to the language (which would complicate matters a great deal, since full-fledged variant records mean that types no longer have a static size, which is a rather fundamental concept in C++),

Variant records (or safe unions, or enums-with-data, or ADTs, however you want to describe them) can absolutely have a static size: they're the size of the largest variant, plus the discriminant, however you choose to store that.
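
For instance (a rough sketch; the exact numbers depend on the implementation and padding):

    #include <cstdio>
    #include <variant>

    struct Err { int code; };

    int main() {
        // Largest alternative, plus the stored discriminant, plus padding;
        // fixed at compile time like any other type's size.
        std::printf("%zu %zu %zu\n",
                    sizeof(double),                      // e.g. 8
                    sizeof(Err),                         // e.g. 4
                    sizeof(std::variant<double, Err>));  // e.g. 16
    }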


At that point, you're throwing away the space optimization opportunity that the paper is talking about in the first place.


It's a bit weird that they are especially concerned about the exact size of this type of discriminated union anyway, since the primary use is one-at-a-time on the stack, which is one of the cases where space often does not translate directly into runtime cost (unless, say, you are packing a lot of them into an array).

Sometimes there is no "space" cost at all (at least in memory) for ABIs which allow the full union to be passed back in registers. Currently the x86-64 ABI allows you to pass two pointer-sized things back without using the stack at all (in RDX:RAX), which I suspect aligns purposely with their requirement that the error struct be no bigger than two machine words.
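
A sketch of what I mean (assuming the SysV x86-64 calling convention; the names are made up):

    #include <cstdint>

    // Two pointer-sized members: an aggregate like this comes back in
    // RDX:RAX under the SysV x86-64 ABI, so no memory traffic on return.
    struct result {
        void*         value;
        std::uint64_t error;
    };

    result ok(void* p) { return result{p, 0}; }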

I think a bit of what is happening here is that the C++ committee can't change the ABI: that is platform specific, not part of the standard, and not in their purview. What they can do is introduce a new kind of call which would have to be accommodated by the existing ABIs, and that presents an opportunity to add this boolean-value-in-flag-register convention to ABIs which have to be updated anyway.


My impression was that the discriminant would be "as if" it's there, but it wouldn't actually be allocated, or at least its allocation can be avoided if there is a sentinel success value for the error case (0 -> success, not-0 -> failure and the specific value indicates what type of failure).


Note that this seems to mean "zero extra cost over existing solutions in C", not "zero extra cost over C++ exceptions" (though I guess they try to make that argument also): C++ exceptions have no runtime cost at all unless you throw an exception (which is supposed to be rare), but when that happens the cost is unpredictable (as the required tables might never even have been loaded into RAM). This solution is fast and deterministic during an error condition, but certainly has overhead on the success path.

They argue in the paper that this overhead isn't as bad as you might expect--the pipeline stall one might fear from a "did an error occur?" check doesn't happen, thanks to speculative execution--but the reality is it still uses at least a couple of extra instructions for every function call that can fail (which rapidly becomes "all function calls"). To avoid burning an extra register (or making functions return aggregates), what they are proposing is: "for at least AArch64, ARM, x86 and x64, that the discriminant be returned via the CPU's carry flag. This is because compilers can often fold the setting or clearing of the CPU's carry flag into the ordering of other operations, thus making this a zero runtime overhead choice of discriminant." (with a footnote stating "As it is a single bit being branched upon, status register update pipeline stalls should not occur.").

It seems like the key part of the argument is that "Adding explicit checks for whether an exception has been thrown to the successful code path is now usually free of cost on such CPUs, as stalls in other parts of the pipeline will block the CPU, thus leaving idle spare CPU execution units." (and so saying "we had spare capacity we were wasting before, so now we are filling it")... I guess I would feel more comfortable with benchmarks (I could have sworn someone did benchmarks on similar schemes a few years ago, because of Rust, and showed what I considered to be a non-trivial overhead) :/. (Also, don't CPUs normally execute other hyperthreads during those stalls? I would want to know not just "my program isn't being extra limited on an otherwise idle machine", but have a concept of what total system impact this change would have.)


Yeah, it's definitely not "zero cost" - I'm one of those people who are often heard saying things like "not-taken branches are almost zero cost", and perhaps I sometimes say the "almost" part under my breath.

First, setting the carry flag definitely has a cost. The function will either succeed or fail, and the compiler will have to set the carry flag appropriately. It will be fairly rare that the compiler can simply organize to have the carry flag set correctly without adding an additional instruction: for example on x86 I expect you'd see a clc instruction before most successful function returns.

I suppose where the error condition is checked within the function right before return, and the check uses the carry flag, and the carry flag happens to have the right polarity for success/failure, the compiler could optimize it out - but that doesn't seem common!

However, a nice advantage of using the carry flag is that "checking" it is simply a jump instruction, no test or cmp needed. On x86 this is less relevant due to macro-fusion often making the test basically free at runtime, but still nice.

A disadvantage would be any cases where the error check doesn't happen immediately after function return: if there is any other logic between the function call and the error condition check the compiler will probably have to stash the flag somewhere since it is clobbered by most instructions (at least on x86). The structured way the error handling is proposed to work probably makes that a non-issue in most cases, however.


I'm no C++ expert (I'm actually starting my first job using it in a few weeks!) but is it really true that C++ exceptions have no runtime cost at all? I thought various optimisations aren't possible in the presence of exceptions; for example, a move constructor that isn't marked "noexcept" won't be used during a std::vector resize (it falls back to the copy constructor instead), right? Isn't that a runtime cost?
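
For example, the case I have in mind (a quick sketch, assuming a type with both copy and move constructors):

    #include <cstdio>
    #include <vector>

    struct Widget {
        Widget() = default;
        Widget(const Widget&) { std::puts("copied"); }
        Widget(Widget&&)      { std::puts("moved"); }   // not noexcept
    };

    int main() {
        std::vector<Widget> v;
        v.reserve(1);
        v.emplace_back();
        // Reallocation copies the existing element (prints "copied"), since
        // vector goes through std::move_if_noexcept and the move can throw.
        v.emplace_back();
    }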


That isn't a consequence of exceptions; it's a consequence of the fact that your move can fail and that std::vector has to maintain a sane state in the case of an error. You could write a bad::vector that just clears its contents on error, and it could use a throwing move just fine.


Hmm, I am not sure what to think. I read through it, but C being able to call C++ functions and handle errors would be nice.

However, it does add things to both languages, and I generally think it's best to avoid adding things at all costs. It just makes things more complicated.


It trades false simplicity (errno) for explicit yet bounded implementation complexity, yielding a much simpler model for the programmer to actually use.

There may be a fatal flaw but at first glance I like it.


IMO the biggest advantage of this proposal is that it'd make error reporting part of the C ABI, which still is - and is likely to remain - the common base for all kinds of FFI.


Most languages will probably still do their own error reporting, so arguably this will not do very much, except create extra work for language maintainers to make their own error reporting coexist with the C mechanism.


The point is that right now, that error reporting usually ends at the FFI boundary. But with this thing, it could be propagated across the boundary, in a standard way that the other side hopefully has a meaningful projection for.

And if you cross the boundary twice (back and forth), and nothing in between handles the error, then it can even be propagated across both boundaries in a way that allows the original error to be preserved or reconstituted. So you could e.g. throw Python exceptions across C++ stack frames and catch them on the other side - and C++ would correctly execute destructors etc.


They already need to do that with errno, and this proposal would make error handling simpler, so it still seems like a win.


It also allows taking away a feature that requires complexity: POSIX errno, which is a thread-local variable.


Some complexity is good and some is bad. In this case I would call it good: it makes the language more complex, but as an implementor of programs my code becomes simpler, which is an overall win.


I tend to prefer minimalism. I find a lot of software abhorrent for its complexity and bloat. Heck, even standards keep getting more bloated. I also kinda wish terminals/consoles were treated more like first-class citizens, but so much stuff expects a GUI. Heck, I am not even that old, only 26. If I could just use a text-based browser in the console, I would very rarely start up my desktop environment.

Speaking of minimalism, one of my favorite kinds of ISA is the OISC. CISC vs RISC? Pff, OISC, man. Particularly transport-triggered, since it actually lends itself to compile-time parallelism pretty well, and it's the only kind to have had a processor built. It's kinda like a dataflow machine, but isn't. https://en.wikipedia.org/wiki/One_instruction_set_computer#T...


Definitely interesting, but the net result is C code that lacks C's inherent simplicity. Too much declarative noise, basically, and a lack of operational transparency. It basically makes C look (and feel) like it's been... erm... polluted with C++.



