
The reasons for undefined behaviour being in the C/C++ language standards are performance and the ability to naturally support all sorts of different platforms, but I'm not sure the performance argument really holds up with modern optimising compilers.

I admit I don't have hard numbers to hand, and sadly it's rather rare to see decent rigorous comparisons, but I don't think C and C++ have that much of a performance advantage over, say, Ada or Safe Rust.

My inner pedant feels it necessary to note that Ada is still an unsafe language, albeit much less unsafe than C. It also has excellent support for optionally disabling all sorts of runtime safety checks, whereas the design of C and C++ makes it extremely difficult to implement such checks as 'opt-in' features in a compiler.




It actually is important, but it's mostly important when your program gets deployed across architectures. If you're only writing for one architecture (say x86) then you can write the C to just work well there.

One example: leaving signed overflow undefined matters when the compiler needs to rearrange a loop index - PPC prefers to count down, x86 doesn't care.
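
A minimal sketch of the kind of loop in question (my own illustration, not from any real codebase):

    // Because signed overflow is undefined, the compiler may assume the
    // index 'i' never wraps. That licenses rewrites such as widening
    // 'i' into a 64-bit register, or counting down to zero on a target
    // that prefers it, without inserting wraparound checks.
    long sum_squares(int n)
    {
        long total = 0;
        for (int i = 0; i < n; ++i)
            total += (long)i * i;
        return total;
    }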

Anyway, I think undefined behavior + ubsan is better than defining all behavior. If something's undefined you know it's a bug every time you see it. If it's defined, how do you know it's wrong?


> If you're only writing for one architecture (say x86) then you can write the C to just work well there.

> One example is that undefined signed overflow is important when the compiler might need to rearrange a loop index - PPC prefers to count down, x86 doesn't care.

Signed overflow is undefined behaviour regardless of the target hardware architecture. The compiler is permitted to assume the absence of signed overflow and to optimise accordingly. Unless the compiler documentation specifically says it's OK, you still have an undefined behaviour problem. It might happen to work fine, sure, but if you're serious about writing correct programs you should be aiming to deliver a program which is correct-by-definition rather than correct-by-coincidence.
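
To make the consequence concrete (my sketch, not from the thread):

    // GCC and Clang fold this test to 'return true' at -O2: signed
    // overflow is assumed never to happen, even though on real
    // two's-complement hardware INT_MAX + 1 would wrap to INT_MIN
    // and make the comparison false.
    bool increment_is_larger(int x)
    {
        return x + 1 > x;
    }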

> I think undefined behavior + ubsan is better than defining all behavior

It isn't. That's why Rust makes such a big deal of its Safe Rust subset. It's also a major selling point of SPARK Ada for safety-critical software. It's tremendously valuable to be able to categorically close the door on a whole family of potentially serious and difficult to detect bugs.

> If something's undefined you know it's a bug every time you see it.

No, you absolutely don't.

I already mentioned that high-profile projects like Chromium and the Linux kernel continue to face security vulnerabilities arising from unintended invocation of undefined behaviour. Section 7 of the paper discusses undefined behaviour but doesn't really explore its full consequences, so instead I suggest reading [0] and [1].

Undefined behaviour means exactly that: if undefined behaviour has been invoked at runtime, the behaviour of the program is not constrained by the C/C++ standard. The program is not required to explode loudly; it can do anything, and it isn't required to behave the same way each time. Hopefully it will explode loudly, but it's possible everything will seem to be fine. In the worst case the undefined behaviour leads to a serious safety issue or security vulnerability.
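
For instance (a sketch of my own, not taken from the linked articles):

    // An out-of-bounds read. The standard imposes no requirement on the
    // outcome: this may return garbage, crash, or appear to work fine,
    // and may differ between runs and between optimisation levels.
    int one_past_the_end()
    {
        int a[4] = {1, 2, 3, 4};
        return a[4];   // undefined behaviour
    }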

Undefined behaviour is even permitted to 'time travel'. [0][2]

> If it's defined, how do you know it's wrong?

You use exceptions or some other well-defined means of detecting and handling runtime errors. For example, Java's NullPointerException and Ada's Constraint_Error.
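
In C++ terms, the nearest analogue is std::vector::at versus operator[] (checked_read is a hypothetical name of mine):

    #include <cstddef>
    #include <cstdio>
    #include <stdexcept>
    #include <vector>

    // at() provides a well-defined failure path, throwing
    // std::out_of_range, where operator[] would be undefined behaviour
    // on a bad index.
    int checked_read(const std::vector<int>& v, std::size_t i)
    {
        try {
            return v.at(i);
        } catch (const std::out_of_range&) {
            std::puts("index out of range");
            return -1;   // hypothetical fallback value
        }
    }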

[0] https://blog.regehr.org/archives/213

[1] https://blog.llvm.org/2011/05/what-every-c-programmer-should...

[2] https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=63...


> Signed overflow is undefined behaviour regardless of the target hardware architecture.

That's what I said. If your program is correct (on paper any '+' may overflow, like in every program, but none actually overflows at runtime), then by undefining overflow, you can tell the compiler that it doesn't happen. That lets it reorder operations in a way it couldn't if every + potentially wrapped around.
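
One such reordering, sketched (my example, not the parent's):

    // With overflow undefined, the compiler may simplify this to
    // 'return a < b;'. Under wrapping semantics (e.g. gcc -fwrapv) that
    // rewrite would be invalid, because either '+' might wrap.
    bool less_after_offset(int a, int b, int c)
    {
        return a + c < b + c;
    }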

> No, you absolutely don't.

You seem to have said "no" and then agreed with me. I was proposing trapping on all undefined behavior in debug mode!

> You use exceptions or some other well-defined means of detecting and handling runtime errors. For example, Java's NullPointerException and Ada's Constraint_Error.

Trapping is plausible, but that's not always how people want to fix undefined behavior. For instance, some people want undefined memory reads to return 0, or want overflow to wrap. In that case it's hard to distinguish errors from intentional behavior.

I don't like exceptions very much either because control flow gets more complicated. Trapping like Swift does is fine, though.


> If your program is correct (it may-overflow like everything does, but doesn't dynamically overflow)

I don't follow the distinction here. A correct program should never invoke signed overflow, regardless of input.

> by undefining overflow, you can tell the compiler that it doesn't happen

Right, that's essentially the effect of the standard saying it's undefined behaviour: it should never happen when the code runs.

> That lets it reorder operations in a way it couldn't if every + potentially wrapped around.

Right, or more generally, it enables various compiler optimisations.

> I was proposing trapping on all undefined behavior in debug mode!

Ok, I thought that by "If something's undefined you know it's a bug every time you see it" you were saying that UB always results in a loud explosion.

Unfortunately it's not easy to build a C compiler that traps whenever UB is encountered at runtime. An example: the compiler can't know the size of an array passed to your library. C uses 'thin pointers', unlike most languages where, whenever you pass an array, the callee can inspect the array's length.
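
A quick illustration of the thin-pointer point (my sketch):

    #include <cstdio>

    // Inside f, 'arr' is just an int*. The length of the underlying
    // array is not recoverable from the pointer, so no compiler-inserted
    // trap can verify that 'i' is in range.
    void f(const int* arr, int i)
    {
        std::printf("%d\n", arr[i]);
    }

    int main()
    {
        int a[10] = {0};
        f(a, 3);   // 'a' decays to int*; the length 10 is lost
    }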

> Trapping is plausible, but that's not always how people want to fix undefined behavior.

Ada's solution, of raising exceptions (roughly like Java), seems sensible. Of course, part of C's appeal is that it's very compact and lacks things like exceptions.

> some people want undefined memory reads to return 0

That doesn't sound reasonable, and it could be pretty burdensome to implement.

> or want overflow to wrap.

This is something some compilers support as a non-standard feature. GCC supports it with the -fwrapv flag. I suppose it would be friendlier if there were a standard and portable #pragma to tell the compiler what you want, but I'm not sure it's a big enough problem to make it into the standard.

You can 'fake it' pretty well by converting to an unsigned integer type, doing the arithmetic, and then converting back to the original signed integer type. You could write a function to do this. You could use the preprocessor to defer to a compiler-specific intrinsic where one is available. I think GCC's __builtin_add_overflow would do the job but its definition isn't terribly explicit regarding wrapping behaviour.

I think this code would do the job portably, and I don't think it relies on anything platform specific. (I'm relying on the unsigned-to-signed conversion wrapping two's-complement style. C++20 guarantees this; before C++20, and in C, an out-of-range conversion is technically implementation-defined, although every mainstream compiler wraps. I've also used fixed-width integer types for good measure.) Godbolt tells me GCC can optimise it down to a single LEA instruction on AMD64.

    #include <cstdint>
    using std::int32_t;
    using std::uint32_t;
    
    /*inline*/ int32_t wrapping_add_int32t(int32_t num1, int32_t num2)
    {
        return (int32_t)((uint32_t)num1 + (uint32_t)num2); // Compiles down to LEA instruction
        // Alternatively (also compiles down to an LEA instruction)
        // int32_t ret;
        // __builtin_add_overflow(num1, num2, &ret);
        // return ret;
    }

See also [0].

> I don't like exceptions very much either because control flow gets more complicated. Trapping like Swift does is fine, though.

I agree it introduces action-at-a-distance control flow. I'm afraid I don't know Swift.

[0] https://stackoverflow.com/q/59307930/

Vaguely related fun: https://github.com/MaxBarraclough/IntegerAbsoluteDifferenceC...


Ah, apologies, my comment is a bit of a troll. Once the compiler encounters undefined behavior, it's free to do whatever it wants - like produce an executable that does exit(0); when it hits that condition. Compiler writers are generally ethical people who won't do that, but there are plenty of cases of aggressive optimizers eliminating whole code paths - if (undefined)? Well, let's always take the true path. Sure hope that undefined wasn't some sort of security check. :shrug:


I would prefer the program always exited on undefined behavior. That would be a hell of a lot more secure than it continuing on, potentially blowing the stack or letting an attacker call arbitrary code. It’s the same with memory allocators: if malloc() fails to allocate new memory, I don’t want it to return a NULL. I want it to use a static buffer to log the problem and then exit the process. There is almost never anything you can do when you run out of heap, and if there is, you aren’t naively using malloc() from your system library and checking for NULL return values.

The whole problem with undefined behavior is that not checking for it is faster than checking and calling exit(1) (exit(0) would indicate a successful exit). Think about it in slightly higher-level terms: you implement a linked list that can search for an item and return a pointer to it once it finds it. Your implementation explicitly says that if you search for an item that isn’t in the list you will hit an infinite loop. I disregard the warning and search for an item not in the list. I hit an infinite loop. Could you have added a check for “if (current == head)” and bailed out, returning NULL? Sure you could, but that introduces a branch and slows things down. Better to label what happens as undefined behavior: maybe on some future processor the check is cheap so you add it, but on x86 it isn’t, so you don’t. This is essentially the same thing.
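
A sketch of the circular-list search being described (the types and names here are my own):

    // The last node's 'next' points back to the head, so searching for
    // an absent value never runs off the end - it just cycles forever.
    struct Node {
        int value;
        Node* next;
    };

    Node* find(Node* head, int wanted)
    {
        Node* current = head;
        for (;;) {
            if (current->value == wanted)
                return current;
            current = current->next;
            // The guard being traded away for speed:
            // if (current == head) return nullptr;
        }
    }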


> if malloc() fails to allocate new memory, I don’t want it to return a NULL. I want it to use a static buffer to log the problem and then exit the process. There is almost never anything you can do when you run out of heap and if you can you aren’t naively using malloc() from your system library and checking for NULL return values.

In C's defence, you can easily get this behaviour by wrapping malloc in a safe_malloc function. Given that C lacks exceptions, it makes good sense for malloc itself to report failure by returning NULL, as this leaves the door open to all possible strategies.
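
A minimal sketch of such a wrapper (the body is my guess at what's intended):

    #include <cstdio>
    #include <cstdlib>

    // Log via a string literal (no allocation required) and terminate,
    // rather than handing NULL back to the caller.
    void* safe_malloc(std::size_t size)
    {
        void* p = std::malloc(size);
        if (p == nullptr) {
            std::fputs("fatal: out of memory\n", stderr);
            std::exit(EXIT_FAILURE);
        }
        return p;
    }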


> Once the compiler encounters undefined behavior, it's free to do whatever it wants. like, produce an executable that does exit(0); when it hits that condition. Compiler writers are generally ethical people that won't do that

This is wrong on two points.

Firstly, real-world compilers very often do handle various kinds of undefined behaviour with immediate termination. On many platforms, dereferencing a null pointer will result in a segfault. Sometimes compilers generate code to trap if undefined behaviour would result. In the C++ standard, some errors are defined to result in a call to std::terminate rather than in undefined behaviour. [0]

Secondly, as IgorPartola indicates, doing this isn't irresponsible; it's the least bad way to handle undefined behaviour. If your loop has overrun the end of your array, you generally don't want execution to silently proceed with invalid data; you want it to end immediately.

[0] https://stackoverflow.com/a/43675980/



