
This would mean you’d have to insert a check every time you add two signed integers together, because signed overflow is UB. You’d also have to wrap every memory access with bounds checks, because OOB memory access is UB.
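
As a rough sketch of what that means in practice (the helper names here are made up), every signed add and every indexed access would have to be lowered to something like this:

    #include <limits.h>
    #include <stdlib.h>

    /* Signed addition with some defined failure behaviour instead of UB. */
    static int checked_add(int a, int b)
    {
        if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
            abort();                       /* the "defined" behaviour */
        return a + b;
    }

    /* An indexed access, which now needs the length to even be available. */
    static int checked_read(const int *p, size_t len, size_t i)
    {
        if (i >= len)
            abort();
        return p[i];
    }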

There are also tons of loop optimizations compilers do for side-effect-free loops which would have to be removed completely, because those optimizations rest on the rule that a side-effect-free infinite loop is UB, i.e. the compiler may assume every such loop terminates. So if you wanted to keep these optimizations you'd have to prove to the compiler, at compile time, that your loop is guaranteed to terminate, since it would no longer be allowed to assume that it will. Without these loop optimizations, numerical C code (such as the kernels behind numpy) would be back in the stone age of performance.
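
For a sense of the kind of loop this affects, here is a made-up example: the body has no side effects, and nobody can prove at compile time that it terminates for every input, yet today's compilers may rewrite it freely (or delete it entirely if the result is unused) precisely because they are allowed to assume it terminates.

    /* No I/O, no volatile, no atomics anywhere in the loop body. */
    unsigned collatz_steps(unsigned x)
    {
        unsigned steps = 0;
        while (x != 1) {
            x = (x % 2) ? 3 * x + 1 : x / 2;
            steps++;
        }
        return steps;
    }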

Edit: I just wanted to point out that one of the new features in C23 is a standard library header called <stdckdint.h> that provides type-generic macros for checked integer arithmetic. These let you safely add, subtract, or multiply two unknown signed integers and get back a boolean that tells you whether the result overflowed. This will be the standard, preferred way of doing overflow-safe math.
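
A minimal usage sketch, assuming a toolchain that already ships the C23 header (support still varies between compilers):

    #include <stdckdint.h>   /* C23 */
    #include <stdio.h>

    int main(void)
    {
        int sum;
        /* ckd_add returns false on success, true if the mathematical
           result did not fit in 'sum'. */
        if (ckd_add(&sum, 2000000000, 2000000000))
            puts("overflow detected");
        else
            printf("sum = %d\n", sum);
        return 0;
    }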




Another option would be to define behaviors for integer overflow and out of bounds memory access. Presumably they happen fairly often and it might be a good idea to nail down what should happen in those cases.


Those things aren’t up to the language, they’re up to hardware. C is a portable language that runs on many different platforms. Some platforms might have protected memory and trap on out of bounds memory access. Other platforms have a single, flat address space where out of bounds memory access is not an error, it just reads whatever is there since your program has full access to all memory.

The same goes for integer overflow. Some platforms use 1’s complement signed integers, some platforms use 2’s complement. Signed overflow would simply give different answers on these platforms. The standards committee long ago decided that there’s no sensible answer to give which covers all cases, so they declared it undefined behaviour which allows compilers to assume it’ll never happen in practice and make lots of optimizations.

Forcing signed overflow to have a defined behaviour means forcing every single signed arithmetic operation through this path, removing the ability for compilers to combine, reorder, or elide operations. This makes a lot of optimizations impossible.
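
Two tiny examples (made-up functions) of folds that are only legal because the compiler may assume signed arithmetic never overflows, plus the loop case that matters most for performance; with wrapping mandated, x could be INT_MAX and each of these rewrites would change the program's behaviour:

    int f(int x) { return x + 1 > x; }   /* today: folded to  return 1; */
    int g(int x) { return x * 2 / 2; }   /* today: folded to  return x; */

    long sum_first(const int *a, int n)
    {
        long s = 0;
        /* Because i cannot wrap, the trip count is known to be n + 1,
           so i can be widened to 64 bits and the loop vectorized. */
        for (int i = 0; i <= n; i++)
            s += a[i];
        return s;
    }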


The problem is that there is a vicious circle here.

Most old computer architectures had a much more complete set of hardware exceptions, including cases like integer overflow or out-of-bounds access.

In modern superscalar pipelined CPUs, implementing all the desirable hardware exceptions without reducing the performance remains possible (through speculative execution), but it is more expensive than in simple CPUs.

Because of that, hardware designers have taken advantage of the popularity gained by C, C++, and almost all modern programming languages, which no longer specify the behavior for various errors: they omit the required hardware support to reduce CPU cost, and justify that decision by pointing to the existing programming language standards.

The correct way to solve this would have been to include well-defined and uniform behaviors for all erroneous conditions in every programming language standard. That would have forced CPU designers to provide efficient means of detecting such conditions, just as they are forced to implement the IEEE standard for floating-point arithmetic despite their desire to provide unreliable arithmetic, which is cheaper and could win benchmarks by cheating.


CPU designers don't like having their hand forced like that. If you create a new standard forcing them to add extra hardware to their designs, they'll skip your standard and target the older one (which has way more software marketshare anyway). They will absolutely bend over backwards to save a few cycles here and a few transistors there, just so they can cram in an extra feature or claim a better score on some microbenchmark. They absolutely do not care at all about making life easier for low-level programmers, hardware testers, or compiler writers.


I don't believe adding simple checks against data already present in L1 caches and marked as "unlikely to fail" should be so onerous.


> In modern superscalar pipelined CPUs, implementing all the desirable hardware exceptions without reducing the performance remains possible (through speculative execution), but it is more expensive than in simple CPUs.

Yeah, and that's how you get security vulnerabilities!


Doesn't C force two's complement now? If so, one less thing to worry about.


UB is a better option though. When your signed integer overflows, it's a bug regardless. Why force the compiler to generate code for a pointless case instead of letting it optimize the intended one?

If you value never having bugs over performance then just insert a check or run your program with a sanitizer that does that for you. It's a solved problem for a case where performance doesn't matter. The thing is that it does.
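
Concretely: GCC and Clang have -fsanitize=signed-integer-overflow (UBSan) and -ftrapv for the "catch it everywhere" route, and for the few places where overflow is genuinely possible you can ask for the check explicitly, e.g. via the (GCC/Clang-specific) overflow builtins. A sketch:

    #include <stdio.h>

    /* __builtin_add_overflow stores the wrapped result and returns true
       if the mathematical result did not fit. */
    static int add_or_die(int a, int b)
    {
        int r;
        if (__builtin_add_overflow(a, b, &r)) {
            fprintf(stderr, "overflow in add_or_die\n");
            __builtin_trap();
        }
        return r;
    }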


That would be great if it was possible, but how do you specify & implement sensible behavior for this:

    void foo(int *a, int b) { a[b] = 1; }
At runtime there is no information about whether that write is in bounds and no way to prevent this from corrupting arbitrary data unless you compile for something like CHERI.


In checked languages this would probably be an 'unsafe' function, since it lacks those features.

If the bounds information were accessible at build time, anything that references the function could be checked against it and bounds-checked accordingly.

The promotion of a pointer to an array is really the source of the logical error. A language could place range checks on created arrays, and pointers/references to allocated arrays could be handled differently from anonymous slabs of memory (sketched below). However, an array that doesn't carry its bounds (even if a length is stored somewhere, such as just before the array's starting address) is as unsafe as null-terminated strings are for length checks. That's an idea that made much more sense when systems were much smaller and slower and the exposure to untrusted code and data was far lower.

    void foo(void *a, int b) { ((int *)a)[b] = 1; }  /* also see poke() */
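
A rough sketch of the "array that carries its bounds" idea from above (the struct and helper names are made up):

    #include <assert.h>
    #include <stddef.h>

    /* A fat pointer: the length travels with the data, so the check is
       always possible at the point of use. */
    struct int_span { int *data; size_t len; };

    void foo_checked(struct int_span a, size_t b)
    {
        assert(b < a.len);   /* defined, checkable behaviour */
        a.data[b] = 1;
    }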


Good luck defining the behaviour of use-after-free or of accessing out-of-bounds stack memory without bounds checking and GC.


They don't happen that often. That's why they're bugs!


> you’d have to insert a check every time you add two signed integers together,

This is exactly what is done in serious code. It is typically combined with contracts and static analysis (often human), e.g. "it is guaranteed that this input is in range 10-20, so adding it to this other 16-bit int can be assumed to stay below sint32_max".
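
A toy version of that style, using the range from the example above (names are made up):

    #include <assert.h>
    #include <stdint.h>

    /* Contract: the caller guarantees input is in [10, 20]. */
    int32_t combine(int32_t input, int16_t other)
    {
        assert(input >= 10 && input <= 20);
        /* Worst case is 20 + INT16_MAX = 32787, far below INT32_MAX,
           so this addition cannot overflow; a static analyzer (or a
           human reviewer) can verify the same reasoning. */
        return input + other;
    }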


Great, those checks can stay in "serious" code, and those of us who don't want them can take the UB. C++20 actually ended up specifying that all ints are two's complement, removing this from the category of "UB," but a lot more weird stuff is programmed in C.


Note that signed overflow is still UB in C++ even with two's complement being guaranteed for signed types.


> because signed overflow is UB

no longer



