The compiler should emit the code to add one to a, and then code to check if the result is greater than a. This is completely evident, and is what all C and C++ compilers did for the first few decades of their existence. Maybe a particularly smart compiler could issue a `jo` instead of a `cmp ax, bx; jz `.
The for loop example is silly. There is no reason whatsoever to add an overflow check in a for loop. The code of a standard for loop, `for (int i = 0; i < n; i++)` doesn't say to do any overflow check, so why would the compiler insert one? Not inserting overflow checks is completely different than omitting overflow checks explicitly added in the code. Not to mention, for this type of loop, the compiler doesn't need any UB-based logic to prove that the loop terminates - for any possible value of n, including INT_MAX, this loop will terminate, assuming `i` is not modified elsewhere.
I'd also note that the "most correct" type to use for the iteration variable in a loop used to access an array, per the standard, would be `size_t`, which is an unsigned type, which does allow overflow to happen. The standard for loop should be `for (size_t i = 0; i < n; ++i)`, which doesn't allow the compiler to omit any overflow checks, even if any were present.
The interesting case is what should the code do if inlined on a code path where a is deduced to be INT_MAX.
A compiler will just avoid inlining any code here, since it's not valid, and thus by definition that branch cannot be taken, removing cruft that would impact the instruction cache.
The original code is not invalid, even by the standard. It's not even undefined behavior. It is perfectly well defined as equivalent to `return true` according to the standard, or it can be implemented in the more straightforward way (add one to a, compare the result with a, return the result of the comparison). Both are perfectly valid compilations of this code according to the standard. Both allow inlining the function as well.
Note that also `return 1 < 0` is also perfectly valid code.
The problem related to UB appears if the function is inlined in a situation where a is INT_MAX. That causes the whole branch of code to be UB, and the compiler is allowed to compile the whole context with the assumption that this didn't happen.
For example, the following function can well be compiled to print "not zero":
This is a valid compilation, because stupid(INT_MAX) would be UB, so it can't happen in a valid program. The only way for the program to be valid is for x to never be 0, so the `if` is superfluous and `foo` can be compiled to only have the code where UB can't happen.
Eidt: Now, neither clang nor gcc seem to do this optimization. But if we replace stupid(INT_MAX) with a "worse" kind of UB, say `*(int*)NULL = 1`, then they do indeed compile the function to simply call printf [0].
Functions have parameters. In the case of the previous function, it is not defined if its parameter is INT_MAX, but is defined for all other values of int.
Having functions that are only valid on a subset of the domain defined by the types of their parameters is a commonplace thing, even outside of C.
Yes, a compiler can deduce that a particular code path can be completely elided because the resulting behaviour wasn't defined. There is nothing surprising about this.
The point is that a compiler can notice that one branch of your code leads to UB and elide the whole branch, even eliding code before the UB appears. The way this cascades is very hard to track and understand - in this case, the fact that stupid() is UB when called with INT_MAX makes foo() be UB when called with 0, which can cascade even more.
And no, this doesn't happen in any other commonly-used language. No other commonly-used language has this notion of UB, and certainly not this type of optimization based on deductions made from UB. A Java function that is not well defined over its entire input set will trigger an exception, not cause code calling it with the parameters it doesn't accept to be elided from the executable.
Finally, I should mention that the compiler is not even consistent in its application of this. The signed int overflow UB is not actually used to ellide this code path. But other types of UB, such as null pointer dereference, are.
It is perfectly possible to write a function in pure Java that would never terminate when called with parameter values outside of the domain for which it is defined. It is also possible for it to yield an incorrect value.
Your statement that such a function would throw an exception is false.
Ensuring a function is only called for the domain it is defined on is entirely at the programmer's discretion regardless of language. Some choose to ensure all functions are defined for all possible values, but that's obviously impractical due to combinatorial explosions. Types that encapsulate invariants are typically seen as the solution for this.
I didn't claim that all functions are either correct or throw an exception in Java. I said that UB doesn't exist in Java, in the sense of a Java program that compiles, but for which no semantics are assigned and the programmer is not allowed to write it. All situations that are UB in C or C++ are either well-defined in Java (signed integer overflow, non-terminating loops that don't do IO/touch volatile variables), many others throw exceptions (out of bounds access, divide by 0), and a few are simply not possible (use after free). Another few are what the C++ standard would call "unspecified behavior", such as unsynchronized concurrent access.
And yes, it's the programmer's job to make sure functions are called in their domain of apllication. But it doesn't help at all when the compiler prunes your code-as-written to remove flows that would have reached an error situation, making debugging much harder when you accidentally do call them with illegal values.
If you want the compiler to output exactly the code as written (or as close as possible to it for the target architecture), then most compilers support that. It's called turning off optimizations. You can do that if that's what you want.
Optimizing compilers on the other hand are all about outputting something that is equivalent to your code UNDER THE RULES OF THE LANGUAGE while hopefully being faster. This condition isn't there to fuck you over its there because it is required for the compiler to do more than very very basic optimizations.
> Optimizing compilers on the other hand are all about outputting something that is equivalent to your code UNDER THE RULES OF THE LANGUAGE while hopefully being faster.
The problem here is how far you stretch this "equivalent under the rules of the language" concept. I think many agree that C and C++ compilers have chosen to play language lawyer games to little performance in real world code, but introducing very real bugs.
As it stands today, C and C++ are the only mainstream languages that have non-timing-related bugs in optimized builds that aren't there in debug builds - putting a massive burden on programmers to find and fix these bugs. The performance gain from this is extremely debatable. But what is clear is that you can create very performant code without relying on this type of UB logic.
The for loop example is silly. There is no reason whatsoever to add an overflow check in a for loop. The code of a standard for loop, `for (int i = 0; i < n; i++)` doesn't say to do any overflow check, so why would the compiler insert one? Not inserting overflow checks is completely different than omitting overflow checks explicitly added in the code. Not to mention, for this type of loop, the compiler doesn't need any UB-based logic to prove that the loop terminates - for any possible value of n, including INT_MAX, this loop will terminate, assuming `i` is not modified elsewhere.
I'd also note that the "most correct" type to use for the iteration variable in a loop used to access an array, per the standard, would be `size_t`, which is an unsigned type, which does allow overflow to happen. The standard for loop should be `for (size_t i = 0; i < n; ++i)`, which doesn't allow the compiler to omit any overflow checks, even if any were present.