
The original code is not invalid, even by the standard. It's not even undefined behavior. It is perfectly well defined as equivalent to `return true` according to the standard, or it can be implemented in the more straightforward way (add one to a, compare the result with a, return the result of the comparison). Both are perfectly valid compilations of this code according to the standard. Both allow inlining the function as well.
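
To make the two compilations concrete, here is a minimal sketch. The body of the function is not shown in this thread, so `a + 1 > a` is an assumption based on the description above, and `stupid_folded` is a hypothetical name for the constant-folded version:

```c
#include <limits.h>

/* Assumed shape of the function under discussion: the thread does not
   show its body, so `a + 1 > a` is a guess from the description. */
int stupid(int a) {
    /* Signed overflow (UB) is only possible when a == INT_MAX. */
    return a + 1 > a;
}

/* Hypothetical name: what a compiler is allowed to reduce it to,
   since for every a < INT_MAX the comparison is true. */
int stupid_folded(int a) {
    (void)a;
    return 1;
}
```

For every argument below INT_MAX the two versions agree, which is why either one is a valid compilation.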

Note that `return 1 < 0` is also perfectly valid code.

The problem related to UB appears if the function is inlined in a situation where a is INT_MAX. That causes the whole branch of code to be UB, and the compiler is allowed to compile the whole context with the assumption that this didn't happen.

For example, the following function can well be compiled to print "not zero":

  int foo(int x) {
    if (x == 0) {
      return stupid(INT_MAX);
    } else {
      printf("not zero");
      return -1;
    } 
  }

  foo(0); //prints "not zero"

This is a valid compilation, because stupid(INT_MAX) would be UB, so it can't happen in a valid program. The only way for the program to be valid is for x to never be 0, so the `if` is superfluous and `foo` can be compiled to only have the code where UB can't happen.

Edit: Now, neither clang nor gcc seems to perform this optimization. But if we replace stupid(INT_MAX) with a "worse" kind of UB, say `*(int*)NULL = 1`, then they do indeed compile the function to simply call printf [0].

[0] https://godbolt.org/z/McWddjevc
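
A minimal sketch of that variant, written out for reference. This is an assumed reconstruction from the description above, not a copy of the linked snippet, and the `return 0` after the store is only there to keep the function well-formed:

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed reconstruction: the x == 0 branch stores through a null
   pointer, which is UB, so a compiler may assume x is never 0. */
int foo(int x) {
    if (x == 0) {
        *(int *)NULL = 1;   /* UB: a valid program never executes this */
        return 0;
    } else {
        printf("not zero");
        return -1;
    }
}
```

Calling this with any nonzero argument is well defined and takes the printf branch. Only the call with 0 is UB, and that is the case where the compiler is free to emit the printf branch anyway.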




I don't know what you're ranting on about.

Functions have parameters. In the case of the previous function, it is not defined if its parameter is INT_MAX, but is defined for all other values of int.

Having functions that are only valid on a subset of the domain defined by the types of their parameters is a commonplace thing, even outside of C.

Yes, a compiler can deduce that a particular code path can be completely elided because the resulting behaviour wasn't defined. There is nothing surprising about this.


The point is that a compiler can notice that one branch of your code leads to UB and elide the whole branch, even eliding code before the UB appears. The way this cascades is very hard to track and understand - in this case, the fact that stupid() is UB when called with INT_MAX makes foo() be UB when called with 0, which can cascade even more.

And no, this doesn't happen in any other commonly-used language. No other commonly-used language has this notion of UB, and certainly not this type of optimization based on deductions made from UB. A Java function that is not well defined over its entire input set will trigger an exception, not cause code calling it with the parameters it doesn't accept to be elided from the executable.

Finally, I should mention that the compiler is not even consistent in its application of this. The signed int overflow UB is not actually used to elide this code path, but other types of UB, such as a null pointer dereference, are.


It is perfectly possible to write a function in pure Java that would never terminate when called with parameter values outside of the domain for which it is defined. It is also possible for it to yield an incorrect value.

Your statement that such a function would throw an exception is false.

Ensuring a function is only called for the domain it is defined on is entirely at the programmer's discretion regardless of language. Some choose to ensure all functions are defined for all possible values, but that's obviously impractical due to combinatorial explosions. Types that encapsulate invariants are typically seen as the solution for this.


I didn't claim that all functions are either correct or throw an exception in Java. I said that UB doesn't exist in Java, in the sense of a program that compiles but is assigned no semantics, and that the programmer is simply not allowed to write. Of the situations that are UB in C or C++, some are well-defined in Java (signed integer overflow, non-terminating loops that don't do IO/touch volatile variables), many others throw exceptions (out of bounds access, divide by 0), and a few are simply not possible (use after free). Another few are what the C++ standard would call "unspecified behavior", such as unsynchronized concurrent access.

And yes, it's the programmer's job to make sure functions are called in their domain of application. But it doesn't help at all when the compiler prunes your code-as-written to remove flows that would have reached an error situation, making debugging much harder when you accidentally do call them with illegal values.
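
One way to keep such a mistake visible in C is to check the domain before the undefined operation is reached. A minimal sketch, assuming the function body is `a + 1 > a` (it is not shown in this thread) and using a hypothetical `stupid_checked` wrapper:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical checked wrapper: reports out-of-domain input instead of
   evaluating a + 1 when that addition would overflow. */
bool stupid_checked(int a, int *result) {
    if (a == INT_MAX) {
        return false;        /* outside the domain; no UB is reached */
    }
    *result = (a + 1 > a);   /* safe: a + 1 cannot overflow here */
    return true;
}
```

Because the check runs before any overflow can occur, the INT_MAX path is well defined, so the compiler cannot treat it as unreachable, and the caller gets a failure it can debug.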



