For your first example, I think most people want integer overflow to be unspecified behavior instead of undefined behavior - this is how most other languages treat it, it is how all C compilers behaved for a long time, and it is unreasonably difficult to write C code that is guaranteed never to cause integer overflow.
Your example is in fact a perfect illustration of why that should be the case; consider the following code:
int n = 0;
scanf("%d", &n);
factorial(n); //using your definition of factorial(int)
A standard-compliant C compiler is apparently free to optimize this program to format your hard-drive.
For your second example, I would say that, rather than omitting the NULL check, a helpful compiler could do one of two things:
1. Don't reason from strlen(s) to s != NULL, so the NULL check must be left in (unless it has even more context and can see a non-NULL assignment to s)
2. Wherever it would otherwise optimize away a comparison to NULL based on UB reasoning, insert a new comparison to NULL at the point where that reasoning was applied. For this example, optimize the program as if it looked like this:
//first, the inline version
void puts_with_len(const char *s) {
    size_t s_len = strlen(s);
    printf("len = %zu\n", s_len);
    const char *puts_arg = s == NULL ? "(null)" : s;
    puts(puts_arg);
}
//after optimization
void puts_with_len(const char *s) {
    size_t s_len = strlen(s);
    const char *puts_arg;
    if (s != NULL) {
        puts_arg = s;
    } else {
        puts_arg = "(null)"; //could also explicitly signal a segfault or assert here instead
    }
    printf("len = %zu\n", s_len);
    puts(puts_arg);
}
In this case we didn't gain anything, but if puts_with_len were itself inlined the check would be moved further back, potentially replacing many NULL checks with a single one.
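For instance, with a hypothetical caller like the one below, every inlined copy of the check tests the same pointer, so the optimizer can hoist a single NULL check in front of the loop instead of re-checking on every call:
//hypothetical caller, just to illustrate the inlining argument
void print_header_lines(const char *s, int times) {
    for (int i = 0; i < times; i++) {
        puts_with_len(s); //after inlining, the s == NULL test is loop-invariant
    }
}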
I would note that there is a third option here that goes in a different direction: now that compilers are extremely aggressive with NULL-check removal optimizations, a lot of unsafe C functions could be made safe by manually adding the missing NULL checks to the stdlib and other major libraries. This wouldn't affect the semantics, and it wouldn't hurt performance assuming the optimizer really is doing its job.
For example, strlen() could itself raise an assertion/exit on strlen(NULL). If called from a context where it is known that s != NULL, the null check can be optimized away by the aggressive optimizer; if not, better safe than sorry.
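As a rough sketch of that idea (a hypothetical strlen_checked, not the actual stdlib implementation):
#include <stdlib.h>
#include <stddef.h>

//hardened strlen: trap deterministically on NULL instead of invoking UB;
//in a caller where the compiler can already prove s != NULL, this branch
//is dead code and the same aggressive optimizer deletes it for free
size_t strlen_checked(const char *s) {
    if (s == NULL) {
        abort(); //or an assert/raise(SIGSEGV), per the suggestion above
    }
    size_t len = 0;
    while (s[len] != '\0') {
        len++;
    }
    return len;
}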
If signed integer overflow is implementation-defined rather than undefined, then it isn't an error, and we cannot build compiler features that warn or reject when we can prove it will occur. In your case we've managed to get the worst of both worlds (a buggy program and no capacity for the compiler to stop you).
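For example, precisely because a provable overflow is an error rather than a defined result, the compiler is free to diagnose it (the exact warning depends on the compiler; GCC and Clang will typically flag something like this):
#include <limits.h>

int f(void) {
    return INT_MAX + 1; //typically warned about, e.g. via GCC's -Woverflow
}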
For a long time in C's history, on most platforms, int overflow was actually treated as well-defined behavior, with many examples suggesting tests like x + y < x (assuming positive integers) to detect overflow.
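The classic idiom looked roughly like this (note that under the current standard the x + y is itself UB, so this is exactly the kind of test a modern optimizer is allowed to delete):
//wraparound-based overflow check, assuming x and y are non-negative;
//relies on behavior that is no longer guaranteed by the standard
int add_would_overflow(int x, int y) {
    return x + y < x;
}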
In modern C there is simply no portable way to easily check for integer overflow for 64-bit values, even though the vast majority of programs are running on a processor that defines exactly what happens with integer overflow, and even sets a flag that can be tested for in a single jump.
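By contrast, the portable standard-C version has to pre-check the bounds before adding, because the overflowing add itself would be UB (a sketch for 64-bit signed addition):
#include <stdint.h>
#include <stdbool.h>

//returns false instead of overflowing; *out is written only on success
bool add_i64_checked(int64_t a, int64_t b, int64_t *out) {
    if ((b > 0 && a > INT64_MAX - b) ||
        (b < 0 && a < INT64_MIN - b)) {
        return false; //would overflow
    }
    *out = a + b;
    return true;
}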
People often cite for loops over arrays as an example of places where treating integer overflow as UB helps with optimizations. This despite the fact that the recommended, standards compliant portable way to iterate over the range of indices in an array is to use a size_t index variable, which is an unsigned type.
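For example:
#include <stddef.h>

//size_t is unsigned, so its wraparound is well defined and the
//UB-based overflow reasoning does not apply to the loop counter
void scale(double *a, size_t n, double k) {
    for (size_t i = 0; i < n; i++) {
        a[i] *= k;
    }
}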
> even though the vast majority of programs are running on a processor that defines exactly what happens with integer overflow, and even sets a flag that can be tested for in a single jump
Widths matter. Platforms that do this don't give you that hardware behavior for both 32-bit and 64-bit signed integers. So if you want to define signed integer overflow for all signed integer widths, then for one (or both) of these widths you need to stick in runtime checks.