> > Signed integer expression simplification [...]
> This is easily addressed by allowing (but not requiring) implementations to keep wider intermediate results on signed overflow, to an unspecified width at least as wide as the operands.
> > Value range calculations [...] Loop analysis and optimization [...]
> Allowed but not required to retain excess bits on signed overflow.
What does that even mean? That the code would have different behavior with a different compiler, different optimizations or when you slightly change it (depending on whether the compiler chooses to keep wider intermediate results or not)?
If I understand you correctly, then depending on the values of x and y, `(x+1)<(y+3)` would have a different result than `(x+a)<(y+b)` when a=1 and b=3, because in the first case you would simplify the expression but in the second case you couldn't.
That would be quite surprising, to say the least.
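To make the potential surprise concrete, here is a minimal sketch of my own (hypothetical values, not taken from anything above). Since `x+1` overflows, the standard gives the program no defined meaning; the comments describe two plausible outcomes a compiler could produce, one per expression:

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int x = INT_MAX, y = 0;
        volatile int a = 1, b = 3;   /* volatile: keep a and b from being constant-folded */

        /* A compiler may simplify (x+1) < (y+3) to x < y+2, since it is allowed
         * to assume no signed overflow; here that would print 0 (INT_MAX < 2). */
        printf("%d\n", (x + 1) < (y + 3));

        /* With a and b unknown at compile time no such simplification applies;
         * on typical two's-complement hardware x+a wraps to INT_MIN, and
         * INT_MIN < 3 prints 1. */
        printf("%d\n", (x + a) < (y + b));
        return 0;
    }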
> No, they're done using size_t; that's what size_t is for.
    for (int i = 0; i < 256; i++)
        a[i] = b[i] + c[i];
So you've never seen code like this? Not all arrays are huge and for small indices people usually go for `int`.
Also, when doing arithmetic with indices, you might need to represent a negative index, so `size_t` wouldn't work. You'd need to use `ssize_t`, which is a signed integer type that would benefit from these optimizations.
But even then you might use an `int` if you know the arithmetic result will fit.
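For illustration, here's a hypothetical helper (names and signature made up for this comment) where the index arithmetic can legitimately go negative, which is exactly where `size_t` would silently wrap and a signed type like `ssize_t` (POSIX) is the natural fit:

    #include <stddef.h>
    #include <sys/types.h>   /* ssize_t is POSIX, not ISO C */

    /* Look back `back` elements from position `pos`, clamping to 0.0 when the
     * resulting index falls outside the array. */
    double sample_back(const double *a, size_t len, size_t pos, size_t back) {
        ssize_t i = (ssize_t)pos - (ssize_t)back;   /* may legitimately be negative */
        if (i < 0 || (size_t)i >= len)
            return 0.0;
        return a[i];
    }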
> No, the fact that i (of type size_t) < length of a <= size of a in bytes <= SIZE_MAX ensures that a[i] and a[i+1] are adjacent.
Not if `i` is a signed integer, say, `int` or `int8_t`. Which is the point of the optimization.
> depending on the values of x and y, `(x+1)<(y+3)` would have a different result than `(x+a)<(y+b)` when a=1 and b=3
If x > TYPEOFX_MAX - 1 or y > TYPEOFY_MAX - 3 (i.e. if x+1 or y+3 overflows), then this can already happen with the more (maximally) vague "signed overflow is completely undefined" policy; wider intermediates just mean that code is not allowed to do other things, like crash or make demons fly out of your nose.
If x+[a or 1] and y+[3 or b] don't overflow, then computing them in a wider signed integer type has no effect, since the same values are produced as in the original type.
More generally, retained overflow bits / wider intermediates (instead of undefined behaviour) mean that when you would have gotten undefined behaviour due to integer overflow, you instead get a partially-undefined value with a smaller blast radius (and hence less opportunity for a technically-standards-conformant compiler to insert security vulnerabilities or other insidious bugs). In cases where you would not have gotten undefined behaviour, there is no signed integer overflow, so the values you get are not partially-undefined, and work the same way as in the signed-overflow-is-undefined-behaviour model. ... you know what; table:
    signed overflow is \  | no overflow    | unsigned overflow | signed overflow
    ----------------------+----------------+-------------------+----------------
    undefined behaviour   | correct result | truncated mod 2^n | your sinus cavity is a demon roost
    wide intermediates    | correct result | truncated mod 2^n | one of a predictable few probably-incorrect results
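Concretely, here's one sketch of what "allowed but not required to keep wider intermediates" could mean; illustrative only, since both the width and the choice would be up to the implementation:

    #include <stdint.h>

    /* One evaluation an implementation would be permitted to choose for
     * (x+1) < (y+3): widen the int operands so the additions cannot overflow
     * int. Another permitted evaluation is plain int arithmetic; the two
     * agree whenever neither addition overflows in int. */
    int lt_plus(int x, int y) {
        int64_t lhs = (int64_t)x + 1;
        int64_t rhs = (int64_t)y + 3;
        return lhs < rhs;
    }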
> If x > TYPEOFX_MAX - 1 or y > TYPEOFY_MAX - 3 (i.e. if x+1 or y+3 overflows), then this can already happen with the more (maximally) vague "signed overflow is completely undefined" policy; wider intermediates just mean that code is not allowed to do other things, like crash or make demons fly out of your nose.
Yes, but saying "signed overflow is completely undefined" simply means that you are not allowed to do that, so this is a very well-defined policy and as an experienced programmer you know what to expect and what code patterns to avoid (hopefully).
If you say "signed overflow is allowed" but then your code behaves nondeterministically (i.e. giving different results when signed overflow happens, depending on which compiler and optimization level you're using or exact code you've written or slightly changed), I would argue that would actually be more surprising for an experienced programmer, not less!
It would make such signed overflow bugs even harder to detect and fix! As it would work just fine for some cases (or when certain optimizations are applied or not, or when you use a certain compiler version or Linux distro) but then it would completely break in a slightly different configuration or if you slightly changed the code.
And it would prevent tools like UBSan from working to detect such bugs, because some code would actually be correct and rely on the signed-overflow behavior that you've defined. You couldn't just warn the programmer that a signed overflow happened, as that would generate a bunch of false alarms (especially when such signed-overflow-relying code was part of widely used libraries).
> More generally, retained overflow bits / wider intermediates (instead of undefined behaviour) mean that when you would have gotten undefined behaviour due to integer overflow, you instead get a partially-undefined value with a smaller blast radius (and hence less opprotunity for a technically-standards-conformant compiler to insert security vulnerabilities or other insidious bugs).
C compilers are already allowed to do what you describe. But I'm not sure that relying on that behavior would be a good idea :)
I think it's preferable that the C standard says that you are not allowed to overflow signed integers. Otherwise the subtlety of what happens on signed overflow would be lost on most programmers and it would be very hard to catch such bugs, both because code would behave differently on slightly different configurations (hello, heisenbugs!) and because bug-detection tools couldn't flag signed overflows as invalid anymore.
> saying "signed overflow is completely undefined" simply means that you are not allowed to do that
No, saying "signed overflow is a compile-time error" means you're not allowed to do that. Saying "signed overflow is completely undefined" means you are allowed to do that, but it will blow up in your face (or, more likely, your users' faces) with no warning, potentially long after the original erroneous code change that introduced it.
> As it would work just fine for some cases (or when certain optimizations are applied or not, or when you use a certain compiler version or Linux distro) but then it would completely break in a slightly different configuration or if you slightly changed the code.
That sentence is literally indistinguishable from a verbatim quote about the problems with undefined behaviour. Narrowing the scope of possible consequences of signed overflow from "anything whatsoever" to "the kind of problems that are kind of reasonable to have as a result of optimization" is a strict improvement.
> And it would prevent tools like UBSan from working
> bug-detection tools couldn't flag signed overflows as invalid
The standard says `if(x = 5)` is well-defined, but that doesn't stop every competently-designed (non-minimal, correctly configured[0]) compiler from spitting out something to the effect of "error: assignment used as truth value".
0: Arguably a fully competently designed compiler would require you to actually ask for -Wno-error rather than having it be the default, but backward compatibility prevents changing that after the fact, and it would require configuring -Wall-v1.2.3 (so build scripts didn't break) anyway.
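For reference, a minimal example of the kind of diagnostic I mean (warning wording paraphrased from memory):

    #include <stdio.h>

    int main(void) {
        int x = 0;
        /* Well-defined C: assigns 5 to x and tests the (nonzero) result.
         * With -Wall, gcc and clang still warn, roughly
         * "suggest parentheses around assignment used as truth value". */
        if (x = 5)
            printf("took the branch, x = %d\n", x);
        return 0;
    }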
> > saying "signed overflow is completely undefined" simply means that you are not allowed to do that
> No, saying "signed overflow is a compile-time error" means you're not allowed to do that.
In the vast majority of cases it's not possible to statically determine if signed overflow will occur, so compilers can't do that. I'm sure they would do it, if it were possible.
> Saying "signed overflow is completely undefined" means you are allowed to do that, but it will blow up in your face
No, it does not mean that. You're not allowed to do signed overflow in standard-compliant C, period.
You're allowed to do signed arithmetic, but the arithmetic is not allowed to overflow. You can write code that overflows, but it will not have defined semantics (because that's what the standard says).
And the compiler cannot enforce this or emit a warning when overflow occurs, because in the general case it's not possible to statically determine whether it will occur.
But if the compiler can determine it, then it will emit a warning (at least with -Wall, I think).
And if you pass the `-ftrapv` flag to GCC (and clang, probably), then your code will deterministically fail at runtime if you do signed overflow, but for performance reasons this is not required by the standard.
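To make that concrete, a tiny demo (my own, with the behaviors as I understand the gcc/clang flags; none of this is mandated by the standard):

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int x = INT_MAX;
        /* x + 1 overflows int. What you observe depends on how you build:
         *   (default)                              undefined behaviour
         *   -fwrapv                                wraps to INT_MIN
         *   -ftrapv                                traps/aborts at run-time
         *   -fsanitize=signed-integer-overflow     UBSan prints a run-time error report
         */
        printf("%d\n", x + 1);
        return 0;
    }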
> > As it would work just fine for some cases (or when certain optimizations are applied or not, or when you use a certain compiler version or Linux distro) but then it would completely break in a slightly different configuration or if you slightly changed the code.
> That sentence is literally indistinguishable from a verbatim quote about the problems with undefined behaviour. Narrowing the scope of possible consequences of signed overflow from "anything whatsoever" to "the kind of problems that are kind of reasonable to have as a result of optimization" is a strict improvement.
No, because experienced programmers don't expect signed overflow to work, because it's not allowed. Such bad code would be caught if you enable UBSan. But if the C standard required what you propose, then UBSan could not fail when a signed overflow occurs, as that could be a false positive (which would make such signed-overflow detection useless).
If you allow signed overflows then you have to define the semantics. And nondeterministic semantics for arithmetic are prone to producing well-defined but buggy code, while in this case also preventing bug-detection tools from being reliable.
That said, you could conceivably implement such semantics in a compiler and enable them with some flag, just like -fwrapv causes signed overflow to wrap around and -ftrapv causes signed overflow to trap at runtime.
So you could implement a compiler flag which does what you want, today, and get all the benefits that you're proposing.
You could even enable it by default, because the C standard allows the behavior that you're proposing, so you would not be breaking any existing code.
And this would also mean that existing and future C code would still be C-standard compliant (as long as it doesn't rely on that behavior).
But making the C standard require that behavior means that there will be well-defined, standard-compliant code that will rely on those weird semantics when signed overflow occurs, and that's a really bad idea.
> > bug-detection tools couldn't flag signed overflows as invalid
> The standard says `if(x = 5)` is well-defined, but that doesn't stop every competently-designed (non-minimal, correctly configured[0]) compiler from spitting out something to the effect of "error: assignment used as truth value".
The big difference in this case is that such errors are easy to detect at compile-time. No compiler would cause such code to fail at run-time, because that would lead to unexpected and unusual program crashes, which would make both users and programmers mad (especially if the code is correct). But for signed overflows, it's not possible to implement a similar compile-time error.
As another example, AddressSanitizer is competently-designed but as soon as you enable `strict_string_checks` you will run into false positives if your code stores strings by keeping a pointer and a length, rather than forcing them to be NULL-terminated, so that flag can be completely useless (and for my current project, it is).
Which is why I'm guessing it's disabled by default. Which means almost nobody uses that flag.
This happens because strings are not actually required to be NULL-terminated in C, even though most people use them that way. So there is code out there (including mine) that relies on strings not always being NULL-terminated, and this has well-defined semantics in C.
But of course, as soon as that happens, you can't rely on the tool to be useful anymore, because there is perfectly fine code which relies on the well-defined semantics.
Note that in this case, "strict_string_checks" is a run-time check, like detection of signed overflows would have to be.
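For context, this is roughly the kind of code I mean (a made-up minimal example): well-defined C that never NULL-terminates its strings, so a terminator-assuming check has nothing sound to flag:

    #include <stddef.h>
    #include <stdio.h>

    /* A pointer+length "string" that is never NULL-terminated. */
    struct str_view {
        const char *data;
        size_t      len;
    };

    int main(void) {
        char buf[5] = { 'h', 'e', 'l', 'l', 'o' };   /* no trailing '\0' on purpose */
        struct str_view v = { buf, sizeof buf };

        /* Length-bounded printing; nothing here ever looks for a terminator. */
        printf("%.*s\n", (int)v.len, v.data);
        return 0;
    }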