He doesn't really address any of the claims made, and makes some startling claim...

aw1621107 · on Jan 21, 2022

> This was made non-binding in later versions of the standard, so optimiser engineers act as if these words don't exist.

This is reminiscent of the argument in "How One Word Broke C" [0, HN discussion at 1]. I'm not particularly convinced this argument is correct. In my opinion, it's "ignoring the situation completely" that's the phrase of interest; after all, what is assuming that UB cannot occur but "ignoring the situation completely"?

I'm a nobody, though, so take that with an appropriate grain of salt.

[0]: https://news.quelsolaar.com/2020/03/16/how-one-word-broke-c/ (currently broken; archive at https://web.archive.org/web/20210307213745/https://news.quel...

[1]: https://news.ycombinator.com/item?id=22589657

mpweiher · on Jan 21, 2022

Apologies, but "ignoring the situation completely" is the exact opposite of "assuming the situation cannot occur and acting on that assumption".

aw1621107 · on Jan 21, 2022

I'm not sure how "acting on an assumption that the situation cannot occur" is distinguishable from "choosing to ignore situations completely whenever they come up". The former is a blanket description of how you treat individual instances.

For example:

    int f(int* i) {
        int val = *i;
        if (i == NULL) { return 0; }
        return val;
    }

I submit that there are two situations here:

1. i is NULL. Program flow will be caught by the NULL check and the return value is 0.

2. i is not NULL. The NULL check cannot be hit, and the return value is val.

As allowed by the standard, I'll just ignore the situation with UB, leaving

> i is not NULL. The NULL check cannot be hit, and the return value is val.

Alternatively, I can assume that UB cannot occur, which eliminates option 1, leaving

> i is not NULL. The NULL check cannot be hit, and the return value is val.

You get the same result either way.

And that gets to the root of the problem: What exactly does "ignoring the situation completely" mean? In particular, what is the scope of a "situation"?
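
To make the comparison concrete, here is a minimal sketch (not compiler output) of what either reading leaves behind. The names `f_as_written` and `f_check_removed` are hypothetical, and whether a given compiler actually drops the check depends on the compiler and its flags. Only the non-NULL case is meaningful to run, since calling either function with NULL is undefined.

```c
#include <stddef.h>

/* The function from the example above, unchanged apart from its name. */
int f_as_written(int *i) {
    int val = *i;                 /* the dereference happens before the check */
    if (i == NULL) { return 0; }  /* reachable with i == NULL only after UB */
    return val;
}

/* Hypothetical result once case 1 is discarded: only case 2 remains. */
int f_check_removed(int *i) {
    return *i;
}
```

For any valid pointer the two functions return the same value, which is all the standard constrains here.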

mpweiher · on Jan 22, 2022

Not sure what such an absurd code example is supposed to show.

However, ignoring the situation completely in this case is emitting the code as written. This is not hard at all, despite all the mental gymnastics expended to pretend that it were hard.

That is the compiler's job: emit the code that the programmer wrote. Even if that code is stupid, as in this case. It is not the compiler's job to second guess the programmer and generate the code that it believes the programmer meant to write.

In this case:

   1. Dereference i.
   2. Check i.
   3. If i is null return 0.
   4. Return the value from step 1.

If you want, I can also write it down as assembly code.

Again, this isn't hard.

"Ignoring the situation completely" means ignoring the fact that this is UB and just carrying on.

aw1621107 · on Jan 23, 2022

> Not sure what such an absurd code example is supposed to show.

I had a few things in mind:

1. Highlight that it's not the wording change from "permissible" to "possible" that enables UB-based optimization; it's the interpretation of one of the behaviors the standard lists.

2. Point out that the vagueness of "the situation" is at the heart of the issue.

3. Show that "ignoring the situation completely" can produce the same result as "assuming UB never happens", contrary to your assertion (albeit subject to an interpretation that you disagree with).

(And absurd or not, similar code does show up in real life [0]. It's part of why -fno-delete-null-pointer-checks exists, after all).

> However, ignoring the situation completely in this case is emitting the code as written.

> "Ignoring the situation completely" means ignoring the fact that this is UB and just carrying on.

Wouldn't "emitting the code as written" or "ignoring the fact that this is UB and just carrying on" fall under the "behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message)" option? If it does, why have the "ignoring the situation completely with unpredictable results" wording in the first place?

I'm not really convinced your definition of "ignoring the situation completely" is the only valid definition, either. Why wouldn't the standard's wording allow the interpretation in the example?

> That is the compiler's job: emit the code that the programmer wrote.

Why bother with optimizations, then?

[0]: https://lwn.net/Articles/342330/

mpweiher · on Jan 23, 2022

>> That is the compiler's job: emit the code that the programmer wrote.

> Why bother with optimizations, then?

Why not?

If you can optimise without compromising the intended semantics of the code, go ahead. If you cannot, do not.

Note: you will have to apply judgement, because the C standard in particular happens to be incomplete.

aw1621107 · on Jan 24, 2022

> Why not?

Because "That is the compiler's job: emit the code that the programmer wrote", and optimizing means the compiler is allowed to emit code the programmer did not write.

> If you can optimise without compromising the intended semantics of the code, go ahead.

And how does the compiler know what the "intended semantics" of the code are, if it isn't precisely what the programmer wrote?

> you will have to apply judgement

How is that compatible with "It is not the compiler's job to second guess the programmer and generate the code that it believes the programmer meant to write"? Applying judgement to guess what the programmer intended sounds exactly like "generating the code that [the compiler] believes the programmer meant to write".

CRConrad · on Jan 22, 2022

> It is not the compiler's job to second guess the programmer and generate the code that it believes the programmer meant to write.

Not if the compiler is run at some (hopefully available) "absolute vanilla", translate-this-code-literally-into-machine-language-and-do-absolutely-nothing-else settings, no. But if you add some optimise-this or optimise-that or optimise-everything switch? Then second-guessing the programmer and generating the code it believes the programmer meant to write (if only they knew all these nifty optimisation techniques) is exactly the compiler's job.

And isn't that the subject under discussion here; Undefined Behaviour and compiler optimisation?

> "Ignoring the situation completely" means ignoring the fact that this is UB and just carrying on.

Q: How?

A: In an undefined way. (Right, since this is "Undefined Behaviour"?)

So... Whatever the heck happens, happens. It's, like, not defined. If you want it defined (as "Do exactly what I wrote anyway!"), then don't be so hung up on the Undefined Behaviour bit; just switch off all the optimisation bits instead.

mpweiher · on Jan 23, 2022

> Not if the compiler is run at some (hopefully available) "absolute vanilla"

Not this straw man again, please!

You can optimise, as long as you keep the semantics of the code.

When what the programmer wrote conflicts with an optimisation the compiler would like to perform:

"- Trust the programmer."

http://port70.net/~nsz/c/c89/rationale/a.html

So if the compiler wanted to put a 4 element array in registers as an optimisation, but the programmer wrote code that accesses element 6, guess what? You can't do that optimisation!

[You can also emit a warning/error if you want]

Also, if your fantastic optimisation breaks tons of existing code, it's no good.

"Existing code is important, existing implementations are not. A large body of C code exists of considerable commercial value. "

http://port70.net/~nsz/c/c89/rationale/a.html

>> "Ignoring the situation completely" means ignoring the fact that this is UB and just carrying on.

> Q: How?

> A: In an undefined way. (Right, since this is "Undefined Behaviour"?)

Nope. There are bounds. Again, this isn't hard, it says so right there.

UncleMeat · on Jan 23, 2022

> You can optimise, as long as you keep the semantics of the code.

In order to do this, we need to define the semantics. Actually doing this is a much harder exercise than you might think, and largely involves creating a PDP-11 emulator since modern machines simply do not behave like the systems that C exposes. We don't even have a flat buffer of memory to address.

Things get way way worse when you start considering nontraditional architectures and suddenly the only way to match the proposed semantics is to introduce a huge amount of software overhead. For example, we might decide to define signed integer overflow as the normal thing that exists on x86 machines. Now when compiling to a weird architecture that does not have hardware overflow detection you need to introduce software checks at every arithmetic operation so you can match the desired semantics appropriately.
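
As a rough illustration of the per-operation cost described above, here is a sketch of a software overflow check around a single signed addition. The helper name `checked_add` is hypothetical, and this version reports overflow rather than emulating x86-style wraparound, so it stands in for the general idea, not for any particular proposed semantics.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out only when the result fits in int.
   Returns false and leaves *out untouched if the sum would overflow. */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;   /* would overflow, so the addition is never performed */
    }
    *out = a + b;       /* safe: the range test above already passed */
    return true;
}
```

Without hardware support, a compiler would have to emit something like these comparisons around every signed addition.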

gpderetta · on Jan 21, 2022

Sorry, I don't understand. It seems to me that current optimizing compilers are fully compliant with the option of "ignoring the situation completely with unpredictable results". That's how UB-exploiting optimizations work: the compiler ignores the possibility that the erroneous case can ever happen and takes into account only the valid states.

mpweiher · on Jan 21, 2022

Apologies, but "ignoring the situation completely with unpredictable results" is the exact opposite of "assuming the situation cannot occur and acting on that assumption".

If my program attempts to write outside the bounds of a declared array, ignoring the situation (that this is UB) is letting that write happen, and letting the chips fall where they may.

How is assuming it cannot/must not happen, and then optimising it away because it did happen "ignoring the situation"??

gpderetta · on Jan 21, 2022

So a compiler that is allowed to ignore a situation would still be required to generate code for that situation? I don't even know how that would be possible.

mpweiher · on Jan 22, 2022

It ignores the fact that this is UB.

It is not just "possible", but exceedingly simple to comply with what is written in the standard and generate code in that situation. For example:

Writing beyond the bounds of an array whose length is known is apparently UB.

   int a[4];
   a[2]=2;
   a[6]=10;

To generate code for the last line, use the same algorithm you used to generate code for the second line, just parameterised with 6 instead of 2.

What was impossible about that?

gpderetta · on Jan 22, 2022

If each of a[0...3] is stored only inside registers, which register should the compiler pick for a[6]?

mpweiher · on Jan 22, 2022

Who is forcing the compiler to store fracking arrays in registers?

Once again, “incompatible with weird optimizations I want to do” is not close to the same thing as “not possible”.

If the two are incompatible, it is obviously the weird optimization that has to go.

How do you pass the address of that array to another function?
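
One way to picture that question, as a sketch with hypothetical names (`sum4`, `caller`): once an array's address is handed to another function, the callee works through a pointer, so the array would typically have to live in memory unless the call is inlined. What any particular compiler does here is an assumption.

```c
/* Hypothetical callee: it only sees a pointer to the first element. */
int sum4(const int *p) {
    return p[0] + p[1] + p[2] + p[3];
}

/* The address of `a` escapes to sum4, unlike an array used purely locally. */
int caller(void) {
    int a[4] = {1, 2, 3, 4};
    return sum4(a);
}
```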

UncleMeat · on Jan 23, 2022

> Who is forcing the compiler to store fracking arrays in registers?

The people who want their program to be fast. "Just do everything in main memory" means your program will be extremely slow.

mpweiher · on Jan 24, 2022

1. Arrays ≠ Everything. Please stop with the straw men.

2. "Compiler Advances Double Computing Power Every 50 Years, And The Interval Keeps Growing" (i.e. even Proebsting was wildly optimistic when it comes to the benefits of compiler research). The benefits of these shenanigans are much less than even the critics thought, never mind what the advocates claim. They are actually small and shrinking.

https://zeux.io/2022/01/08/on-proebstings-law/

gpderetta · on Jan 23, 2022

Sorry, we are going in circles. I thought we were discussing what it means for the compiler to ignore UB.

Also calling scalar replacement of aggregates a weird optimization is very strange.

mpweiher · on Jan 23, 2022

It means ignoring the situation and emitting the code that the programmer wrote. The programmer did not write "put this array in registers". The programmer did write "access element 6 of this array".

This isn't hard.

"Keep the spirit of C. The Committee kept as a major goal to preserve the traditional spirit of C. There are many facets of the spirit of C, but the essence is a community sentiment of the underlying principles upon which the C language is based. Some of the facets of the spirit of C can be summarized in phrases like

- Trust the programmer.

- Don't prevent the programmer from doing what needs to be done.

- Keep the language small and simple.

- Provide only one way to do an operation.

- Make it fast, even if it is not guaranteed to be portable.

The last proverb needs a little explanation. The potential for efficient code generation is one of the most important strengths of C. To help ensure that no code explosion occurs for what appears to be a very simple operation, many operations are defined to be how the target machine's hardware does it rather than by a general abstract rule."

http://port70.net/~nsz/c/c89/rationale/a.html

> calling scalar replacement of aggregates a weird optimization is very strange.

No it's not, certainly for arrays. It's a little less weird for structs than it is for arrays. But the fact that you consider calling it weird "very strange" is telling, and to me the heart of this disconnect.

The Optimiser-über-alles community feels that the needs of the optimiser far outweigh the explicit instructions of the programmer. Many programmers beg to differ.

gpderetta · on Jan 24, 2022

At this point it is obvious that you wouldn't accept anything except a trivial translation to ASM. That's a perfectly reasonable thing to want, but you'll also understand that's not what 99.99% of C and C++ programmers want; even assuming (but not conceding) that might have been the original intent of the language, 50 years later user expectations have changed and there is no reason compiler authors should be bound by some questionable dogma.

CRConrad · on Jan 22, 2022

> Apologies, but "ignoring the situation completely with unpredictable results" is the exact opposite of "assuming the situation cannot occur and acting on that assumption".

Seems to me it's not the opposite but the exact same thing.

mpweiher · on Jan 23, 2022

Not sure how you can say that.

   int a[4];
   a[2]=2;
   a[6]=3;

"Ignore the situation": emit the code for a[6]=3; in the same way you emitted the code for a[2]=2. You've ignored the fact that this is UB.

"Assume the situation cannot occur": don't know, but according to the UB extremists the compiler can now do anything it wants to do, including not emitting any code for this fragment at all (which appears to happen) or formatting your hard disk (which doesn't, last I checked).

Assuming that a[6]=3 does not occur, despite the fact that it is written right there, would also allow putting a into registers, which "ignoring the situation" would not.

SAI_Peregrinus · on Jan 26, 2022

To me, the "situation" is that `a[6]` is being assigned to. So I'd make the compiler ignore the situation, and omit line 3.

Then the compiler could optimize the remaining program, possibly by putting `a` into registers.

I think you have a different view on what the "situation" is, so that your view of ignoring it is different from mine (and CRConrad's, etc).
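
A minimal sketch of that reading, assuming the fragment is wrapped in a hypothetical function `fragment_in_bounds_only` and given a return value so the result is observable (neither is in the original snippet):

```c
/* The in-bounds part of the fragment under this reading. */
int fragment_in_bounds_only(void) {
    int a[4];
    a[2] = 2;
    /* a[6] = 3;   the out-of-bounds store, omitted under this reading */
    return a[2];   /* an optimizer may fold this to the constant 2 */
}
```

With the out-of-bounds store gone, nothing forces `a` to have an address, which is what would let a compiler keep it in registers.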