
He doesn't really address any of the claims made, and makes some startling claims himself. For example:

> This example demonstrates that even ICC with -O1 already requires unrestricted UB.

The example demonstrates nothing of the sort. It demonstrates that ICC uses unrestricted undefined behaviour, not that this is required in any way, shape, or form. (The only reasonable reading of "requires" here is that the behavior seen "requires" this kind of UB to be present. But that's something very different, and doesn't match the rest of his usage.)

> writing C has become incredibly hard since undefined behavior is so difficult to avoid

No, it has become difficult because compilers exploit UB in insane ways. The platform specific UB that he claims is "not an option" is, incidentally, exactly how UB is defined in the standard:

Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

This was made non-binding in later versions of the standard, so optimiser engineers act as if these words don't exist.

And of course there is no reason given for this interpretation being "not an option". Except for "but I wanna". Yes, it's not an option if you really want to exploit UB in the insane ways that compilers these days want to exploit it.

But that's not necessary in any way shape or form.

> That would declare all but the most basic C compilers as non-compliant.

Yes, because by any sane interpretation of the standard they are non-compliant.

On a more general note, I find this idea that things are necessary because I want to do them really bizarre.




> This was made non-binding in later versions of the standard, so optimiser engineers act as if these words don't exist.

This is reminiscent of the argument in "How One Word Broke C" [0, HN discussion at 1]. I'm not particularly convinced this argument is correct. In my opinion, it's "ignoring the situation completely" that's the phrase of interest; after all, what is assuming that UB cannot occur but "ignoring the situation completely"?

I'm a nobody, though, so take that with an appropriate grain of salt.

[0]: https://news.quelsolaar.com/2020/03/16/how-one-word-broke-c/ (currently broken; archive at https://web.archive.org/web/20210307213745/https://news.quel...

[1]: https://news.ycombinator.com/item?id=22589657


Apologies, but "ignoring the situation completely" is the exact opposite of "assuming the situation cannot occur and acting on that assumption".


I'm not sure how "acting on an assumption that the situation cannot occur" is distinguishable from "choosing to ignore situations completely whenever they come up". The former is a blanket description of how you treat individual instances.

For example:

    int f(int* i) {
        int val = *i;
        if (i == NULL) { return 0; }
        return val;
    }

I submit that there are two situations here:

1. i is NULL. Program flow will be caught by the NULL check and the return value is 0.

2. i is not NULL. The NULL check cannot be hit, and the return value is val.

As allowed by the standard, I'll just ignore the situation with UB, leaving

> i is not NULL. The NULL check cannot be hit, and the return value is val.

Alternatively, I can assume that UB cannot occur, which eliminates option 1, leaving

> i is not NULL. The NULL check cannot be hit, and the return value is val.

You get the same result either way.
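To make the two readings concrete, here is a minimal sketch of the function as written next to what a compiler might reduce it to once the NULL branch is dropped. The names `f_as_written` and `f_check_removed` are made up for illustration, and whether a given compiler actually performs this transformation depends on the compiler and its flags.

```c
#include <stddef.h>

/* The function exactly as written in the example above. */
int f_as_written(int *i) {
    int val = *i;                 /* the dereference happens before the check */
    if (i == NULL) { return 0; }  /* only reachable with i == NULL after UB */
    return val;
}

/* Hypothetical reduced form: the NULL branch is gone, whether you call
   that "ignoring the UB situation" or "assuming UB cannot occur". */
int f_check_removed(int *i) {
    return *i;
}
```

For any valid pointer the two functions return the same value; they can only differ when `i` is NULL, which is the one case the standard leaves undefined.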

And that gets to the root of the problem: What exactly does "ignoring the situation completely" mean? In particular, what is the scope of a "situation"?


Not sure what such an absurd code example is supposed to show.

However, ignoring the situation completely in this case is emitting the code as written. This is not hard at all, despite all the mental gymnastics expended to pretend that it were hard.

That is the compiler's job: emit the code that the programmer wrote. Even if that code is stupid, as in this case. It is not the compiler's job to second guess the programmer and generate the code that it believes the programmer meant to write.

In this case:

   1. Dereference i.
   2. Check i.
   3. If i is null, return 0.
   4. Return the value from step 1.

If you want, I can also write it down as assembly code.

Again, this isn't hard.

"Ignoring the situation completely" means ignoring the fact that this is UB and just carrying on.


> Not sure what such an absurd code example is supposed to show.

I had a few things in mind:

1. Highlight that it's not the wording change from "permissible" to "possible" that enables UB-based optimization - it's the interpretation of one of the behaviors the standard lists

2. It's the vagueness of "the situation" that's at the heart of the issue.

3. "Ignoring the situation completely" can produce the same result as "assuming UB never happens", contrary to your assertion (albeit subject to an interpretation that you disagree with)

(And absurd or not, similar code does show up in real life [0]. It's part of why -fno-delete-null-pointer-checks exists, after all).

> However, ignoring the situation completely in this case is emitting the code as written.

> "Ignoring the situation completely" means ignoring the fact that this is UB and just carrying on.

Wouldn't "emitting the code as written" or "ignoring the fact that this is UB and just carrying on" fall under the "to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message)"? If it does, why have the "ignoring the situation completely with unpredictable results" wording in the first place?

I'm not really convinced your definition of "ignoring the situation completely" is the only valid definition, either. Why wouldn't the standard's wording allow the interpretation in the example?

> That is the compiler's job: emit the code that the programmer wrote.

Why bother with optimizations, then?

[0]: https://lwn.net/Articles/342330/


>> That is the compiler's job: emit the code that the programmer wrote.

> Why bother with optimizations, then?

Why not?

If you can optimise without compromising the intended semantics of the code, go ahead. If you cannot, do not.

Note: you will have to apply judgement, because the C standard in particular happens to be incomplete.


> Why not?

Because "That is the compiler's job: emit the code that the programmer wrote", and optimizing means the compiler is allowed to emit code the programmer did not write.

> If you can optimise without compromising the intended semantics of the code, go ahead.

And how does the compiler know what the "intended semantics" of the code are, if it isn't precisely what the programmer wrote?

> you will have to apply judgement

How is that compatible with "It is not the compiler's job to second guess the programmer and generate the code that it believes the programmer meant to write"? Applying judgement to guess what the programmer intended sounds exactly like "generating the code that [the compiler] believes the programmer meant to write".


> It is not the compiler's job to second guess the programmer and generate the code that it believes the programmer meant to write.

Not if the compiler is run at some (hopefully available) "absolute vanilla", translate-this-code-literally-into-machine-language-and-do-absolutely-nothing-else settings, no. But if you add some optimise-this or optimise-that or optimise-everything switch? Then second-guessing the programmer and generating the code it believes the programmer meant to write (if only they knew all these nifty optimisation techniques) is exactly the compiler's job.

And isn't that the subject under discussion here; Undefined Behaviour and compiler optimisation?

> "Ignoring the situation completely" means ignoring the fact that this is UB and just carrying on.

Q: How?

A: In an undefined way. (Right, since this is "Undefined Behaviour"?)

So... Whatever the heck happens, happens. It's, like, not defined. If you want it defined (as "Do exactly what I wrote anyway!"), then don't be so hung up on the Undefined Behaviour bit; just switch off all the optimisation bits instead.


> Not if the compiler is run at some (hopefully available) "absolute vanilla"

Not this straw man again, please!

You can optimise, as long as you keep the semantics of the code.

When what the programmer wrote conflicts with an optimisation the compiler would like to perform:

"- Trust the programmer."

http://port70.net/~nsz/c/c89/rationale/a.html

So if the compiler wanted to put a 4 element array in registers as an optimisation, but the programmer wrote code that accesses element 6, guess what? You can't do that optimisation!

[You can also emit a warning/error if you want]

Also, if your fantastic optimisation breaks tons of existing code, it's no good.

"Existing code is important, existing implementations are not. A large body of C code exists of considerable commercial value. "

http://port70.net/~nsz/c/c89/rationale/a.html

>> "Ignoring the situation completely" means ignoring the fact that this is UB and just carrying on.

> Q: How?

> A: In an undefined way. (Right, since this is "Undefined Behaviour"?)

Nope. There are bounds. Again, this isn't hard, it says so right there.


> You can optimise, as long as you keep the semantics of the code.

In order to do this, we need to define the semantics. Actually doing this is a much harder exercise than you might think, and largely involves creating a PDP-11 emulator since modern machines simply do not behave like the systems that C exposes. We don't even have a flat buffer of memory to address.

Things get way way worse when you start considering nontraditional architectures and suddenly the only way to match the proposed semantics is to introduce a huge amount of software overhead. For example, we might decide to define signed integer overflow as the normal thing that exists on x86 machines. Now when compiling to a weird architecture that does not have hardware overflow detection you need to introduce software checks at every arithmetic operation so you can match the desired semantics appropriately.
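As a rough sketch of what such a software fallback could look like, here is an addition that always wraps like two's complement without ever overflowing a signed type. The helper name `wrapping_add` is hypothetical, and the code assumes `UINT_MAX == 2 * INT_MAX + 1`, which holds on mainstream targets but is an assumption here.

```c
#include <limits.h>

/* Hypothetical helper: add two ints with two's-complement wraparound,
   using only operations that are well defined in C. */
int wrapping_add(int a, int b) {
    /* Unsigned arithmetic wraps modulo UINT_MAX + 1 by definition. */
    unsigned int sum = (unsigned int)a + (unsigned int)b;

    /* Non-negative results fit in int and convert directly. */
    if (sum <= (unsigned int)INT_MAX) {
        return (int)sum;
    }

    /* Otherwise the bit pattern stands for a negative value.
       ~sum is at most INT_MAX here, so the conversion is safe,
       and -(~sum) - 1 never goes below INT_MIN. */
    return -(int)(~sum) - 1;
}
```

On hardware that already wraps, a compiler would be expected to collapse this into a single add instruction; the extra branch is the kind of per-operation overhead the comment is worried about on targets that do not.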


Sorry, I don't understand. It seems to me that current optimizing compilers are fully compliant with the option of "ignoring the situation completely with unpredictable results". That's how UB-exploiting optimizations work: the compiler ignores the possibility that the erroneous case can ever happen and takes into account only the valid states.


Apologies, but "ignoring the situation completely with unpredictable results" is the exact opposite of "assuming the situation cannot occur and acting on that assumption".

If my program attempts to write outside the bounds of declared array, ignoring the situation (that this is UB) is letting that write happen, and letting the chips fall where they might.

How is assuming it cannot/must not happen, and then optimising it away because it did happen "ignoring the situation"??


So a compiler that is allowed to ignore a situation would still be required to generate code for that situation? I don't even know how that would be possible.


It ignores the fact that this is UB.

It is not just "possible", but exceedingly simple to comply with what is written in the standard and generate code in that situation. For example:

Writing beyond the bounds of an array whose length is known is apparently UB.

   int a[4];
   a[2]=2;
   a[6]=10;

To generate code for the last line, use the same algorithm you used to generate code for the second line, just parameterised with 6 instead of 2.

What was impossible about that?
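A minimal sketch of that "same algorithm", written as the address arithmetic a literal translation would perform, without actually doing the out-of-bounds store. The helper names are hypothetical and this is not how any particular compiler is structured.

```c
#include <stddef.h>

/* Hypothetical helper: byte offset from the start of an int array that
   a store to element `index` targets under a literal translation. */
size_t element_byte_offset(size_t index) {
    return index * sizeof(int);
}

/* Hypothetical helper: nonzero if that offset still lies inside an
   array of `length` ints. */
int offset_is_inside(size_t index, size_t length) {
    return element_byte_offset(index) < length * sizeof(int);
}
```

The offset is computed the same way for index 2 and index 6; the only difference is that for a 4-element array the second one lands outside the object, so what lives at that address is up to the environment.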


If each of a[0...3] is stored only inside registers, which register should the compiler pick for a[6]?


Who is forcing the compiler to store fracking arrays in registers?

Once again, “incompatible with weird optimizations I want to do” is not close to the same thing as “not possible”.

If the two are incompatible, it is obviously the weird optimization that has to go.

How do you pass the address of that array to another function?


> Who is forcing the compiler to store fracking arrays in registers?

The people who want their program to be fast. "Just do everything in main memory" means your program will be extremely slow.


1. Arrays ≠ Everything. Please stop with the straw men.

2. "Compiler Advances Double Computing Power Every 50 Years, And The Interval Keeps Growing" (i.e. even Proebsting was wildly optimistic when it comes to the benefits of compiler research). The benefits of these shenanigans are much less than even the critics thought, never mind what the advocates claim. They are actually small and shrinking.

https://zeux.io/2022/01/08/on-proebstings-law/


Sorry, we are going in circles. I thought we were discussing what it means for the compiler to ignore UB.

Also calling scalar replacement of aggregates a weird optimization is very strange.


It means ignoring the situation and emitting the code that the programmer wrote. The programmer did not write "put this array in registers". The programmer did write "access element 6 of this array".

This isn't hard.

"Keep the spirit of C. The Committee kept as a major goal to preserve the traditional spirit of C. There are many facets of the spirit of C, but the essence is a community sentiment of the underlying principles upon which the C language is based. Some of the facets of the spirit of C can be summarized in phrases like

- Trust the programmer.

- Don't prevent the programmer from doing what needs to be done.

- Keep the language small and simple.

- Provide only one way to do an operation.

- Make it fast, even if it is not guaranteed to be portable.

The last proverb needs a little explanation. The potential for efficient code generation is one of the most important strengths of C. To help ensure that no code explosion occurs for what appears to be a very simple operation, many operations are defined to be how the target machine's hardware does it rather than by a general abstract rule."

http://port70.net/~nsz/c/c89/rationale/a.html

> calling scalar replacement of aggregates a weird optimization is very strange.

No it's not, certainly for arrays. It's a little less weird for structs than it is for arrays. But the fact that you consider calling it weird "very strange" is telling, and to me the heart of this disconnect.

The Optimiser-über-alles community feels that the needs of the optimiser far outweigh the explicit instructions of the programmer. Many programmers beg to differ.


At this point it is obvious that you wouldn't accept anything except a trivial translation to ASM. That's a perfectly reasonable thing to want, but you'll also understand that's not what 99.99% of C and C++ programmers want; even assuming (but not conceding) that might have been the original intent of the language, 50 years later user expectations have changed and there is no reason compiler authors should be bound by some questionable dogma.


> Apologies, but "ignoring the situation completely with unpredictable results" is the exact opposite of "assuming the situation cannot occur and acting on that assumption".

Seems to me it's not the opposite but the exact same thing.


Not sure how you can say that.

   int a[4];
   a[2]=2;
   a[6]=3;

"Ignore the situation": emit the code for a[6]=3; in the same way you emitted the code for a[2]=2. You've ignored the fact that this is UB.

"Assume the situation cannot occur": don't know, but according to the UB extremists the compiler can now do anything it wants to do, including not emitting any code for this fragment at all (which appears to happen) or formatting your hard disk (which doesn't, last I checked).

Assuming that a[6]=3 does not occur, despite the fact that it is written right there, would also allow putting a into registers, which "ignoring the situation" would not.


To me, the "situation" is that `a[6]` is being assigned to. So I'd make the compiler ignore the situation, and omit line 3.

Then the compiler could optimize the remaining program, possibly by putting `a` into registers.

I think you have a different view on what the "situation" is, so that your view of ignoring it is different from mine (and CRConrad's, etc).



