
I'm not sure how "acting on an assumption that the situation cannot occur" is distinguishable from "choosing to ignore situations completely whenever they come up". The former is a blanket description of how you treat individual instances.

For example:

    int f(int* i) {
        int val = *i;
        if (i == NULL) { return 0; }
        return val;
    }

I submit that there are two situations here:

1. i is NULL. Program flow will be caught by the NULL check and the return value is 0.

2. i is not NULL. The NULL check cannot be hit, and the return value is val.

As allowed by the standard, I'll just ignore the situation with UB, leaving

> i is not NULL. The NULL check cannot be hit, and the return value is val.

Alternatively, I can assume that UB cannot occur, which eliminates option 1, leaving

> i is not NULL. The NULL check cannot be hit, and the return value is val.

You get the same result either way.
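To make the shared outcome concrete, here is a rough sketch of what either reading would allow a compiler to emit. The name `f_reduced` is hypothetical, and I'm not claiming any particular compiler produces exactly this; it only shows the one case that survives both arguments.

```c
#include <stddef.h>

/* The function from the example above. The load happens before the check,
   so calling it with NULL is undefined behavior. */
int f(int *i) {
    int val = *i;
    if (i == NULL) { return 0; }
    return val;
}

/* Hypothetical reduced form: once the "i is NULL" case is dropped,
   the check is dead code and only the load remains. */
int f_reduced(int *i) {
    return *i;
}
```

For every valid pointer the two functions return the same value; they can only differ on the input whose behavior the standard does not define.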

And that gets to the root of the problem: What exactly does "ignoring the situation completely" mean? In particular, what is the scope of a "situation"?




Not sure what such an absurd code example is supposed to show.

However, ignoring the situation completely in this case means emitting the code as written. This is not hard at all, despite all the mental gymnastics expended to pretend that it is.

That is the compiler's job: emit the code that the programmer wrote. Even if that code is stupid, as in this case. It is not the compiler's job to second guess the programmer and generate the code that it believes the programmer meant to write.

In this case:

1. Dereference i.

2. Check i.

3. If i is null, return 0.

4. Return the value loaded in step 1.

If you want, I can also write it down as assembly code.

Again, this isn't hard.

"Ignoring the situation completely" means ignoring the fact that this is UB and just carrying on.


> Not sure what such an absurd code example is supposed to show.

I had a few things in mind:

1. Highlight that it's not the wording change from "permissible" to "possible" that enables UB-based optimization; it's the interpretation of one of the behaviors the standard lists.

2. Show that the vagueness of "the situation" is at the heart of the issue.

3. Show that "ignoring the situation completely" can produce the same result as "assuming UB never happens", contrary to your assertion (albeit subject to an interpretation that you disagree with).

(And absurd or not, similar code does show up in real life [0]. It's part of why -fno-delete-null-pointer-checks exists, after all).
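For comparison, a minimal sketch of the ordering that sidesteps the whole question: test the pointer before loading through it. `f_checked` is a hypothetical name and is not taken from the linked article.

```c
#include <stddef.h>

/* Check first, then dereference. No execution path loads through a
   null pointer, so the NULL branch cannot be treated as unreachable. */
int f_checked(const int *i) {
    if (i == NULL) { return 0; }
    return *i;
}
```

With this ordering the NULL case is well defined, so both interpretations have to keep the check.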

> However, ignoring the situation completely in this case is emitting the code as written.

> "Ignoring the situation completely" means ignoring the fact that this is UB and just carrying on.

Wouldn't "emitting the code as written" or "ignoring the fact that this is UB and just carrying on" fall under the standard's other listed behavior, "behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message)"? If it does, why have the "ignoring the situation completely with unpredictable results" wording in the first place?

I'm not really convinced your definition of "ignoring the situation completely" is the only valid definition, either. Why wouldn't the standard's wording allow the interpretation in the example?

> That is the compiler's job: emit the code that the programmer wrote.

Why bother with optimizations, then?

[0]: https://lwn.net/Articles/342330/


>> That is the compiler's job: emit the code that the programmer wrote.

> Why bother with optimizations, then?

Why not?

If you can optimise without compromising the intended semantics of the code, go ahead. If you cannot, do not.

Note: you will have to apply judgement, because the C standard in particular happens to be incomplete.


> Why not?

Because "That is the compiler's job: emit the code that the programmer wrote", and optimizing means the compiler is allowed to emit code the programmer did not write.

> If you can optimise without compromising the intended semantics of the code, go ahead.

And how does the compiler know what the "intended semantics" of the code are, if it isn't precisely what the programmer wrote?

> you will have to apply judgement

How is that compatible with "It is not the compiler's job to second guess the programmer and generate the code that it believes the programmer meant to write"? Applying judgement to guess what the programmer intended sounds exactly like "generating the code that [the compiler] believes the programmer meant to write".


> It is not the compiler's job to second guess the programmer and generate the code that it believes the programmer meant to write.

Not if the compiler is run at some (hopefully available) "absolute vanilla", translate-this-code-literally-into-machine-language-and-do-absolutely-nothing-else settings, no. But if you add some optimise-this or optimise-that or optimise-everything switch? Then second-guessing the programmer and generating the code it believes the programmer meant to write (if only they knew all these nifty optimisation techniques) is exactly the compiler's job.

And isn't that the subject under discussion here: Undefined Behaviour and compiler optimisation?

> "Ignoring the situation completely" means ignoring the fact that this is UB and just carrying on.

Q: How?

A: In an undefined way. (Right, since this is "Undefined Behaviour"?)

So... Whatever the heck happens, happens. It's, like, not defined. If you want it defined (as "Do exactly what I wrote anyway!"), then don't be so hung up on the Undefined Behaviour bit; just switch off all the optimisation bits instead.


> Not if the compiler is run at some (hopefully available) "absolute vanilla"

Not this straw man again, please!

You can optimise, as long as you keep the semantics of the code.

When what the programmer wrote conflicts with an optimisation the compiler would like to perform:

"- Trust the programmer."

http://port70.net/~nsz/c/c89/rationale/a.html

So if the compiler wanted to put a 4 element array in registers as an optimisation, but the programmer wrote code that accesses element 6, guess what? You can't do that optimisation!

[You can also emit a warning/error if you want]

Also, if your fantastic optimisation breaks tons of existing code, it's no good.

"Existing code is important, existing implementations are not. A large body of C code exists of considerable commercial value."

http://port70.net/~nsz/c/c89/rationale/a.html

>> "Ignoring the situation completely" means ignoring the fact that this is UB and just carrying on.

> Q: How?

> A: In an undefined way. (Right, since this is "Undefined Behaviour"?)

Nope. There are bounds. Again, this isn't hard, it says so right there.


> You can optimise, as long as you keep the semantics of the code.

In order to do this, we need to define the semantics. Actually doing this is a much harder exercise than you might think, and largely involves creating a PDP-11 emulator since modern machines simply do not behave like the systems that C exposes. We don't even have a flat buffer of memory to address.

Things get way way worse when you start considering nontraditional architectures and suddenly the only way to match the proposed semantics is to introduce a huge amount of software overhead. For example, we might decide to define signed integer overflow as the normal thing that exists on x86 machines. Now when compiling to a weird architecture that does not have hardware overflow detection you need to introduce software checks at every arithmetic operation so you can match the desired semantics appropriately.
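As a rough illustration of what that software overhead looks like, here is a sketch of two ways to give signed addition a fixed meaning without relying on the hardware. The names `wrapping_add` and `checked_add` are hypothetical. The wrapping variant assumes the implementation converts out-of-range unsigned values to `int` by modular reduction, which is implementation-defined before C23 but is what mainstream compilers do.

```c
#include <limits.h>
#include <stdbool.h>

/* Wrapping addition: do the sum in unsigned arithmetic, which is defined
   to wrap, then convert back. The conversion of an out-of-range value to
   int is implementation-defined before C23 (assumed modular here). */
int wrapping_add(int a, int b) {
    return (int)((unsigned int)a + (unsigned int)b);
}

/* Checked addition: test for overflow before adding, so the signed
   addition itself never overflows. Returns false on overflow and
   leaves *out untouched in that case. */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

On a target whose add instruction already wraps, `wrapping_add` can compile down to a single add. On one that behaves differently, every arithmetic operation has to pay for something like the comparisons in `checked_add`.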



