This bug has nothing to do with undefined behavior, as many of the other comment...

rocqua · on Aug 28, 2019

Only up to a point. This bug is a misfiring attempt at detecting undefined behavior to allow for optimizations. If we did not attempt to remove null pointer checks, that would have prevented this bug in GCC.

Hence, the practice of optimizing based on UB is relevant here, even if this was a compiler bug, rather than a compiler optimizing based on UB.

jcranmer · on Aug 28, 2019

> This bug is a misfiring attempt at detecting undefined behavior to allow for optimizations.

It's closer to a conditional constant propagation that was accidentally applied unconditionally. The code was guarded by an if statement that asserted that a variable did not have a particular value, which lets the compiler delete further checks that the variable does or does not have the value. No UB behavior in play, unless you're complaining about the UB of "variables can't be changed unless the programmer changes them directly or indirectly."

If you pay attention to the change that introduced the bug--it was a change in the constant propagator that caused the bug.

saagarjha · on Aug 28, 2019

> This bug is a misfiring attempt at detecting undefined behavior to allow for optimizations.

I don't think it is. It's the same as removing the conditional here:

  int foo = 1;
  if (foo) {
      // whatever
  }

The compiler (incorrectly) deduced that the pointer was non-NULL and removed the check because it saw it as unnecessary.

temac · on Aug 28, 2019

I'm curious why you think that? The origin is absolutely the usual exploitation of UB in the sense that the compiler supposes that code paths that are past one can be elided (because programmers never write bugs, that kind of interesting reasoning...)

Except it has a bug (the irony :p ) in the logic that computes that, and actually elides more code paths than it should. That shows that their architecture to do that kind of dangerous transformation is fragile, and they are not able to guarantee that that kind of transformation is really restricted to buggy source code (which would even then still be a bad idea IMO, but that is another story).

saagarjha · on Aug 28, 2019

There's no undefined behavior, though. The code provided is well formed: it passes NULL to a function that checks for it, and the NULL check is being removed erroneously because GCC thinks it's not necessary. Here's an explanation of the bug: https://news.ycombinator.com/item?id=20822378

temac · on Aug 28, 2019

Of course there is no UB in the source code. That makes the problem worse, not less bad... Non-nullness is inferred from 'node->child = NULL;' which would be UB would node be NULL. At this point the compiler thinks it is superfluous to check whether 'node' is null and start to propagates that, except it messes up.

It seems that it messes up extremely badly because even -fno-delete-null-pointer-checks does not seem to fix the issue :/

saagarjha · on Aug 28, 2019

> Non-nullness is inferred from 'node->child = NULL;' which would be UB would node be NULL.

There’s also a dereference inside of walk, except it’s protected by a NULL check.

temac · on Aug 28, 2019

Yes I see that, I was not describing what the compiler should observe to work properly, I was describing the erroneous reasoning it was making prior to the fix. I should have written: non-nullness of 'node'.

And TBH I'm starting to be more concerned by the fact that fno-omit-null-pointer-checks did not suppress that codegen bug than by its existence in the first place...

saagarjha · on Aug 28, 2019

> I was describing the erroneous reasoning it was making prior to the fix. I should have written: non-nullness of 'node'.

I think you're misinterpreting the error in the compiler's reasoning: unless I am misunderstanding you, it doesn't seem to match the comment I linked earlier. The reason -fno-delete-null-pointer-checks did not help is because it only kicks in when you use a NULL pointer in a way that is undefined, but that's not what's occurring here.

jcranmer · on Aug 28, 2019

> And TBH I'm starting to be more concerned by the fact that fno-omit-null-pointer-checks did not suppress that codegen bug than by its existence in the first place...

-fno-omit-null-pointer-checks doesn't suppress it because it has nothing to do with it. -fno-omit-null-pointer-checks means "don't infer from the presence of *x that x is non-NULL." In this case, the non-nullity of x was inferred directly from the check "if (x != NULL)".

jcranmer · on Aug 28, 2019

> The origin is absolutely the usual exploitation of UB in the sense that the compiler supposes that code paths that are past one can be elided (because programmers never write bugs, that kind of interesting reasoning...)

This is demonstrably false. The logic follows from:

   if (val != NULL) {
     ... = val; /* val is not null because the if statement says so */
   }

This is not coming from anything that smells of "you said *val, so val is not NULL." It is merely a straightforward bug where the compiler attached some deduced facts that were context-sensitive, and then accidentally promoted them to context-insensitive facts without checking that it was true in all contexts.