
This is great. Reminds me of a crash I saw early in my career. It was a null-pointer exception, except it occurred right after confirming the address was non-null. This was on a single core with a non-preemptible kernel. So the processor just took the wrong branch! There was simply no other explanation.



Are you sure the compiler didn't say "since having a null pointer gives undefined behavior, we can optimize out the part that confirms the address is non-null"?


This is likely your answer. C++ story. I worked at a large company that had a "no exceptions" policy and a custom operator new. If a new expression failed it would return nullptr instead of throwing. So lots of people wrote "checking" code to make sure the result wasn't nullptr, except that the compiler would always just elide that code since the standard mandates that the result cannot be nullptr. Many weird crashes ensued.
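Roughly, the trap looked like this (an illustrative sketch, not the actual code from that codebase):

    #include <cstdlib>

    // Non-conforming replacement: the standard requires the ordinary
    // (non-nothrow) operator new to never return nullptr, so passing
    // malloc's result straight through is exactly the trap described.
    void* operator new(std::size_t size) {
        return std::malloc(size);  // may return nullptr on failure
    }

    struct Widget { int x; };

    Widget* make_widget() {
        Widget* w = new Widget;  // optimizer may assume w != nullptr here...
        if (w == nullptr)        // ...and silently delete this whole check,
            return nullptr;      // so a failed allocation crashes below instead
        w->x = 42;
        return w;
    }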


There are non-throwing operator new overloads that can return nullptr, but I'm not sure if those are a relatively recent development. Did the non-throwing operator new overloads not exist at the time?


Hard to say. Most of the uses probably predated the custom operator new, so nobody thought about it. Not to mention the places you can't sneak into just to switch them over to std::nothrow.


Ah, that's fair. Didn't think of code that couldn't be changed.


`new (std::nothrow)` was in C++98 and in ARM C++ before that.
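For anyone following along, a minimal sketch of the form:

    #include <new>  // std::nothrow

    struct Widget { int x; };

    int main() {
        // The nothrow overload is defined to return nullptr on failure
        // instead of throwing std::bad_alloc, so this check is
        // meaningful and the compiler has to keep it.
        Widget* w = new (std::nothrow) Widget;
        if (w == nullptr)
            return 1;
        w->x = 42;
        delete w;
    }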


Ah, I didn't know that. Thanks!


The idea that a compiler should just silently omit anything that it’s pretty sure won’t be needed is one of the most bafflingly daft decisions I’ve ever encountered.

If a supplier sent you out-of-spec parts because they didn’t think your spec was actually important, you’d call them fraudulent, not clever.


> The idea that a compiler should just silently omit anything that it’s pretty sure won’t be needed is one of the most bafflingly daft decisions I’ve ever encountered.

It's not merely "pretty sure"; the language is specifically defined this way. C++ in particular demands, as a minimum standard from practitioners, perfect knowledge of the vast, complex language standard. Anything less and you'll write a program which is ill-formed or has undefined behaviour, i.e. your program is nonsense.


I think the key word here is "silently"; it would be one thing if the compiler informed the developer it was going to skip a statement (and said "if you really want this statement kept in, add a preprocessor directive here").


Redundant NULL checks happen all over the place, so the result of your revised requirement would be a huge pile of useless diagnostics. Whereupon, as with similar diagnostics, C++ programmers would demand a way to switch them off because they're annoying, and then they'd be back to being annoyed that the compiler didn't do what they expected.


Yeah, that's the problem.


Are there any good tools to see how the compiler is rewriting your code? Almost a compiler coupled with a decompiler to show me the diff of what I wrote and what's happening?


(You almost certainly know this, but I presume the parent doesn't.)

GCC can be passed `-fno-delete-null-pointer-checks` precisely to prevent this. Linux uses it, see e.g. https://lkml.org/lkml/2018/4/4/601 where it's discussed.
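A sketch of the kind of check the flag is about (hypothetical function name):

    // g++ -O2 example.cpp                                 (check may be elided)
    // g++ -O2 -fno-delete-null-pointer-checks example.cpp (check is kept)

    int read_twice(int *p) {
        int a = *p;        // dereference: optimizer infers p != nullptr
        if (p == nullptr)  // dead code under that inference
            return -1;
        return a + *p;
    }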


Wait, the compiler can do that? I thought only dereferencing a null pointer gives undefined behavior; just checking whether a pointer is 0 or not should be valid?

Yeah, based on https://joelaro.wordpress.com/2015/09/30/gcc-optimization-fd... the assumption is that if the pointer was previously dereferenced, it can be assumed to be non-null, and hence all subsequent null checks can be elided.

Based on https://news.ycombinator.com/item?id=17360772, the case in which this actually makes a difference is an address offset like

   int *n = &param->length;
The compiler knows the struct layout, so it can compute the offset without loading anything, but the optimizer takes the member access as proof that param is non-null.
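Spelled out a bit more (a sketch modeled on that description, not the actual kernel code):

    struct packet { int length; };

    int get_length(struct packet *param) {
        int *n = &param->length;  // no memory load: just param plus an
                                  // offset, but forming it is UB when
                                  // param is null...
        if (param == nullptr)     // ...so the optimizer may delete this
            return -1;            // check as provably dead
        return *n;
    }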


You can check whether the address is 0, but the representation of the null pointer is implementation-defined. You can have a system where the null pointer is address 4.

And address 0 isn't too special on a machine level either. Embedded systems have valid data to read and write there.
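For instance, on a bare-metal Cortex-M part the vector table sits at address 0, so firmware really does read from there (a hypothetical sketch; this is exactly the kind of code that wants -fno-delete-null-pointer-checks):

    #include <cstdint>

    // Hypothetical: read the initial stack pointer from the first
    // entry of a vector table at physical address 0. The address goes
    // through a variable so the compiler doesn't treat the cast as a
    // literal null pointer constant.
    uint32_t read_initial_sp() {
        uintptr_t base = 0;
        volatile uint32_t *vectors =
            reinterpret_cast<volatile uint32_t *>(base);
        return vectors[0];
    }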


Yes. This was upwards of 20 years ago on a DEC Alpha, and we were inspecting the compiled build in gdb using the core file from the crash.


What hardware/platform was this? I worked on AIX on POWER a long time ago and it had to map the zero page read-only just to support speculative execution of dereferencing the NULL pointer, if I remember right.

If you were on a platform that did this wrong, speculative execution could have been dereferencing the NULL pointer.


It was a DEC Alpha chip, over two decades ago. (Or MIPS?) Pretty sure there was no speculative execution. In any case, the senior engineer I was working with would have known that—getting to "it just did the wrong thing" was our final and least satisfying conclusion!


The Alpha does quite aggressive instruction reordering; you need to use more memory barriers than on x86 in parallel code, for example.
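For example, Alpha can even reorder dependent loads, so the reader side needs a real barrier that x86 gives you for free (a sketch in C++11 atomics, which postdate the Alpha era but express the same idea):

    #include <atomic>

    struct Node { int payload; };
    std::atomic<Node*> shared{nullptr};

    void publish(Node* n) {
        n->payload = 42;
        shared.store(n, std::memory_order_release);
    }

    int read() {
        // On x86 a relaxed load would happen to work because the
        // second load depends on the first; Alpha can still reorder
        // them, so acquire ordering is needed for portability.
        Node* n = shared.load(std::memory_order_acquire);
        return n ? n->payload : 0;
    }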


The DEC Alpha had speculative execution.

I found lots of bugs porting code to it. It was my first 64-bit platform.


I found a bug in the compiler on it! I was writing some fractal generating code, and found that something I did caused the compiler to output the error message: “Dave doesn’t think this should compile. Please send an email to dave<someone-or-other>@dec.com”. So I did. They replied that they found the problem and had a fix for it.


You've got to read the assembly to really know what happened.

E.g. if the architecture has pointers with non-address bits (modes or segments or whatever), those bits were set while the address part was 'null', and the check was for 'all bits zero', then you could conceivably get that situation.
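A contrived sketch of that case, with made-up tag bits in the high byte of a 64-bit pointer:

    #include <cstdint>

    // Hypothetical layout: top 8 bits carry a mode/segment tag,
    // bottom 56 bits carry the address.
    constexpr uint64_t kAddrMask = 0x00FFFFFFFFFFFFFFull;

    bool null_by_all_bits(uint64_t p) { return p == 0; }               // what the check tested
    bool null_by_address(uint64_t p)  { return (p & kAddrMask) == 0; } // what 'null' meant

    // p = 0x0100000000000000: tag bit set, address zero.
    // null_by_all_bits(p) is false, yet dereferencing still faults,
    // which matches "crashed right after the check said non-null".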


Interesting, how did you fix it? Negate the comparison with an appropriate comment?


There was no fix! It was a one-off hardware error.



