When you support a compiler you get a massive stream of users complaining that the compiler has a bug, though in most cases it’s the user’s bug.
Of course there are also good reports where the user actually has found a bug in the compiler.
Though of course most are weird corner cases or bugs in new features, there are a surprising number that make you think “wow, how could any program be successfully compiled with this bug in the tree?”
I found a bug in the .NET compiler back when I worked at Microsoft, and at first no one wanted to believe me :)
It manifested when you had a single static member of a specific generic type in a class. The program crashed complaining about invalid CLR instructions. If you added a second static member of the same type to the class, or changed the type of the generic parameter, it didn't reproduce.
Turns out it was related to how the compiler used AVX intrinsics on CPUs that supported those instructions.
Pretty fun but took some convincing for people to believe it was a compiler bug.
I don't know why people think compilers are so infallible, or are more likely to be better written than your application. If people write bugs in applications, guess what: the same people write similar bugs in compilers too.
Probably because everyone still has the numerous early experiences in the back of their head where they thought their code must be correct and it couldn't possibly be their fault, only to realize the next day that it was absolutely their fault.
It definitely took me a while to realize that VS was just always executing the default statement of a switch (instead of the matching case) if the default was at the top of the switch instead of the bottom, and only when the switch was being executed at compile time in a constexpr context.
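For anyone who hasn't seen a switch with the default label first, here is a minimal sketch of the shape being described. It is not the code from the original report; `classify` and its values are made up for illustration, and it assumes C++14 or later (a `switch` inside a `constexpr` function).

```cpp
// Hypothetical function for illustration only.
// The default label is listed first; label order must not change
// which branch is selected.
constexpr int classify(int x) {
    switch (x) {
        default: return -1;  // taken only when no case matches
        case 1:  return 10;
        case 2:  return 20;
    }
}

// These force compile-time evaluation. A conforming compiler selects
// the matching case even though default appears at the top.
static_assert(classify(1) == 10, "case 1 should be selected, not default");
static_assert(classify(2) == 20, "case 2 should be selected, not default");
static_assert(classify(7) == -1, "unmatched values go to default");
```

With a bug like the one described, you would expect the first two assertions to fail at compile time while the same function called at run time behaves correctly.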
> I don't know why people think compilers are so infallible, or are more likely to be better written than your application.
Since they are rare and written by a relatively small number of knowledgeable people, most of the users (incl. me) think that they're meticulously tested and developed with utmost care.
I just remembered that one of my applications was hanging in a hot code path if I didn't write a small debug directive in the middle of it. I always thought the problem was with me, but it can be anywhere between me and the silicon. When you're in the middle of the development heat, you always blame your code or the libraries you use. The compiler and the lower levels are not put on the suspect list until the bug proves stubborn enough to persist.
I maintain compilers for a living for a safety-critical embedded OS. I've found dozens of bugs in the compiler, dozens of bugs in the OS kernel, and dozens of bugs in the third-party validation test suites we use to qualify the compiler.
I also live in a log cabin in the back woods and can go off-grid. I've seen shit you people would not believe. It's just a matter of time now. Dominoes.
There was code in IE 5 that used the original SSE instructions if the chip had them, and regular code as a fallback.
We found a stepping of a 486 chip where this code crashed about 25% of the time.
Since it was only that stepping, and we already had a fallback path, we just skipped the optimization for that chip version and didn’t investigate exactly what was broken.
Having a lot of customers with various CPU versions helped to track this down pretty quickly
Probably not 486 - this was 1999-2000 timeframe, so some sort of Pentium thing.
It was definitely the new instructions though, so whatever Intel chip had the new features had this 1 stepping that had this bug (and it wasn’t the first version of the chip either)
Additionally, when you work on something that is used by a lot of people or machines, you get to see that memory and disk problems actually happen a lot in terms of raw numbers. Multiply a tiny, minuscule percentage by a lot of runs and they surface.
It is tough to not get overconfident in this diagnosis. If your code happens to see hardware problems on a routine basis, and a real bug surfaces, it is very challenging to not dismiss the latter for the former. It likely took impressive work to diagnose this as far as they have.
I remember an old game developer mentioning that they put a test for flaky cache memory in their installer and hacked up an error if it failed, which significantly cut down the number of support calls.
I misremembered, not support calls but bogus crash reports from flaky cache. Bogus crash reports like this are terrible because they aren't caused by bugs. And it's impossible to prove a negative.
Reminds me of the time I spent a week debugging random, but quite consistent, kernel crashes, which turned out to be gcc miscompiling kernel driver code to decrement the stack pointer before it had finished using some values in that stack area. There were one or two instructions during which a re-entrant IRQ, if it happened, would reuse that part of the stack and corrupt the data there.
> To reproduce, build the attached program with "gcc -pthread test.c" and run it on a 5.2 or later kernel compiled with GCC 9 (if the kernel is compiled with GCC 8, it does not reproduce).
I wonder if this is a compiler bug or a new optimization that broke the code.
From the link, it seems like GCC 8 does not cache a read from a variable, and has more memory access to read it, while GCC 9 reads that variable from a register every time. (Maybe from a corrupted register?)
From what I can tell, the issue is that GCC 9 stores the result of a pointer dereference in a register to reuse on each loop iteration, but the loop needs to dereference the pointer each time to work correctly.
I believe the aforementioned pointer caching is in the kernel--it's the pointer to the FP register state which is cached across preemption points in the kernel.
Probably a mix of both; when developing a kernel, you constantly fight against the compiler trying to be smart and optimizing things out of your code, inlining things, caching data, etc.
In this case, it might be necessary to do a volatile read in case that flag has changed, forcing the compiler to reload it.
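To make the "volatile read" idea concrete, here is a rough sketch in C++. It is not the kernel's code or the actual fix; `read_once`, `polls_until_set` and `demo` are hypothetical names, loosely modelled on the idea behind the kernel's `READ_ONCE()`. Reading through a volatile-qualified pointer means the compiler has to perform the load each time instead of reusing a value it cached in a register.

```cpp
#include <cassert>

// Hypothetical helper: read x through a volatile-qualified pointer so
// the compiler must emit a real load on every call.
inline int read_once(const int& x) {
    return *static_cast<const volatile int*>(&x);
}

// Hypothetical polling loop: re-reads the flag on every iteration and
// calls tick() between polls. Returns the index of the first poll that
// saw a nonzero flag, or -1 if it never did.
template <typename Tick>
int polls_until_set(const int& flag, int max_polls, Tick tick) {
    for (int i = 0; i < max_polls; ++i) {
        if (read_once(flag) != 0) return i;
        tick();  // stands in for "something else may change the flag here"
    }
    return -1;
}

// Usage sketch: the third tick sets the flag, so the poll with
// index 3 is the first one to observe it.
inline int demo() {
    int flag = 0;
    int ticks = 0;
    int seen_at = polls_until_set(flag, 10, [&] {
        if (++ticks == 3) flag = 1;
    });
    assert(seen_at == 3);
    return seen_at;
}
```

Note that volatile only stops the compiler from caching the value. It does not make the access atomic or order it against other memory operations, so for ordinary multi-threaded user code `std::atomic` is the usual tool.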
"Debugging an evil Go runtime bug" - https://news.ycombinator.com/item?id=15845118
https://github.com/golang/go/issues/20427
https://bugs.gentoo.org/637152
https://lkml.org/lkml/2017/11/10/188