In 2013, LWN had an article on Optimization-unsafe code [1]. Essentially, after a pointer dereference, the compiler is allowed to optimize out any NULL-pointer checks, as dereferencing a NULL pointer is undefined behavior (and the compiler may assume that never happens at runtime, thus the optimization).
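Something like the following sketch (names invented here) shows the pattern that 2013 article describes: the dereference lets the compiler assume the pointer is non-NULL, so the later check can legally be thrown away.

    #include <stddef.h>

    struct device { int status; };

    int read_status(struct device *dev)
    {
        int s = dev->status;   /* undefined behavior if dev == NULL */
        if (dev == NULL)       /* compiler may delete this check    */
            return -1;
        return s;
    }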
This article, in a nutshell, points out that memcpy and friends have a similar property: Although code may ask memcpy to copy 0 bytes (which will not crash in practice even if the argument pointers are invalid or NULL), behavior is still undefined if the source or destination pointer is invalid.
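A minimal sketch of that property (variable names are made up): neither call below touches any memory in practice, yet both are undefined because the standard requires the pointer arguments to be valid even when the length is zero.

    #include <string.h>
    #include <stddef.h>

    void zero_length_calls(void)
    {
        char dst[1];
        const char *src = NULL;

        memcpy(dst, src, 0);     /* UB: src is NULL           */
        memcpy(NULL, NULL, 0);   /* UB: both pointers invalid */
    }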
I think the article is missing an introduction, as I had no idea initially what kind of "difference" between the pointer/length pairs the author was asking about.
Ugh. I'm generally in favor of the recent compiler optimizations that assume undefined behavior isn't triggered (even though they have broken a lot of code that made poor attempts at overflow checking, mine included), but I think doing it for the mem* functions is a step too far.
I think it really was a defect in the C standard not to specify the behavior of calls like memcpy(NULL, NULL, 0). It has obvious no-op semantics, and every implementation I've ever seen would accept it just fine. I bet there is a lot of code out there that assumes that memcpy() and friends work in this "traditional" way.
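For example (types and names invented here), a common pattern is a growable buffer whose empty state is data == NULL, len == 0, which code then copies without a special case. It works everywhere in practice, but is technically undefined.

    #include <string.h>
    #include <stddef.h>

    struct vec { char *data; size_t len; };

    void vec_copy_out(char *dst, const struct vec *v)
    {
        /* fine in practice, but UB when v->data is NULL and v->len is 0 */
        memcpy(dst, v->data, v->len);
    }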
I agree. In this case, writing memcpy by hand would work fine, but calling the standard memcpy is undefined. That's just silly. Undefined behavior should be reserved for cases where making it defined is not easy.
I would make a less-strict statement that "Undefined behavior should be reserved for cases where there might be some advantage for an implementation to do it a different way."
For memcpy, it's never safe to access the byte at "src+len" (it could be past the end of mapped memory), nor is it safe to write to "dest+len". So if len==0, it should follow that it won't access either src[0] or dest[0].
The spec basically allows len==0 to be a special case for this rule, although it's not clear what implementation would ever be able to take advantage of that.
Even better, your custom memcpy might sometimes result in working code and sometimes be replaced with a call to memcpy[1] and then break. Whether this happens depends on the compiler and optimization level, and even if you manage to write one that isn't replaced today, the next version of GCC might be smart enough to figure it out.
Optimizing to assembly that calls the actual external memcpy implementation should be fine. If the custom memcpy works with NULL, without triggering undefined behavior, then any further optimization which assumes memcpy's NULL semantics would be clearly incorrect (which is not to say some compiler won't do it, but it would be a bug). The bug report you linked is about naming a function "memcpy", which is different - and IMO GCC's behavior was indeed incorrect until the change to disable the problematic functionality with -ffreestanding.
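To make the replacement point concrete (a sketch, not a guaranteed outcome): GCC's loop idiom recognition (e.g. -ftree-loop-distribute-patterns, enabled at -O3) can rewrite a plain copy loop like the one below into a call to the library memcpy, at which point memcpy's pointer-validity assumptions apply again. That particular transformation can be disabled with -fno-tree-loop-distribute-patterns, though whether it fires at all depends on the compiler and version.

    #include <stddef.h>

    void *my_copy(void *dest, const void *src, size_t len)
    {
        unsigned char *d = dest;
        const unsigned char *s = src;
        for (size_t i = 0; i < len; i++)   /* touches nothing when len == 0      */
            d[i] = s[i];                   /* loop may be emitted as memcpy call */
        return dest;
    }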
The mentality of compiler authors writing these kinds of optimizations needs examination. Specifically, inferring value constraints from how values are used, and eliminating branches on that basis, is suspect for a very simple reason: if you accept the premise that value constraints can be inferred from the code, then using undefined behaviour to eliminate a branch has to be weighed against the fact that the programmer wrote that branch, and presumably wrote it for a reason. You can't have your cake and eat it too.
Technically correct is not a useful standard. Working code is. If compilers make code like this fail, it should fail very loudly, not quietly; silently eliminating null checks is not a virtue unless it is absolutely known that the null pointer use would definitely blow up otherwise. Which is not always guaranteed in practice, as the memset experiment here shows.
If compiler authors really want to pursue this optimization, elimination of dead code by means of detection of undefined code should result in a compilation error, because the code's intent is self-contradictory.
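For concreteness, a hedged sketch of the sort of memset experiment mentioned above (names are mine): the zero-length memset does not crash in practice even when p is NULL, but it entitles the compiler to assume p is non-NULL and to silently drop the later check, so the failure mode is wrong code rather than a loud crash.

    #include <stdio.h>
    #include <string.h>
    #include <stddef.h>

    void clear_buffer(char *p, size_t n)
    {
        memset(p, 0, n);              /* UB if p == NULL, even when n == 0 */
        if (p == NULL) {              /* may be silently removed           */
            fprintf(stderr, "got a NULL buffer\n");
            return;
        }
        /* ... use p ... */
    }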
It's not that simple, because of inlining and macros.
Those optimizations can be used to quickly throw out unnecessary code like null-pointer checks inside inlined functions, so they are valuable and good to have.
So it's a matter of how much energy you spend on diagnostics, which ends up being a rather heuristic exercise; perhaps we'd just be better off focusing on better static analysis tools that are separate from, or can otherwise be decoupled from, compilers.
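To make the inlining point concrete (names invented for illustration): after inlining, the compiler can see that the argument is the address of a local array and therefore non-NULL, so the check inside the helper is genuinely dead and costs nothing to remove.

    #include <stddef.h>

    static inline int first_byte(const char *buf, size_t n)
    {
        if (buf == NULL)           /* dead once inlined into caller() below */
            return -1;
        return n ? buf[0] : 0;
    }

    int caller(void)
    {
        char local[8] = "x";
        return first_byte(local, sizeof local);   /* check folds away */
    }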
Are we better served by fast and correct code, or faster and wrong?
The gain from application of an optimization needs to be compared with the time and cost of bugs introduced by the same optimization. And dead code elimination isn't necessarily a huge win - if you're fairly sure code is dead, you can make it cheap at the cost of being more expensive should it actually be alive (see my other comment).
Unnecessary code is one thing; exploiting the presence of undefined behavior to compile something the programmer likely didn't want is another. It doesn't matter much that it's fast if it's wrong. A good compiler shouldn't try to read the mind of the programmer; it should complain loudly and stop. It should be free to do so, given the leeway it has with undefined behavior.
And of course, the standards committee should get off their asses and define some behaviors.
Telling apart dead code that the programmer thought was meaningful from dead code that is the result of a macro or an inlined function is difficult. Using functions and macros very often introduces dead branches, and optimizing out dead branches is a huge performance win. The fact that some already-incorrect code misbehaves once these optimizations are applied is not a good reason to stop doing an entire class of standards-compatible optimizations.
There's deducing the range of a variable due to constant expression evaluation and control flow, and then there's deducing its range because of undefined behaviour.
I believe this analysis is possible; taint the control flow analysis with a flag. Macro expansion and inlined functions are also not intractable; debug information requires tracking token location information all the way through to generated code, so this information can be used to apply a heuristic.
Dead branches don't need to cost that much, BTW. Put the code for disfavoured basic blocks out of line with other code (so it doesn't burn up code cache), and use hinted conditional jumps to enter them (e.g. on x86, forward conditional jumps are predicted not to be taken).
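A sketch of that approach using GCC/Clang extensions (__builtin_expect and the cold attribute; the unlikely() macro name is my own):

    #include <stdio.h>
    #include <stdlib.h>

    #define unlikely(x) __builtin_expect(!!(x), 0)

    __attribute__((cold, noreturn))
    static void die(const char *msg)
    {
        fprintf(stderr, "%s\n", msg);   /* error path kept out of the hot code */
        abort();
    }

    void process(const char *p)
    {
        if (unlikely(p == NULL))        /* forward jump, hinted not-taken */
            die("NULL input");
        /* ... hot path ... */
    }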
That's assuming that apparently dead branches being removed from macros and inlined functions will not cause issues with incorrect code relying on undefined behavior.
What about by default throwing an error or warning only in cases where it is "obvious"? (i.e. only in cases where the check was explicitly stated, so no macros / function boundaries / etc)
Ditto, if you have an explicit block (i.e. no macros / etc) that always causes undefined behavior, warn or throw an error by default.
These "gotcha" optimizations are infuriating. Whether or not they're legal under the C standard, real-world _programmer_ mental computational models don't include a zero-length memset magically deleting NULL checks before and after the memset call. These optimizations are compiler bugs and need to be fixed.
Not everything the standard permits is in fact a good idea.
Why not both? Compilers have much faster turnaround times than language standards, and a compiler that declines to view memset as a NULL assertion is still perfectly conforming.
It should be possible to use an SMT solver to automatically check for code where the -O0 version has observably different behavior from the -O3 version for some inputs to the program.
Though I'm not sure current SMT solvers are up to the task. (Related: I was using Sugar for something a while back, only to find it was spitting out weird error messages. Turns out it encodes integers as <size of domain> boolean variables, which doesn't work particularly well for large domains - it was hitting a 4 (2?) GB limit on the size of the temporary SAT input file and choking)
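As a tiny illustration of the kind of -O0 vs. -O3 divergence such a checker would need to flag (an invented example, not from the article): this overflow check relies on signed wraparound, which is undefined, so a typical compiler prints "overflow" at -O0 but may fold the comparison to false at -O2 and print "no overflow".

    #include <limits.h>
    #include <stdio.h>

    static int would_overflow(int x)
    {
        return x + 1 < x;               /* UB when x == INT_MAX */
    }

    int main(int argc, char **argv)
    {
        (void)argv;
        int x = INT_MAX - 1 + argc;     /* INT_MAX when run with no arguments */
        puts(would_overflow(x) ? "overflow" : "no overflow");
        return 0;
    }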
This just makes the case for a safe subset of C, like Cyclone, Clight, or the recent proposal [which I can't find now] for changing the standard so that what's written is closer to what's executed.
Maybe you are thinking of “Friendly C”, which is more a pledge compiler authors could take not to be too creative in their quest for improved benchmark results.
According to the article, OpenBSD changes which of GCC's optimization options get enabled by the global `-Ox` options, so OpenBSD has a “Friendly C” compiler of sorts.
[1] https://lwn.net/Articles/575563/