Thanks for going through all this so meticulously. I actually didn't know this was possible (although I usually disable optimizations, so that might be why.) Just to be clear, using volatile on the magic number will ensure this is avoided?
I recommend reading the article I linked. They thought they had found a solution with volatile, and it turned out it didn’t actually work. malloc/free are 100% special, and a sufficiently smart compiler can remove this stuff (sometimes automatically, sometimes only if you turn on LTO).
memset_s is the only function defined to work here, precisely because this was needed for crypto, so the problem is solvable. But take it on faith: if a function is not explicitly called out as forcing the compiler to skip dead store elimination, the elimination will happen.
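A minimal sketch of what calling memset_s could look like. `wipe_buffer` is a hypothetical helper name, not something from the thread. memset_s belongs to the optional Annex K of C11, so many C libraries do not ship it; the sketch therefore checks `__STDC_LIB_EXT1__` and otherwise falls back to a volatile byte loop, which does not carry the same standard guarantee.

```c
#define __STDC_WANT_LIB_EXT1__ 1  /* ask <string.h> for Annex K, if present */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overwrite len bytes at buf with zeros. */
static void wipe_buffer(void *buf, size_t len)
{
    if (len == 0) {
        return;
    }
    assert(buf != NULL);
#ifdef __STDC_LIB_EXT1__
    /* Annex K path: this call is specified to actually be performed. */
    (void)memset_s(buf, len, 0, len);
#else
    /* Assumed fallback when Annex K is unavailable: volatile byte stores.
       This leans on compiler behavior rather than a standard guarantee. */
    volatile unsigned char *p = buf;
    while (len--) {
        *p++ = 0;
    }
#endif
}
```

A caller would run `wipe_buffer(key, sizeof key)` on a secret just before the buffer goes out of scope or is freed.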
Thanks for the recommendation, I had skipped it. It's worse than I thought. Wow, I need to reprogram my brain actually. Maybe it's time for me to actually read the standard and rethink these things.
> Just to be clear, using volatile on the magic number will ensure this is avoided?
I think that's your best bet but volatile does bring its own set of problems with it (how it works differs from compiler to compiler). That said it should probably work fine, this is a pretty simple use of volatile. You might check the documentation for your particular compiler though if you have one in mind, it should tell you what it does and if it does anything undesirable (or if there's a better way to do this). Also the volatile cast like I did may be a better approach than actually making the member itself volatile.
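For illustration, a sketch of the volatile-cast approach applied to a magic number. `struct obj`, `OBJ_MAGIC`, `obj_new`, `obj_poison`, and `obj_free` are all hypothetical names. Whether the store survives optimization depends on how your compiler treats accesses through a volatile-qualified lvalue, so treat this as an assumption to verify rather than a guarantee.

```c
#include <assert.h>
#include <stdlib.h>

#define OBJ_MAGIC 0x4f424a31u  /* hypothetical magic value */

struct obj {
    unsigned magic;  /* the member itself is NOT declared volatile */
    int payload;
};

static struct obj *obj_new(int payload)
{
    struct obj *o = malloc(sizeof *o);
    if (o != NULL) {
        o->magic = OBJ_MAGIC;
        o->payload = payload;
    }
    return o;
}

/* Clear the magic through a volatile-qualified lvalue, so that only this
   one store is treated as a volatile access. */
static void obj_poison(struct obj *o)
{
    assert(o->magic == OBJ_MAGIC);  /* trips on a stale or double-freed object */
    *(volatile unsigned *)&o->magic = 0;
}

static void obj_free(struct obj *o)
{
    obj_poison(o);
    free(o);
}
```

Casting at the point of the store keeps ordinary reads of `magic` elsewhere in the program free of volatile's cost.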
FWIW, the whole idea is that whether this happens or not has no impact on your program, so in theory you shouldn't ever notice this happens. Detecting use-after-free like this is not really standards compliant so that's a big reason why it's problematic to implement.
Also, if you're not using the standard allocator then most of this logic doesn't apply, because the compiler won't know your special allocator has the semantics of 'free()'. There are `malloc` attributes in gcc that might trigger similar optimization behavior, but you'd have to be using them, and even then I'm not really sure as I haven't looked into what all they do.
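As a rough sketch, this is how a custom allocator might be annotated with gcc's `malloc` attribute. `pool_alloc`, `pool_release`, and `POOL_MALLOC_LIKE` are hypothetical names. The attribute tells the compiler that the returned pointer does not alias any other live pointer; whether that leads to the same store elimination seen with the standard allocator is not something this sketch establishes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#if defined(__GNUC__)
#define POOL_MALLOC_LIKE __attribute__((malloc))
#else
#define POOL_MALLOC_LIKE  /* attribute unavailable: expands to nothing */
#endif

/* Hypothetical custom allocator interface. */
POOL_MALLOC_LIKE void *pool_alloc(size_t size);
void pool_release(void *ptr);

/* A real pool would carve memory out of its own arena. malloc/free are
   used here only to keep the sketch runnable, and because they are
   visible in this translation unit the compiler can see through them. */
void *pool_alloc(size_t size)
{
    assert(size > 0);
    return malloc(size);
}

void pool_release(void *ptr)
{
    free(ptr);
}
```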
Best bet is to use the APIs that standards/compilers guarantee things about. The compiler is totally free to optimize volatile variables under very specific situations. Volatile has a very specific definition, but it does not mean “compiler is not allowed to optimize this”.
Trying to structure code to trick the compiler is a bad idea. The compiler authors know the standard better than you and eventually the compiler will exploit your misunderstanding.
I agree, but FWIW I'm not claiming that volatile means "the compiler can't optimize this". This program would still function anyway even if the volatile is optimized out, so this is much more of a hint rather than a "this has to happen". A simple volatile store like this is also very unlikely to be optimized out, I don't know of any compiler that would attempt to do it and frankly it would break lots of stuff if it did (just because it can't be optimized out doesn't mean it can't cause other problems though). But when you get down to it trying to catch these use-after-free errors is never going to be guaranteed to work since as we've established use-after-free already breaks the standard itself. Still, using one of the various 'secure zero' APIs if you have one is definitely better, though the logic would need to be changed slightly.
I'd really like to see the example if that's the case, that would be very surprising to me and frankly sounds like a bug if it's really just a simple usage. The gcc documentation[0] suggests it will always emit a load and store, even when the result is completely ignored and is effectively dead code. I (and others) have interpreted this to mean they will never optimize out the actual load or store regardless of context (though reordering and such is still on the table in some cases, obviously, but that doesn't matter for this usage).
As context, the Linux Kernel uses volatile to ensure loads and stores happen, that's ultimately how READ_ONCE and WRITE_ONCE work[1]. If that's actually broken in such a simple case I think they'd like to know xD
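To make the idea concrete, here is a heavily simplified sketch of volatile-cast accessors in the spirit of those macros. `MY_READ_ONCE`, `MY_WRITE_ONCE`, `shared_flag`, `publish`, `observe`, and `demo` are hypothetical names, the real kernel macros do considerably more, and `__typeof__` is a GNU extension (gcc and clang).

```c
#include <assert.h>

/* Simplified stand-ins, NOT the kernel's actual definitions. */
#define MY_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define MY_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int shared_flag;  /* a plain, non-volatile object */

static void publish(int v)
{
    MY_WRITE_ONCE(shared_flag, v);  /* store through a volatile lvalue */
}

static int observe(void)
{
    return MY_READ_ONCE(shared_flag);  /* load through a volatile lvalue */
}

static void demo(void)
{
    publish(5);
    assert(observe() == 5);
}
```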
Edit: To be clear, I looked for the example you mentioned but couldn't find it. I'm somewhat wondering if you were thinking of the example I posted, since I used volatile to get gcc to not optimize the store out :P
It’s literally in the top level link I supplied [1]. You may trick some compilers today but there’s no guarantee that tomorrow’s compilers won’t get smart and leave you scratching your head about what went wrong. Memory allocation and deallocation is special in the standard. I agree it’s a bit weird but there are reasons for it (this is a form of dead store elimination that isn’t the same as normal dead store elimination which the compiler can’t optimize for volatiles because of what volatile means semantically). Your example with the kernel doesn’t apply because there’s no free happening there.
I’m genuinely amazed at the response. There’s literally an API defined that has the contract you want and your response is “yeah, but I want to write it a totally different way that the standard doesn’t allow”. Just use memset_s. It’s a compiler builtin, so the generated code is as efficient as a volatile version (more so, in fact), except that it is actually safe. Volatile has a totally different purpose and isn’t suitable for trying to write a value before calling free.
I’ll leave writing a godbolt example of writing to a volatile right before a free in the same compilation unit at O3 for you to try out.
First I will point out that, as I said, memset_s is a good solution. I have no problem with that and would suggest its use if it's possible. My complaint is simply with the suggestion that volatile doesn't work here; it does.
As far as that article goes, the example for `secure_memzero` works and you will not find any compiler that will 'optimize that out', it would be a bug. And as I linked, the gcc documentation says as much. With that, memory allocation is not as special as you're making it out to be, normal memory can be volatile in perfectly valid situations (even ones mentioned in the standard), and just because it's related to a free() does not mean the compiler is now allowed to remove a volatile dead store - and even if you think it does, gcc will not do that.
Here's an example of such a case[0]. A signal handler is able to view the object being set right before the free() call, and a signal could trigger at that point, but the compiler still optimizes it out (which is correct). Using volatile on the variable to ensure all loads and stores actually happen (and are visible to the signal handler) is the suggested way, and if you do that then the code does set the value before the free().
As for your suggestion of writing to a volatile right before a free(), I'm not sure if you tried it, but it works just fine, as expected; look[1]. I am perfectly confident in saying you will never find an example where the volatile store doesn't happen. Besides, if the compiler were willing to make such an optimization, don't you think my original example, which used volatile to avoid dead store elimination and memory allocation elimination, would have failed in the first place? ;)
I have at times (ab)used volatile to aid in debugging sessions, something like
    volatile bool doCheck = false;
    if (doCheck)
    {
        // code I want to enable at some point during debugging
    }
The idea is that I attach a debugger, and then only at a certain point enable doCheck.
I was baffled to learn that MSVC will happily constant-fold the false into the if, as long as the variable is function-local. The variable still exists and I can change it in the debugger, but it doesn't actually impact control flow as intended. The "solution" is to move it to e.g. global scope (this is a debugging hack, remember).
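A sketch of that workaround with the flag moved to file scope. `g_do_check`, `checks_run`, and `process_item` are hypothetical names, and whether a particular compiler then leaves the branch alone is an assumption worth confirming against its documentation.

```c
#include <assert.h>
#include <stdbool.h>

/* File-scope volatile flag: meant to be flipped from a debugger,
   never by the program's normal control flow. */
volatile bool g_do_check = false;

static int checks_run = 0;  /* counts how often the debug branch ran */

static void process_item(int item)
{
    if (g_do_check) {
        /* Debug-only code goes here; an assertion is used as a stand-in. */
        assert(item >= 0);
        checks_run++;
    }
}
```

With the debugger attached, setting `g_do_check` to true at the point of interest turns the extra checks on for later calls.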
Not an exact match for what you asked, but I think a good reminder that optimizers work in mysterious ways, and sprinkling in volatile may confuse the programmer more than the optimizer...
You can use `explicit_bzero()` to bypass dead store elimination (DSE). Otherwise, simply initializing your memory before using it is enough to trigger magic failures when you use-after-free. C programs barely function if they do not initialize memory. For context, I work on Varnish, which the OP referenced for this.
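A short sketch of scrubbing an object with `explicit_bzero()` before releasing it. `struct session`, `session_scrub`, and `session_destroy` are hypothetical names. The function is a BSD/glibc extension rather than standard C, so its availability and header are assumptions here (glibc declares it in `<string.h>`).

```c
#define _DEFAULT_SOURCE  /* assumed: exposes explicit_bzero in glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object with a magic number and some sensitive data. */
struct session {
    unsigned magic;
    char token[32];
};

/* Zero the whole object, magic included, so that a later magic check
   on a stale pointer has a chance to fail. */
static void session_scrub(struct session *s)
{
    assert(s != NULL);
    explicit_bzero(s, sizeof *s);
}

static void session_destroy(struct session *s)
{
    session_scrub(s);
    free(s);
}
```

Because the whole object is zeroed, any check for a nonzero magic value doubles as the use-after-free tripwire.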