This is a very good post; too many people (even myself, sometimes) forget that volatile doesn't mean that the statement containing it cannot be reordered.
[EDIT: the one thing he missed, which I would have liked to know, is about using volatile with int (or sig_atomic_t) as an "eventually consistent" value, for example one global `sig_atomic_t end_flag = 0;`, a single writer (a SIGINT handler to set it to 1), and many threads with `while (end_flag == 0) { ... }` loops.
I've been using this pattern for a while with no obvious problems - access to `end_flag` can be rearranged by the compiler, barriers are irrelevant, the value can be corrupted by a race on every read and it won't matter - the thread will get the eventual value of end_flag on the next loop and end.]
> one global `sig_atomic_t end_flag = 0;`, a single writer (a SIGINT handler to set it to 1), and many threads with `while (end_flag == 0) { ... }` loops.
While unlikely to cause problems, this is a data race (a set of at least two concurrent accesses, of which at least one is not an atomic access and at least one is a write) and therefore constitutes undefined behavior.
sig_atomic_t is only safe to use within a single thread, where the only concurrency comes from a signal handler running on that thread.
I suspect sig_atomic_t does work fine when we're talking about POSIX signals, but OP was probably thinking more of embedded programming and hardware interrupt handlers, which don't conform to POSIX signal semantics.
> I suspect sig_atomic_t does work fine when we're talking about POSIX signals, but OP was probably thinking more of embedded programming and hardware interrupt handlers, which don't conform to POSIX signal semantics.
It's not the sig_atomic_t that I think is wrong (it could be a plain int); it's the question "Is it safe to have one writer to a zero-initialised volatile value, and many readers checking that value for non-zero?"
Now, I wouldn't use this and expect the value read to be correct, but even if a read is corrupted because that single write did not finish (yielding zero, one, or something else), the value will be non-zero eventually, and so the thread will end.
Technically, using volatile between threads is a data race and therefore UB [1]; the guarantees made around sig_atomic_t only apply between a thread and a signal handler on the same thread.
Though, I'd argue that the no-optimization guarantee of volatile actually does justify reasoning of the form "it's not undefined behavior because the hardware guarantees it", which is a mistake anywhere else in C. On essentially all architectures, loads and stores of volatile integers act the same way as loads and stores of atomics using memory_order_relaxed (or stronger, depending on the architecture). So it may be legal to rely on volatile being atomic, as long as you don't expect the code to be compiled on some hypothetical architecture that doesn't have this feature.
At the hardware level, there is no distinction between a regular load, a volatile load, and a relaxed atomic load (assuming small sizes and proper alignment). But the compiler can still do things that break your code, or that miss optimizations, when given incorrect ordering annotations.
It seems to me that, for small aligned values, volatile ends up falling somewhere between relaxed and acquire ordering in terms of the behavior of most CPUs and compilers: it doesn't synchronize with any other operation, but it does prevent folding of consecutive operations.