(2010) The post is still accurate, but in 2011 C and C++ added atomics, which ar...

quelsolaar · on Oct 6, 2023

C11 did add _Atomic, BUT, they are not more portable than using volatile.

In C11 any type of any size can have an atomic qualifyer. That means you can have a 50 byte struct that is an atomic. No hardware has a 50 byte atomic instruction so that is not implementable using atomics. The standard gets around this by letting an implementation have a hidden mutex to guarantee that the operations will be atomic.

The problem with this is Windows. Windows lets an application load dynamicaly and shared libraries (DLL). This breaks the C11 Atomic model. Let me illustrate using an example:

Application A creates an atomic data structure, and the implementation creates a mutex for it. Application B does the same thing. Application A wants to share this data structute with dll X. It then has to share its mutex with the DLL sop that the DLL and application uses the same syncronization primitive. Now Application B wants to do the same thing, problem is DLL X cant use Application Bs Mutex, becaus it is required to use Apllication As mutex.

C11s Atomics will never be implemented on Windows because they cant be! Besides, all major compilers do support intrinsic atomics using volatile, that are nearly identical, (and in some ways better understood) so thats what I recomend using. Linus has indicated that he thinks the C11 concurrent memory model is broken so the kernel will continiue to use volatile and intrincics.

skullt · on Oct 6, 2023

> C11s Atomics will never be implemented on Windows because they cant be!

They are implementing them though. See https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual...

manwe150 · on Oct 6, 2023

On Linux, applications are required to load the libatomic shared library to correctly implement the full standard (on some older platforms like ARM, this is true even for small values since the hardware didn’t have an exchange instruction). There are a lot of operations that atomic int can do that volatile can’t do or will do incorrectly (such as seq-cst ordering or exchange). Why is this impossible for Windows? That sounds like a compiler implementation flaw, not an OS level impossibility.

quelsolaar · on Oct 6, 2023

Because Windows lets a process link to a library that has already been initialized by another process, so the process cant share its mutex during initialization, becaus the initialization has allready happend.

volotile gives you some decired propperties when multi-threadding, it is observable and therfor order rependent, but it does not have release/aquire semantics and it is not required to syncronize changes to other processors, so you need to use atomic intrincics in conjunction with volatile types. Volatile alone is NOT enough to be thread safe.

manwe150 · on Oct 6, 2023

How do you link that? I thought I knew the API pretty well, but that is a new one for me.

volatile is only ordered with other volatile calls and is otherwise UB when combined with non-atomics. Whereas an atomic is well-defined for ordering on other operations too. There are even some operations (eg seq-cst on a set of memory locations) which are known to be incorrectly executed in certain scenarios if modeled with volatile+fence, and require the use of atomics.

quelsolaar · on Oct 6, 2023

Volatile is "observable" and all observable behaviour is order dependent. (so for instance a volatile and printef can not be reorderd, because both are obeservable). Howqever this is ONLY in a single threaded context. This is why you need a atomic intrincic operation to operate on a volatile for it to be thread safe.

Even on hardware where loads and srtores are atomic (x86), you still need the atomic ops to manage ordering.

manwe150 · on Oct 7, 2023

What is “observable”? Only atomic seq-cst is order preserving, and even then only if there is no data races, and only if the compiler thinks it could even be observed by another thread. Otherwise, the compiler (and CPU) can and will choose to reorder operations. A printf call could even be reordered if the compiler could observe that it does not contain a volatile access. The volatile qualifier forces the operation to occur even if the compiler thinks the result would not be observable. But unless you work a lot with signals or memory mapped device registers, how is volatile even relevant, especially when the atomic ops are required anyways?

gpderetta · on Oct 6, 2023

Volatile structs are also not guaranteed atomic, so you do not lose any portability by changing questionable volatile atomic word sized variables to _Atomic (except portability to pre-C11 compilers of course, but then you can #define _Atomic volatile and pray for the best).

zombot · on Oct 6, 2023

OMG, every time I learn something new about Windows, it turns out to be even worse than I already thought it is.

quelsolaar · on Oct 6, 2023

Actiually,I this case Windows is more capable than other operating systems. Sharoing a loaded library between multiple process has a lot of uses.

manwe150 · on Oct 7, 2023

Sharing the memory of a loaded library is not a problem though. Every major operating system has done that for decades. That is just a consequence of copy-on-write pages though and doesn’t affect process isolation. Unless you meant something different than that?

There is also fork on posix systems, which is incompatible with using atomics in the child process for basically this reason though of accidentally partially sharing a loaded library between two processes. Most libc documentation will state that only async-safe calls are permitted after fork until exec for this reason.

quelsolaar · on Oct 7, 2023

the difference is that in windows if two applications load the same DLLs they don not just share code, they also share state and data. If a DLL has a global variable, it can be accessed by both applications.

This means that you can use DLLs as a mechanism to communicate between multiple applications.

gpderetta · on Oct 9, 2023

Really? I can't believe that could possibly be true out of the box [1]. It would be a massive violation of process separation: buggy programs would be able to take down other processes, which is not something that really happens after WinME. I have 0 knowledge about Win32, do you have a pointer to some docs describing this behavior?

[1] of course even on unix you can mmap state on demand if you want to share between processes, but it is absolutely not the default.

manwe150 · on Oct 9, 2023

Yeah, now that quelsolaar clarified, I am fairly certain that claim is not true, for exactly the reasons you describe. Of course, there is also the practical example of the mingw compiler, which does implement C11 for Windows, as a counter-example to their claim that it cannot be done.

gpderetta · on Oct 9, 2023

So, there is probably a kernel of truth, as far as I know the DLL model in Windows is equivalent of RTLD_LOCAL, so global variables are actually instantiated per DLL (but of course not shared cross process), which for example makes allocation behave differently. So a spinlock pool between the main program and a DLL wouldn't be shared, making cross-DLL emulated atomics problematic. But I guess there are ways around that or simply the expectation is not to share non-address-free atomics across DLL boundaries.

manwe150 · on Oct 12, 2023

In my experience, being similar to RTLD_LOCAL avoids a whole slew of sharing/unique accidents compared to the pile of hacks that is ELF. It is sometimes both the hardest and easiest platform to work with since it is the conceptually most consistent but also therefore the most primitive linker. But that is just not an issue, as the compiler must work anyways to ensure atomics work correctly per the platform ABI.

Indeed the problem is otherwise not restricted just to memory sharing: even the particular CPU instructions chosen can mean one compiler is incompatible with the output of a different compiler when it comes to atomics even when locks are not involved (the specifics of which barriers are used and where often mean there are multiple valid, but mutually incompatible ways, to emit atomic instructions)

lelanthran · on Oct 6, 2023

> The post is still accurate, but in 2011 C and C++ added atomics, which are a more portable alternative to uses of volatile for atomicity.

Atomics and volatile solve different problems, though. Atomics ensure a read or write completes uninterrupted. Volatile ensures that a read or write is never optimised away.

I think C11 atomics can be optimised away (for example, reading a value twice in a row might result in only a single actual read).

Happy to be corrected, though.

comex · on Oct 6, 2023

> Atomics and volatile solve different problems, though.

Yep, that's why atomics are only an alternative to uses of volatile _for atomicity_. For the original use case of accessing hardware registers, volatile is still the correct choice.

It is indeed possible for C11 atomics to be optimized, although interestingly, the three major compilers do very little such optimization. This paper [1] lists some optimizations that are implemented in LLVM and some that aren't; it's from 2015 but from some quick testing it seems like not much has changed since.

[1] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n44...

binarycoffee · on Oct 6, 2023

I have always wondered about that.

If optimizing repeated atomic loads is indeed allowed, waiting for a signal by spinning on an atomic load could loop forever. Yet I have the feeling most people consider such code to be valid. Are they wrong?

gpderetta · on Oct 6, 2023

Some optimizations are allowed within the rules of the memory model. For example, for an atomic x:

   x=1
   x=2

Can be changed to just x=2. Still forward progress and eventual visibility must be guaranteed.

manwe150 · on Oct 6, 2023

Acquire loads (and stronger) are required not to loop forever, if they could be observable from another thread. It is mentioned in the linked paper.

SuchAnonMuchWow · on Oct 6, 2023

While its true that atomic can solve the issue of atomic operations (increment, compare and swap, ...) that volatile doesn't try to solve, it is also required to solve the issue that volatile tried but failed to solve correctly: you have no ordering guaranteed between volatile and non-volatile memory accesses (see problem no.5 of the article).

In that way, atomic complete volatile instead of being orthogonal o it, because it provides the ordering semantic missing in volatile. And it doesn't replace it completely, because as you said, atomic accesses can still be optimized away.

So in most use-cases of volatile, you actually want to declare your variables atomic+volatile along with the correct memory_order on your atomic operations.