It is very annoying that 'volatile' means different things in different languages. Inexcusable that Microsoft should unilaterally change the meaning of 'volatile' in their C++ compiler, adding yet another #ifdef into everyone's code.
Both C and C++ define volatile as something like "Access to volatile objects are evaluated strictly according to the rules of the abstract machine" (exact wording from some old draft of C++0x I have here), with C explicitly noting that "What constitutes an access to an object that has volatile-qualified type is implementation-defined." (ISO 9899 6.7.3.6). So there is not much to say about what volatile means without knowing the exact implementation (and the implementation should document what volatile exactly means).
I don't see what Microsoft could do to unilaterally change the meaning of something that is almost completely implementation-defined.
I'm not so sure that your reading of the standard is correct. From n1256.pdf, the essentially unchanged and freely available draft of the C99 standard:
(5.1.2.3.2): Accessing a volatile object, ..., or calling a function that does any of those operations are all side effects...
(5.1.2.3.3): In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
It does not look like this allows an implementation complete freedom in deciding what constitutes an "access". In fact, I'm pretty certain that 'volatile int i; i;' is required to access i exactly once. (There are plenty of optimizer bugs in this area, but these are bugs.)
(Note that the above has nothing to do with threads and everything with memory-mapped devices. Perhaps you were thinking of threads? These are not part of the C standard, though.)
(6.7.3.6): What constitutes an access to an object that has volatile-qualified type is implementation-defined.
That seems to me like giving 'complete freedom in deciding what constitutes an "access"'. Or almost.
Actually, chapter 4.10 of the GCC 4.4 manual documents how GCC defines this, and explicitly states that discarding the result of a read of a volatile object does not always cause an access to that object.
You are arguing against a straw-man. I wasn't talking about the standard. The behaviour is, as you say, implementation defined. So, Microsoft are perfectly free to implement volatile access from multiple threads as re-formatting the hard-drive or whatever.
What I did say is that Microsoft have chosen to implement volatile access in a way that is incompatible with other compilers. Their choice to put the barriers in for you is certainly well-intentioned, but really just complicates matters.
The reason for this is that static actually does exactly the same thing in both cases: it defines a global variable without an externally accessible name and makes it accessible in the enclosing scope. (The exact standardese is "internal linkage with static storage duration".)
It's actually quite remarkable the number of places a single keyword can be used. We have
* Static functions in C
* Static variables in C
* Static members in C++
* Static instance variables in Java
* Static methods in Java
The strange combination of C++'s namespaces and static begets static members, which almost make sense in context. But then Java stole the syntax and not the rest of the language (thank god), leading to its strange, almost contradictory usage.
Java defines it in a very useful way. It's identical to an AtomicReference, but without any test&set semantics. Anything other than that seems pretty dangerous.
Personally I find the issue with volatile is that it's quite subtle unless you're looking at the declaration. I tend to use AtomicReference even if I don't need test&set, unless it's extremely performance critical, but it almost never is. The inner loop variables don't need to be volatile.
"You can find various rants, screeds, and diatribes against volatile on Linux mailing lists and web pages. These are largely correct, but you have to keep in mind that:
Linux often runs on out-of-order multicores where volatile by itself is nearly useless.
The Linux kernel provides a rich collection of functions for synchronization and hardware access that, properly used, eliminate almost all need for volatile in regular kernel code.
If you are writing code for an in-order embedded processor and have little or no infrastructure besides the C compiler, you may need to lean more heavily on volatile."
"It used to have a very specific purpose - to ensure memory operations with external side-effects did not get reordered"
Prevent reordering? I thought that's what memory fences are for.
--
I admit to having a (single) volatile variable in my C++ codebase, used similarly to the pseudocode below:
volatile bool flag = false; // global [1]

thread1 {
    while (whatever) {
        do stuff
    }
    flag = true;
    thread2.wait();
}

thread2 {
    while (!flag) { // [2]
        do stuff
    }
}
[1] The variable itself is global, because I read someplace that the C/C++ standard does not allow local volatile variables to be passed to other threads. Logically, this makes sense, if a local goes out of scope before the second thread is finished with it. In my code this cannot happen, but I still make it global anyway.
[2] The important part is that this read is very fast unless the flag has been set (at which point I no longer care about efficiency). I don't mind if this is unsynchronized - if the loop runs an extra few iterations, that is perfectly fine, as long as the flag change is seen eventually (realistically, within a few iterations of the loop). I know that volatile only makes sure the compiler doesn't cache the value in registers and does not mean that the value will be synced or flushed or otherwise ensure it is visible by the other thread. On x86 at least, it will be, eventually.
My logic for using it in this way is as follows:
I do not care about synchronization - if the reader sees a stale value of flag, that is fine, as long as it sees the real value at some point in the future. I use volatile because otherwise the compiler could simply cache the flag in a register, completely isolated from the other thread. I also don't mind if the read is reordered, as long as it stays within the loop and the value is used as the loop termination condition (from what I read on the Intel site[3], the above code guarantees this - the read may be reordered to appear elsewhere WITHIN the loop instead, which is perfectly fine in my case). I do need the write to appear AFTER the loop in thread1 and before the wait, however - again, afaik I don't need to do anything here, or should I put an sfence before the flag=true to be safe? Since I don't care about performance in the flag-is-true case, I don't mind adding memory fences there.
I wonder if somebody can let me know if my logic is off here (though it works on x86 and x86-64 and, according to something I read on the Intel site[3], is a reasonable approach - however, I may port to ARM at some stage, in which case I will need to re-evaluate this code). My aim here is that the reader always reads the flag from the processor cache, so that it's fast, but when the writer sets the flag, the cache is synced over the core interconnect and the second thread will, at some stage, see the new value.
Is this approach reasonable? Is it safe? I believe it is, but..
[1] Neither the C nor C++ standards mention threads at all, so I don't think they disallow doing anything with them.
[2] volatile was never strictly about memory ordering, but that's definitely implied. It was really originally for "this variable is really a hardware doodad. make sure you poke it in exactly the manner the code says to." reordering writes to a hardware device can be disastrous. The standard of course, came quite a bit later after the hardware.
[1] I can't for the life of me find where I saw that the standard considers passing local volatile variables to other threads (I suspect it didn't say threads, but rather something else under which threads can be implied).
[2] Implied by the original usage of volatile? It certainly isn't implied now (or at least, people think it is, even though, according to the standard, it really isn't).
Why/How are you assuming the reader will read from the cache? The very definition of volatile means that this read will not be read from the cache! I'd check the generated ASM before assuming that's how it'd work. And read this: http://lwn.net/Articles/233479/
Because it does. Are you sure you aren't misunderstanding the meaning of volatile in C99 (and the C++03 standard has the same semantics for volatile as C99, it even refers back to C99 in a footnote)?
The C standard has no notion of the memory hierarchy and therefore does not know or care about the processor cache. Volatile means that reads and writes of volatile variables must strictly follow the rules of the abstract machine, and not bypass these rules as an optimization. When people say that volatile means the value may not be cached, they mean that the variable MUST actually be written to or read from every time it is accessed: it may not be cached in a register, and the access may not be elided as an optimization, that is, the code may not bypass interacting with the abstract machine. What this means is that a memory read or write must be issued, but the existence of processor caches (L1, L2, L3) is outside the scope of the abstract machine and is a platform detail.
On x86 and x86-64, reading or writing memory using temporal load and store instructions lets the CPU, if it feels like it, keep the value in the processor cache. As far as C knows or cares, it's in memory, but it's up to the CPU to decide whether it actually is. This means that in practice, if the variable is accessed often, it will be in L1 or L2 cache and reading it will be quite fast. When writing to it (since it is volatile, the write is a store to memory instead of a mov to a register), the processor sees that the cache line has changed and invalidates it for other cores, so the next read hits main memory and gets the new value.
Note that none of this is visible in the generated ASM (unless the compiler generated non-temporal loads/stores, in which case the processor cache would be bypassed), as it is applied transparently by the processor.
For the record, before I wrote this code, I researched it a lot. Also, Arch Robison, the architect behind Intel Threading Building Blocks, in the comments to his article on volatile, confirmed that this works (at least on Intel platforms). Furthermore, nothing I've read in the standard or other articles (such as the one you linked) contradicts my assumptions. Note that everywhere I have tested this, it works as expected. I am interested in hearing if I overlooked something fundamental, though, especially when porting to ARM (which, for example, may require additional instructions to make writes visible to other cores, something volatile will NOT do).
The C standard only states:

"An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously. What constitutes an access to an object that has volatile-qualified type is implementation-defined."
The only thing I can see that you'd have to worry about is the reads and writes to "flag". Technically, it's only assumed to be an atomic operation and while that's generally not a bad assumption for 8-bit variables on x86 SMP, it'll probably give you some trouble on Itanic and ARM. You may need to use an explicit memory barrier.
x86 or x86_64 SMP architectures will generally have an implicit memory barrier for all volatile reads/writes on simple data types, but I'm fairly sure this doesn't hold for ARM. In particular, Visual Studio 2005 and up will treat all volatile reads as membars with acquire semantics and all volatile writes as membars with release semantics.
If you have access to pthreads, a condition variable would do the trick - but probably overkill. It all depends if you want to let pthreads worry about the portability and all the cpu-dependent ifdefs and ifndefs or if you're willing to code the membars in yourself.
If/when I port to ARM I will probably conditionally compile to use atomic operations (or at least an explicit memory barrier) when on ARM, but for my x86/x86-64 code, since I don't need to, I'd rather avoid it. As you said, it would be overkill.
Since this is the only case where I do something strange, I don't mind handling platform specific code myself. The rest of the codebase delegates such things to libraries.