From the comments there, this is a pretty good article on why neither volatile, nor anything else in C or C++, necessarily does what people want it to in the multithreaded context: http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
Third, avoid using a lazily-initialized Singleton unless you really need it. The classic Singleton implementation is based on not initializing a resource until that resource is requested. An alternative is to use eager initialization instead, i.e., to initialize a resource at the beginning of the program run.
I've personally never found the need for a lazy-loaded singleton, and I expect the same is true for most other software developers. This whole problem stems from the premature optimization of thinking that lazy-loading something would be nifty.
Standard C and C++ do not have a memory model and do not support multi-threading at the language level. You have to rely on compiler-specific libraries/APIs, or C++0x (which has a memory model).
I don't see how to reconcile your dismissive comment with the fact this is clearly written in the context of the Linux kernel, where it is perfectly reasonable to talk about "the memory model", imposed by the kernel, not the language. Is there some way you mean to apply your comment to the article, or did you just assume it was a generalized comment about volatile?
I assume he's dismissing it because the article is several years old and it's the second article criticizing volatile to appear on the front page today, i.e., it wasn't posted because it's new or interesting, but because it's volatile witch hunt season.
The meaning of the volatile keyword is not "do not optimize accesses to this variable". Rather, it's "this value may change at any time".

Let's say you have a variable that's used as a flag for communication between two threads. Protecting it with spin_locks inside a thread that checks the flag repeatedly in a loop is not going to prevent the compiler from optimizing that access away -- because as far as the compiler is concerned, the value of that flag does not change inside that loop -- period.
It appears that the real recommendation of the article is to process shared data in special functions that access it by dereferencing a pointer. I don't see how that's more efficient or effective for simple cases like the one described above than declaring a volatile variable.
The particularly pernicious thing about this flaw in the most common implementation of double-checked locking is that it seems like such a simple optimization to one of the most common programming patterns, the singleton.
The singleton is something the majority of programmers understand. By contrast, realizing that the compiler and runtime can optimize reads and writes to memory in ways that make sense in the context of one thread but break across multiple threads is something I'd only expect the top 10% of programmers (and that may be generous) to understand.
So having the complex part be an intuitive (but wrong) optimization to the easy bit is just a deadly combination.