> The x86 has a strict memory model x86 doesn't really impose *sequential* *cons...

dmatech · on April 20, 2022

That strictness of x86 execution order has been substantially relaxed in the last two decades, and it can be a bit of a pain for the novice to deal with in multithreaded code. The Pentium 3 added SFENCE (with SSE) and the Pentium 4 added LFENCE and MFENCE (with SSE2). I believe that prior to that, only the LOCK prefix was available.

gpderetta · on April 21, 2022

From the memory model point of view, LFENCE and SFENCE are only relevant for SSE non-temporal loads and stores. A novice is never going to stumble on them by mistake.
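To make the non-temporal case concrete, here is a minimal sketch of where SFENCE matters: publishing a payload written with a non-temporal store before setting a ready flag. `Mailbox` and `publish` are hypothetical names made up for this illustration; the intrinsics `_mm_stream_si32` and `_mm_sfence` are the standard SSE2/SSE ones, and the non-x86 branch is only a fallback so the sketch compiles elsewhere.

```cpp
#include <atomic>
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
#endif

// Hypothetical one-slot mailbox: a plain payload plus a ready flag.
struct Mailbox {
    int payload = 0;
    std::atomic<bool> ready{false};
};

// Writes the payload, then sets the flag so a reader that sees ready == true
// (with an acquire load) also sees the payload.
void publish(Mailbox& box, int value) {
#if defined(__x86_64__) || defined(_M_X64)
    // Non-temporal store: weakly ordered, so it is not covered by the usual
    // TSO store ordering.
    _mm_stream_si32(&box.payload, value);
    // SFENCE orders the non-temporal store before the flag store below.
    _mm_sfence();
#else
    // Assumption: on other targets a plain store plus the release store
    // below is enough for this sketch.
    box.payload = value;
#endif
    box.ready.store(true, std::memory_order_release);
}
```

With ordinary stores the SFENCE would be unnecessary on x86, which is why code that never touches non-temporal instructions does not run into it.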

MFENCE was added for convenience, but the same effect can be had with any locked instruction on a dummy memory location. In fact XCHG is often still faster than MFENCE.
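As a rough sketch of the two options, assuming GCC or Clang inline assembly on x86-64: the function names below are hypothetical, and the `#else` branches fall back to a portable `std::atomic_thread_fence` so the code still compiles on other targets. The locked variant uses `lock or` on the top of the stack rather than XCHG, but it is the same idea of a locked instruction on a dummy location.

```cpp
#include <atomic>

// Full barrier via a locked no-op read-modify-write on a dummy location
// (here the top of the stack; OR with 0 leaves the value unchanged).
inline void full_barrier_locked() {
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ __volatile__("lock; orl $0, (%%rsp)" ::: "memory", "cc");
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Full barrier via MFENCE.
inline void full_barrier_mfence() {
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// The store-then-load pattern that needs a full barrier on x86: without
// one, the load may be satisfied before the store leaves the store buffer.
inline int store_then_load(std::atomic<int>& mine,
                           const std::atomic<int>& other) {
    mine.store(1, std::memory_order_relaxed);
    full_barrier_locked();
    return other.load(std::memory_order_relaxed);
}
```

Which variant is faster depends on the microarchitecture, so it is worth measuring rather than assuming.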

If anything, the x86 memory model has been strengthened over the last couple of decades: some reorderings that were theoretically possible (but were never implemented in any hardware) have finally been documented as impossible now that TSO has been embraced.

ajross · on April 19, 2022

> x86 doesn't really impose sequential consistency between cores/threads. It imposes a Total Store Order (TSO) in which stores are always in order to each other but a store can be reordered after a load.

To be more pedantic (and hoping I remember this correctly): TSO is indistinguishable in software from full sequential consistency. Any code to detect the difference must by definition be subject to race conditions (or must be an atomic read/write operation that on x86 would be serializing anyway). So x86 in fact does provide SC semantics "between cores/threads". It does have visible reordering artifacts from the perspective of hardware designs (e.g. MMIO registers) where a load has side effects.

gpderetta · on April 19, 2022

That doesn't sound right to me. Dekker's algorithm is broken on x86 without explicit barriers, but it works on an SC machine.
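The core of Dekker's algorithm is the store-buffering pattern: each thread stores to its own flag and then loads the other's. A minimal C++ sketch of that litmus test follows; `count_both_zero` is a hypothetical helper written for this illustration, and it should only be called with `std::memory_order_relaxed` or `std::memory_order_seq_cst`.

```cpp
#include <atomic>
#include <thread>

// Runs the store-buffering litmus test `iterations` times and returns how
// many runs ended with both threads reading 0.
// Assumption: `order` is either memory_order_relaxed or memory_order_seq_cst.
int count_both_zero(std::memory_order order, int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        // Each thread: store to its own flag, then load the other flag.
        std::thread t1([&] { x.store(1, order); r1 = y.load(order); });
        std::thread t2([&] { y.store(1, order); r2 = x.load(order); });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

With `std::memory_order_seq_cst` the count must be 0, and on x86 compilers get there by emitting a locked instruction or a fence for the store. With `std::memory_order_relaxed` the stores and loads compile to plain MOVs, so TSO allows both threads to read 0. Spawning fresh threads every iteration makes that outcome rare in practice, so a zero count in the relaxed case proves nothing.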

my123 · on April 20, 2022

And there are actual SC machines in the wild. For example, the Carmel cores on the Tegra Xavier processor provide SC guarantees.