
> The x86 has a strict memory model

x86 doesn't really impose sequential consistency between cores/threads. It imposes Total Store Order (TSO), in which stores are always ordered with respect to each other, but a store can be reordered after a later load to a different address.
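For anyone who wants to see that store-then-load reordering, here is a rough sketch of the classic "store buffer" litmus test in C++. The function name `count_both_zero` and the `seq_cst` flag are made up for illustration. With relaxed atomics (plain MOVs on x86) both threads are allowed to read 0; with sequentially consistent atomics that outcome is forbidden by the C++ memory model. Spawning fresh threads per iteration makes the race window small, so the relaxed variant may well report 0 on a given run even though the reordering is permitted.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffer litmus test: each thread stores to one variable, then loads
// the other. Returns how many runs ended with BOTH loads observing 0.
// seq_cst == true  -> memory_order_seq_cst (both-zero is forbidden)
// seq_cst == false -> memory_order_relaxed (both-zero is allowed, not guaranteed)
int count_both_zero(int iterations, bool seq_cst) {
    assert(iterations >= 0);
    const std::memory_order order =
        seq_cst ? std::memory_order_seq_cst : std::memory_order_relaxed;
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] { x.store(1, order); r1 = y.load(order); });
        std::thread t2([&] { y.store(1, order); r2 = x.load(order); });
        t1.join();
        t2.join();
        // r1 and r2 are safe to read here: join() orders them before this point.
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling `count_both_zero(n, true)` should always return 0; `count_both_zero(n, false)` may return a nonzero count on real x86 hardware, depending on timing.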

SPARC had TSO on later chips whereas earlier chips had weaker models. MIPS developed the other way, with older versions having stronger memory ordering and later ones getting relaxed memory ordering.

RISC-V chips can optionally support TSO, but it seems that the motivation is programs ported from x86. IBM's z/Architecture (with lineage back to the IBM System/360) is still alive and also has TSO. BTW, the Mill is supposed to offer sequential consistency, but it remains to be seen whether that will be a performance bottleneck.




That strictness of x86 execution order has been substantially relaxed in the last two decades and can be a bit of a pain to deal with in multithreaded code for the novice. The Pentium 3 added SFENCE (with SSE) and the Pentium 4 added LFENCE and MFENCE (with SSE2). I believe that prior to that, only the LOCK prefix was available.


From the memory-model point of view, LFENCE and SFENCE are only relevant for SSE non-temporal loads and stores. A novice is never going to stumble on them by mistake.

MFENCE was added for convenience, but the same effect can be had with any locked instruction on a dummy memory location. In fact XCHG is often still faster than MFENCE.
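A minimal sketch of the two options, assuming GCC or Clang inline assembly on x86-64 (on other targets it falls back to a portable `std::atomic_thread_fence`). The names `fence_mfence`, `fence_locked_xchg` and `both_zero_with_fence` are hypothetical. XCHG with a memory operand is implicitly locked, so it needs no LOCK prefix. The helper at the end reruns the store/load litmus test with relaxed accesses and a fence in between; with either fence, both loads reading 0 should not happen. This says nothing about which one is faster; that would need a real benchmark.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define SKETCH_X86_ASM 1
#else
#define SKETCH_X86_ASM 0
#endif

// Full barrier using the dedicated instruction.
void fence_mfence() {
#if SKETCH_X86_ASM
    asm volatile("mfence" ::: "memory");
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);  // portable fallback
#endif
}

// Full barrier using a locked instruction on a dummy stack location.
void fence_locked_xchg() {
#if SKETCH_X86_ASM
    int dummy = 0;
    int value = 1;
    // XCHG with a memory operand is always locked.
    asm volatile("xchgl %0, %1" : "+r"(value), "+m"(dummy) : : "memory");
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);  // portable fallback
#endif
}

// Store-buffer test with relaxed accesses and `fence` between store and load.
// Returns how many runs ended with both loads observing 0.
int both_zero_with_fence(int iterations, void (*fence)()) {
    assert(fence != nullptr);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            fence();
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            fence();
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```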

In fact, the x86 memory model has been strengthened over the last couple of decades: some reorderings that were theoretically possible (but never implemented in any hardware) have finally been documented as impossible now that TSO has been embraced.


> x86 doesn't really impose sequential consistency between cores/threads. It imposes a Total Store Order (TSO) in which stores are always in order to each other but a store can be reordered after a load.

To be more pedantic (and hoping I remember this correctly): TSO is indistinguishable in software from full sequential consistency. Any code to detect the difference must by definition be subject to race conditions (or must be an atomic read/write operation that on x86 would be serializing anyway). So x86 in fact does provide SC semantics "between cores/threads". It does have visible reordering artifacts from the perspective of hardware designs (e.g. MMIO registers) where a load has side effects.


That doesn't sound right to me. Dekker's algorithm is broken on x86 without explicit barriers, but it works on an SC machine.
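To make that concrete, here is a sketch of Dekker's algorithm for two threads, written with C++ sequentially consistent atomics (the default for `load`/`store`). `DekkerLock` and `run_dekker` are names I made up. The entry protocol is exactly a store to my own flag followed by a load of the other thread's flag, which is the one ordering TSO does not preserve. On x86, compilers typically implement the seq_cst store with XCHG or MOV plus MFENCE, which is the explicit barrier; downgrade the accesses to `memory_order_relaxed` and mutual exclusion is no longer guaranteed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Dekker's mutual exclusion for exactly two threads (ids 0 and 1).
// All atomic operations use the default memory_order_seq_cst.
struct DekkerLock {
    std::atomic<bool> wants[2] = {{false}, {false}};
    std::atomic<int> turn{0};

    void lock(int me) {
        assert(me == 0 || me == 1);
        const int other = 1 - me;
        wants[me].store(true);              // store ...
        while (wants[other].load()) {       // ... then load: needs StoreLoad ordering
            if (turn.load() != me) {
                wants[me].store(false);     // back off while it is the other's turn
                while (turn.load() != me) std::this_thread::yield();
                wants[me].store(true);
            } else {
                std::this_thread::yield();
            }
        }
    }

    void unlock(int me) {
        turn.store(1 - me);
        wants[me].store(false);
    }
};

// Two threads each bump a shared counter per_thread times under the lock.
// The increment is a separate load and store (not an atomic RMW), so lost
// updates would show up if mutual exclusion were violated.
int run_dekker(int per_thread) {
    DekkerLock lock;
    std::atomic<int> counter{0};
    auto worker = [&](int me) {
        for (int i = 0; i < per_thread; ++i) {
            lock.lock(me);
            counter.store(counter.load() + 1);
            lock.unlock(me);
        }
    };
    std::thread t0(worker, 0);
    std::thread t1(worker, 1);
    t0.join();
    t1.join();
    return counter.load();
}
```

With seq_cst everywhere, `run_dekker(n)` should return `2 * n`.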


And there are actual SC machines in the wild. For example, the Carmel cores on the Tegra Xavier processor provide SC guarantees.



