>You need an acquire memory barrier after acquiring a lock to make sure that any loads and stores inside the critical section do not get reordered to before the lock was grabbed.
What do you mean? That loads and stores inside a critical section could, in fact, be reordered by the CPU and be executed before grabbing a lock?
Exactly. A load from or store to memory protected by a mutex could be executed before the store that sets the mutex as locked. Unless there is an appropriate kind of memory barrier in between.
So even the Lamport bakery algorithm requires memory barriers, because modern processors can reorder instructions? Does this mean that modern CPU architectures necessarily require hardware-supported atomic instructions for proper concurrency?
Also, is there any evidence as to how common this memory re-ordering is?
> So even the Lamport bakery algorithm requires memory barriers, because modern processors can reorder instructions?
Yes, that is correct. Memory ordering needs to be enforced with barriers for correctness.
> Does this mean that modern CPU architectures necessarily require hardware-supported atomic instructions for proper concurrency?
A non-tearing word size write and appropriate memory barriers are enough. Atomic read modify write instructions like compare and swap are useful but not strictly necessary. I think it would be possible to implement the Lamport's Bakery algorithm from this article correctly with a lot of barriers, but its performance would be awful.
> Also, is there any evidence as to how common this memory re-ordering is?
It is very common (almost all loads and stores get grouped by the CPU) but its effect is so small that it rarely manifests in practice. A reordering would only move a load or store by a few dozen nanoseconds in wall clock time.
I have some practical experience from stress testing high contention memory ordering sensitive code. It would often take a minutes of CPU time to make a bug manifest. That's somewhere between one in a million to one in a billion odds of an unfortunate memory ordering ruining your day.
What do you mean? That loads and stores inside a critical section could, in fact, be reordered by the CPU and be executed before grabbing a lock?