Actually, acquire/release barriers or operations are not enough. I'm pretty sure you also need a #StoreLoad barrier, for example between Step 1 and Step 2 in the lock algorithm, to prevent the loads of other threads' state in step 2 from being reordered before the stores to the thread's own state in step 1.
Generally #StoreLoad is required when you need bidirectional synchronization between two or more threads.
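To make the #StoreLoad point concrete, here is a minimal sketch of the classic store-buffering pattern in Rust. It is not the lock algorithm under discussion; the names `store_buffering_round`, `x`, and `y` are made up for illustration. Each thread publishes its own word and then reads the other thread's word, with `fence(Ordering::SeqCst)` standing in for the #StoreLoad barrier between the two steps.

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};
use std::thread;

/// Runs one round of the store-buffering pattern and returns what each
/// thread read from the other thread's word. (Hypothetical helper.)
fn store_buffering_round() -> (usize, usize) {
    let x = AtomicUsize::new(0);
    let y = AtomicUsize::new(0);
    thread::scope(|s| {
        let a = s.spawn(|| {
            // "Step 1": store to this thread's own state.
            x.store(1, Ordering::Relaxed);
            // Full fence: keeps the later load from moving before the store.
            fence(Ordering::SeqCst);
            // "Step 2": load the other thread's state.
            y.load(Ordering::Relaxed)
        });
        let b = s.spawn(|| {
            y.store(1, Ordering::Relaxed);
            fence(Ordering::SeqCst);
            x.load(Ordering::Relaxed)
        });
        (a.join().unwrap(), b.join().unwrap())
    })
}
```

With both fences in place, at least one thread has to observe the other's store, so `(0, 0)` cannot come back. If the fences are removed, or the accesses are only upgraded to release stores and acquire loads, the memory model allows both threads to read 0, which is exactly the reordering a lock cannot tolerate.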
Re atomics, I think the algorithm still requires word-sized operations to be atomic, but I think that was considered a given. What the algorithm doesn't require is atomic loads/stores (or more complex RMW operations) across more than one word.
Could you implement it using AtomicUsize (using only relaxed operations) in Rust? With AtomicUsize representing the words. You aren't allowed to put pointers in those.
Looking at disassembly of relaxed atomic operations, they are just normal loads and stores with no memory barriers or special instructions. That is not enough to make this algorithm guard a critical section.
It can be done, but not with Relaxed; that takes SeqCst, and SeqCst can carry a lot of overhead in lost optimizations and potentially in hardware cost on some architectures.
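As a rough sketch of what the SeqCst version could look like, the following uses Peterson's two-thread algorithm as a stand-in, since it also relies only on word-sized loads and stores. It is not the algorithm from the thread, and `Peterson` and `count_with_lock` are names invented for this example. Every word is an `AtomicUsize` holding a plain integer, never a pointer.

```rust
use std::sync::atomic::{AtomicUsize, Ordering::{Relaxed, SeqCst}};
use std::thread;

/// Two-thread Peterson lock built from word-sized loads and stores only.
struct Peterson {
    flag: [AtomicUsize; 2], // flag[i] == 1: thread i wants the lock
    turn: AtomicUsize,      // which thread has priority when both want it
}

impl Peterson {
    fn new() -> Self {
        Peterson {
            flag: [AtomicUsize::new(0), AtomicUsize::new(0)],
            turn: AtomicUsize::new(0),
        }
    }

    fn lock(&self, me: usize) {
        let other = 1 - me;
        // Announce intent, then give priority to the other thread.
        self.flag[me].store(1, SeqCst);
        self.turn.store(other, SeqCst);
        // SeqCst keeps these loads from being ordered before the stores above.
        while self.flag[other].load(SeqCst) == 1 && self.turn.load(SeqCst) == other {
            thread::yield_now();
        }
    }

    fn unlock(&self, me: usize) {
        self.flag[me].store(0, SeqCst);
    }
}

/// Two threads each increment a counter `iters` times under the lock.
/// The increment is a separate load and store, so a lost update would
/// show up as a total below 2 * iters.
fn count_with_lock(iters: usize) -> usize {
    let lock = Peterson::new();
    let counter = AtomicUsize::new(0);
    thread::scope(|s| {
        for me in 0..2 {
            let (lock, counter) = (&lock, &counter);
            s.spawn(move || {
                for _ in 0..iters {
                    lock.lock(me);
                    let v = counter.load(Relaxed);
                    counter.store(v + 1, Relaxed);
                    lock.unlock(me);
                }
            });
        }
    });
    counter.load(Relaxed)
}
```

Calling `count_with_lock(1000)` should return 2000. Switching the orderings inside `lock` to `Relaxed` (or even `Release`/`Acquire`) removes the guarantee, because the store to `flag[me]` may then be ordered after the load of `flag[other]`, letting both threads into the critical section.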