
One word: atomics.

A simple example would be concurrently writing into a shared queue. To make it really simple, let's assume that this queue's buffer can only be written once, and then the program has to exit.

If we have a buffer in this queue that can hold 20 messages, and an atomic integer that represents the current index, then we could have two (or however many) threads writing into it at the same time by just doing "index = myAtomic.fetch_add(1)", which atomically adds one to the index and returns the previous value. Atomics are supported at the hardware level, so they're generally quite efficient, and there is no lock like a mutex involved here. In the end, both threads are able to write into shared memory without conflicting with each other.

Using one or two more atomic integers, we could support one or more readers reading off the queue at the same time that we're writing to it, and eventually get to the point where we can reuse the buffer in a circular fashion.
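A minimal sketch of that first part, assuming C++ std::atomic, a 20-slot int buffer, and two writer threads (all of those are illustrative choices, not from any particular codebase):

    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    constexpr int kCapacity = 20;

    int buffer[kCapacity];           // the shared, write-once buffer
    std::atomic<int> next_index{0};  // the atomic index into it

    void writer(int thread_id) {
        for (;;) {
            // fetch_add atomically bumps the index and returns the old
            // value, so each thread claims a unique slot with no mutex.
            int slot = next_index.fetch_add(1);
            if (slot >= kCapacity) return;  // buffer is full; exit
            buffer[slot] = thread_id;       // we own this slot exclusively
        }
    }

    int main() {
        std::vector<std::thread> threads;
        for (int t = 0; t < 2; ++t) threads.emplace_back(writer, t);
        for (auto& th : threads) th.join();
        for (int i = 0; i < kCapacity; ++i) std::printf("%d ", buffer[i]);
        std::printf("\n");
    }

No two threads ever receive the same return value from fetch_add, so the plain writes to buffer[slot] never race.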




It's when you want to synchronize multiple atomics that it gets complicated.

With a mutex you can lock for an entire "transaction" where you fetch a unique index and write your value to the queue. With just atomics you can get a unique index into the queue concurrently with other threads, sure, but how do you mark the slot as ready to be dequeued once you are done writing your data?

That's where lock-free data structures come into play. They have solved this intricate synchronization between multiple atomics so you don't have to.
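One common answer, sketched here as an assumption rather than any particular library's design, is to pair each slot with its own atomic "ready" flag: the writer publishes with release ordering, the reader waits for the flag with acquire ordering, and the data write is then guaranteed to be visible before the flag is:

    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    constexpr int kCapacity = 20;

    int data[kCapacity];
    std::atomic<bool> ready[kCapacity];  // static storage: all false
    std::atomic<int> next_index{0};

    void writer(int value) {
        int slot = next_index.fetch_add(1);
        if (slot >= kCapacity) return;
        data[slot] = value;
        // Release store: the write to data[slot] above is visible to
        // any reader that sees ready[slot] == true via an acquire load.
        ready[slot].store(true, std::memory_order_release);
    }

    void reader() {
        for (int i = 0; i < kCapacity; ++i) {
            // Spin until the writer marks this slot ready to dequeue.
            while (!ready[i].load(std::memory_order_acquire)) {}
            std::printf("slot %d = %d\n", i, data[i]);
        }
    }

    int main() {
        std::thread r(reader);
        std::vector<std::thread> writers;
        for (int v = 0; v < kCapacity; ++v) writers.emplace_back(writer, v);
        for (auto& w : writers) w.join();
        r.join();
    }

Real lock-free queues also have to handle wrap-around, full/empty detection, and ABA problems, which is exactly the intricate part the parent comment is referring to.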


While what you said is all true, it's not turtles all the way down.

Eventually, when it comes down to the hardware level, the processor will have to assert #LOCK (or whatever mechanism it uses). So arguably even atomics aren't "lock" free; somewhere along the line, some piece of hardware has to block all other hardware to do its thing. DRAM can only read one thing at a time (and has to write it back afterwards).


The hardware guarantees forward progress at the very least, and possibly some amount of fairness. So the underlying hardware is at least lock-free and possibly wait-free. Usually a CPU cannot hold a cache line indefinitely; another CPU can always sneak in and steal it.


Yes, you're right. It's turtles all the way down until you hit the RAM's locking mechanisms.

RAM does the locking inside itself, and exposes it as primitives like "compare-and-set" and "compare-and-swap". Programming languages that use those primitives usually call the result "atomic data structures". The thing is that RAM is much faster in that regard than locking at the user/kernel level, so to the program it looks like there are no locks. But atomics do slow down your program, if only a little, because "compare-and-set" is still slower than a plain "set".
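For reference (and setting aside where exactly the locking happens, which the reply below gets into), this is what compare-and-swap looks like from the program's side; a CAS loop retries whenever another thread got there first, which is where the slowdown relative to a plain "set" comes from. A minimal C++ sketch:

    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    std::atomic<int> counter{0};

    void increment() {
        int expected = counter.load();
        // compare_exchange_weak swaps in expected + 1 only if counter
        // still holds expected; on failure it reloads expected and the
        // loop retries. The retry is the cost of CAS vs. a plain store.
        while (!counter.compare_exchange_weak(expected, expected + 1)) {
            // expected now holds the value another thread wrote; retry.
        }
    }

    int main() {
        std::vector<std::thread> threads;
        for (int t = 0; t < 4; ++t)
            threads.emplace_back([] {
                for (int i = 0; i < 1000; ++i) increment();
            });
        for (auto& th : threads) th.join();
        std::printf("%d\n", counter.load());  // always prints 4000
    }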


The RAM does not know anything about locking; it's the cache coherence protocol. The CPU will request the cache line in exclusive state, perform the operation on that memory, and ensure the line stayed exclusive to that core for the duration. Afterwards, all the other cores will observe the change (and, depending on the memory model and ordering, the operations that happened before and after it).



