Hacker News new | past | comments | ask | show | jobs | submit login

You do need a fence in the unlock path though (at least a release fence).

I think the issue is that if you ask the CPU to just store something (like in a spin lock), whether or not there’s a fence, it’s an operation with limited data flow dependencies so it’s easy for the CPU to execute. Even the fence can be handled using wacky speculation tricks.

But if you want to do something like, “store this value but only if the old value satisfies some predicate”, then there’s a load and the whole thing is dependent on the load. So you’re asking the CPU to load, then run a predicate, then store, and for that to be fenced, and atomic.

Strictly more work. I don’t think there’s any trick to make it faster than just the store release.




> You do need a fence in the unlock path though (at least a release fence).

Well yes but on x86 that comes for free. The overhead of the full fence brought in by lock cmpxchg or lock xchg is in the order of ~10ns, which for an uncontended lock means that a mutex is almost 2x as slow as a spinlock.

A load acquire + store release would be a couple of ns (assuming everything in L1 etc...)




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: