> You should only consider going for a lockless algorithm if you can prove it is better than just using a mutex.
This is true for all optimizations: optimizations should be chosen based on profiling, not based on reasoning about the programs. Reasoning can give you ideas of what to profile, but modern programs, including the environments in which they run, are complex enough that you can never really be sure that the assumptions you're basing your reasoning on are true.
So I fully agree with this, but I have some questions about the rest of your post.
> Mutexes are very cheap in the uncontended case, and in the contended case they have some nice properties. Sometimes lockless algorithms are in fact slower than algorithms using locks, as things like atomic compare-and-exchange instructions can have a significant cost.
My (very limited) experience in this area is mostly with optimistic lock-free algorithms, so maybe there's something that I'm missing. But my understanding of optimistic lock-free algorithms is that they make use of the idea that the uncontended case is the most common case, and try to optimize that case heavily. On Intel processors, a compare-and-exchange operation should literally be a single instruction (CMPXCHG[1]). So I'm wondering how the uncontested case using a mutex could be seen as a positive?
> Another problem with lockless is that you can't turn every algorithm into a lockless one, it's actually only a handful of them that are practical to implement locklessly.
How could "you can't turn every algorithm into a lockless one" could be true? It seems to me that you can "simulate" an atomic compare and swap using mutexes--although I can't find a reference easily, I believe this is what GCC's __sync_X_compare_and_swap does on architectures which don't have a compare-and-exchange instruction. Likewise you can implement a mutex fairly easily using a compare-and-swap operation.
Whether this is practical would of course depend on the situation--implementing a mutex in terms of compare-and-exchange or vice-versa are likely not the optimal ways to implement either. But it seems to me that if compare-and-exchange and mutexes can be implemented in terms of each other, isn't that a trivial proof that any mutex algorithm can be implemented as a lock-free algorithm and vice-versa?
What you're describing about writing a mutex with cmpxchg is a spinlock. If you just use that operation it's also a very inefficient spinlock.
The point of a lock free algorithm is to avoid having a thread wait on a resource and not do any work. It may do useless work that needs to be reverted, but it's not sleeping or spinning until something happens. It's not guaranteed that you can do that for every algorithm in a way that makes sense performance wise.
Here's a nice resource I used on my thesis for an adequate but not perfect spinlock with atomic instructions:
https://rigtorp.se/spinlock/
> The point of a lock free algorithm is to avoid having a thread wait on a resource and not do any work. It may do useless work that needs to be reverted, but it's not sleeping or spinning until something happens. It's not guaranteed that you can do that for every algorithm in a way that makes sense performance wise.
Sure, but that seems like a much narrower statement than "you can't turn every algorithm into a lockless one".
Maybe I'm just being pedantic and the person intended something more like what you said, though.
This is true for all optimizations: optimizations should be chosen based on profiling, not based on reasoning about the programs. Reasoning can give you ideas of what to profile, but modern programs, including the environments in which they run, are complex enough that you can never really be sure that the assumptions you're basing your reasoning on are true.
So I fully agree with this, but I have some questions about the rest of your post.
> Mutexes are very cheap in the uncontended case, and in the contended case they have some nice properties. Sometimes lockless algorithms are in fact slower than algorithms using locks, as things like atomic compare-and-exchange instructions can have a significant cost.
My (very limited) experience in this area is mostly with optimistic lock-free algorithms, so maybe there's something that I'm missing. But my understanding of optimistic lock-free algorithms is that they make use of the idea that the uncontended case is the most common case, and try to optimize that case heavily. On Intel processors, a compare-and-exchange operation should literally be a single instruction (CMPXCHG[1]). So I'm wondering how the uncontested case using a mutex could be seen as a positive?
> Another problem with lockless is that you can't turn every algorithm into a lockless one, it's actually only a handful of them that are practical to implement locklessly.
How could "you can't turn every algorithm into a lockless one" could be true? It seems to me that you can "simulate" an atomic compare and swap using mutexes--although I can't find a reference easily, I believe this is what GCC's __sync_X_compare_and_swap does on architectures which don't have a compare-and-exchange instruction. Likewise you can implement a mutex fairly easily using a compare-and-swap operation.
Whether this is practical would of course depend on the situation--implementing a mutex in terms of compare-and-exchange or vice-versa are likely not the optimal ways to implement either. But it seems to me that if compare-and-exchange and mutexes can be implemented in terms of each other, isn't that a trivial proof that any mutex algorithm can be implemented as a lock-free algorithm and vice-versa?
[1] https://www.felixcloutier.com/x86/cmpxchg