Looping directly on InterlockedCompareExchange seems like a bad idea, since every iteration is a bus-locked operation in a tight loop. Instead, the inner spin loop (the one that yields) should only read the value, with the cmpxchg wrapped around it in the outer loop. As for whether the inner loop should call sched_yield right away or first try a short nop/pause loop to ride out micro-contention, the expert opinion here is not to bother with the nop loop. That said, while the nop loop might not be optimal in the real world, I doubt it would cause a catastrophic performance issue.
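To make the suggested structure concrete, here is a minimal sketch of the "read in the inner loop, cmpxchg in the outer loop" pattern. It is not the original poster's code: `SpinLock` and `contended_count` are hypothetical names, and `std::atomic` with `std::this_thread::yield()` is used as a portable stand-in for InterlockedCompareExchange and sched_yield.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical spinlock for illustration only (names are assumptions).
class SpinLock {
public:
    void lock() {
        for (;;) {
            // Outer loop: one locked compare-exchange attempt.
            int expected = 0;
            if (state_.compare_exchange_strong(expected, 1,
                                               std::memory_order_acquire,
                                               std::memory_order_relaxed)) {
                return;  // lock acquired
            }
            // Inner loop: plain loads only, yielding while the lock is held.
            // No locked instruction is issued here.
            while (state_.load(std::memory_order_relaxed) != 0) {
                std::this_thread::yield();
            }
        }
    }

    void unlock() {
        state_.store(0, std::memory_order_release);
    }

private:
    std::atomic<int> state_{0};  // 0 = free, 1 = held
};

// Hypothetical helper: several threads increment a plain counter
// under the lock and the final total is returned.
long contended_count(int threads, int iters) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, iters] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

The waiting threads only issue ordinary loads until the lock looks free, and only then retry the compare-exchange. A short pause loop before the yield could be added inside the inner loop, but per the opinion above it is probably not worth the trouble.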