What else would you measure? Certainly the uncontended case is important as a baseline, but contention is kind of the weak point for mutexes: if you don't handle it well, you end up with idle hardware, lots of additional scheduler work, or extra kernel crossings.
[edit - I forgot to even mention one of the most important things: locks that perform poorly under contention can have really negative systemic effects, like hot-spotting the memory network, and that would show up here too]
Uncontended is crucial. If you want to benchmark other things that's excellent, but if MutexA has crap uncontended performance then I'm on a loser if we pick MutexA unless I am absolutely sure we will have a lot of contention. Since contention is never desirable, that's rare.
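To make the baseline concrete, here is a rough sketch of how the uncontended case could be timed. The helper name `uncontended_ns_per_op` is made up for illustration and is not from the blog post; it assumes any type with `lock()` and `unlock()`, and a single thread, so the lock is never contended.

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical helper: average nanoseconds per lock/unlock pair when only
// one thread ever touches the mutex (the uncontended case).
template <typename Mutex>
double uncontended_ns_per_op(Mutex& m, std::uint64_t iterations) {
    if (iterations == 0) {
        return 0.0;  // nothing to measure
    }
    // volatile so the trivial critical section is not optimized away
    volatile std::uint64_t counter = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        m.lock();
        counter = counter + 1;
        m.unlock();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Calling it as `std::mutex m; uncontended_ns_per_op(m, 10'000'000);` gives a per-operation figure to compare between implementations. The absolute number depends heavily on hardware, compiler flags and the standard library, so treat it as a relative measure only.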
Think of this like the random input case for a sort benchmark. Do I want stuff like all-ascending, all-descending, zig-zag and so on? Sure, those are nice. But without the random input case the benchmark is not very helpful. I might sort a zig-zag, I might sort data that's already in ascending order, but I will sort random data, that's going to happen or else I would not need a sort function.
Uncontended is uninteresting, because all mutex implementations perform roughly the same here, give or take a nanosecond or two. If you're truly uncontended then a naïve spin lock will actually seem fastest, because xchg is faster than cmpxchg, and cmpxchg is what you need for good locks.
On x86 you can. When xchg is used with a memory operand, the processor applies its locking protocol for the duration of the exchange (what used to be described as locking the bus). This is true even in the absence of a lock prefix. I included a spinlock implementation in the blog post. If you see any errors with it, then please let me know!
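For readers following along, a naive test-and-set spinlock built on an atomic exchange might look roughly like the sketch below. This is not the implementation from the blog post; `NaiveSpinLock` and `count_with_spinlock` are hypothetical names, and the claim that `exchange` becomes an `xchg` instruction is an assumption about typical x86 compilers, not something the C++ standard guarantees.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical naive spinlock: acquire by atomically swapping in `true`.
// On x86 compilers typically emit xchg for exchange() (assumption).
class NaiveSpinLock {
    std::atomic<bool> locked_{false};

public:
    void lock() {
        // exchange returns the previous value; `true` means someone else
        // already holds the lock, so keep retrying.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // busy-wait
        }
    }

    void unlock() {
        locked_.store(false, std::memory_order_release);
    }
};

// Small demo: several threads increment a shared counter under the lock.
long count_with_spinlock(int threads, int increments) {
    NaiveSpinLock lock;
    long total = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&lock, &total, increments] {
            for (int i = 0; i < increments; ++i) {
                lock.lock();
                ++total;  // protected by the spinlock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return total;
}
```

A lock like this looks great with one thread and degrades badly under contention, since every waiter keeps writing to the same cache line, which is the behaviour being argued about in this thread.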