This code benchmarks mutex contention, not mutex lock performance. If your real code locks like this, you should reevaluate it. Each thread locks and unlocks the mutex for every single increment of g_chores, which means 100,000 acquire/release pairs per thread around a critical section that does almost nothing. With several threads hammering the same mutex, the run time is dominated by contention (threads waiting on each other and the lock's cache line bouncing between cores) rather than by actual work. That contention masks whatever real performance differences exist between locking mechanisms, so benchmarks like this one are useless for comparing them.
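To make the contrast concrete, here is a minimal sketch, assuming the original code is a plain std::mutex protecting a global counter. The functions lock_per_increment, lock_once_per_thread and time_ms are hypothetical names for illustration; only g_chores and the 100,000 iterations come from the code being discussed, and the thread counts are assumptions. The first function reproduces the per-increment locking described above; the second does the work on a thread-local counter and takes the lock once per thread to publish the result.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Shared counter and its lock (the name g_chores mirrors the code under discussion).
long long g_chores = 0;
std::mutex g_mutex;

// Hypothetical reconstruction of the criticized pattern: every increment
// acquires and releases g_mutex, so the run is dominated by contention.
long long lock_per_increment(int threads, int iterations) {
    g_chores = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([iterations] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<std::mutex> lock(g_mutex);
                ++g_chores;
            }
        });
    }
    for (auto& th : pool) th.join();  // join() makes the threads' writes visible here
    return g_chores;
}

// Alternative: count locally with no shared state, then lock once per
// thread to add the partial result to the shared counter.
long long lock_once_per_thread(int threads, int iterations) {
    g_chores = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([iterations] {
            long long local = 0;
            for (int i = 0; i < iterations; ++i) ++local;
            std::lock_guard<std::mutex> lock(g_mutex);
            g_chores += local;
        });
    }
    for (auto& th : pool) th.join();
    return g_chores;
}

// Runs one strategy, checks it produced the expected total, and returns
// the elapsed wall-clock time in milliseconds.
double time_ms(long long (*strategy)(int, int), int threads, int iterations) {
    auto start = std::chrono::steady_clock::now();
    long long total = strategy(threads, iterations);
    auto stop = std::chrono::steady_clock::now();
    assert(total == static_cast<long long>(threads) * iterations);
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_ms(lock_per_increment, 4, 100000) and time_ms(lock_once_per_thread, 4, 100000) from your own main (compiled with -pthread) should show the gap on your machine; both produce the same total, but only the first one spends its time fighting over the lock. Note that the second version's loop is trivial enough that the compiler may collapse it, which is exactly the point: once the lock is out of the hot path, there is barely any "work" left to measure.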