This is less about "skill" than about awareness of how the different CPUs are implemented, and of where the algorithm does not behave correctly in combination with the CPU spec.
In addition, the error class is a mean one: it occurs rarely and is difficult to reproduce, so it can be very expensive to track down.
The specs are quite clear about memory fences. Just because something has a failure mode that's hard to detect doesn't mean that luck has anything to do with implementing it correctly. And if luck isn't a factor, then that leaves skill and dedication.
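For what it's worth, here is a minimal sketch of the kind of fence pairing the specs describe, written against the C++11 memory model rather than any particular CPU manual. The function name `publish_and_consume` and the variables are made up for illustration. A release fence before the flag store pairs with an acquire fence after the flag load, so the plain write to `payload` is guaranteed to be visible to the reader.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: hand a plain int from one thread to another
// using a relaxed atomic flag plus explicit fences.
int publish_and_consume() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // signals "payload has been written"
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                         // 1. write the data
        std::atomic_thread_fence(std::memory_order_release);  // 2. keep the write above the flag store
        ready.store(true, std::memory_order_relaxed);         // 3. publish the flag
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {      // 4. wait for the flag
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // 5. keep the read below the flag load
        seen = payload;                                       // 6. now safe to read the data
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling `publish_and_consume()` from `main` should return 42. Removing either fence turns this into a data race that may still appear to work on strongly ordered hardware, which is the rare and hard to reproduce failure mode being discussed.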
Specs/hw can have bugs too and he never said anything about luck.
I have no issue with hard problems but the accountability for concurrency issues is gnarly. I've had driver issues look like concurrency bugs and concurrency bugs look like driver issues. If you feel the need to take on concurrency you better have the schedule budget for it or be willing to throw it away.