This is the way. That code breaks when someone new comes along and makes several modifications whose implications take a long time to rear their ugly heads.
Broken multi-threaded code behaves a lot like perfectly coded multi-threaded code... until you get a lot of traffic through it, you know, like on a busy holiday weekend when lots of revenue is flowing through the business!
Exactly. And you don't always need a lot of traffic, so surviving load testing doesn't mean your code is correct. Just last week I ran into a serious problem with a piece of code that handled infrequent events from several different threads. An almost unrelated refactoring changed the timing so that events arrived from two different threads at exactly the same time, and only on one family of Android phone models. The original mistake had been in production for a long time without anyone noticing.
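To make that class of bug concrete, here is a minimal sketch. It is not the actual code from the incident; `EventCounter`, `onEventUnsafe`, and `onEventSafe` are hypothetical names invented for illustration. The unsynchronized handler does a plain read-modify-write, which looks fine as long as events rarely overlap, and only loses updates once two threads hit it at the same moment.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical event handler shared by several threads (illustration only).
class EventCounter {
    private int unsafeCount = 0;                              // plain field, no synchronization
    private final AtomicInteger safeCount = new AtomicInteger();

    // Read-modify-write without synchronization: two threads can read the
    // same old value and one of the increments is silently lost.
    void onEventUnsafe() { unsafeCount++; }

    // Atomic increment: concurrent calls never lose an update.
    void onEventSafe() { safeCount.incrementAndGet(); }

    int getUnsafeCount() { return unsafeCount; }
    int getSafeCount() { return safeCount.get(); }

    // Starts `threads` workers that each deliver `eventsPerThread` events
    // to both handlers, releasing them together to maximize overlap.
    static EventCounter run(int threads, int eventsPerThread) {
        EventCounter counter = new EventCounter();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int j = 0; j < eventsPerThread; j++) {
                    counter.onEventUnsafe();
                    counter.onEventSafe();
                }
            });
            workers[i].start();
        }
        start.countDown();
        try {
            // join() also makes the workers' writes visible to this thread.
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counter;
    }

    public static void main(String[] args) {
        EventCounter c = run(4, 100_000);
        System.out.println("expected: " + (4 * 100_000));
        System.out.println("safe:     " + c.getSafeCount());
        System.out.println("unsafe:   " + c.getUnsafeCount());
    }
}
```

Under heavy overlap the unsafe count typically comes out lower than expected, but with only a handful of events per thread it will usually look correct, which is how a mistake like this can sit in production unnoticed.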
I’m similar, though I wish programming languages would give us more facilities to statically double-check the correctness reasoning.