Then ignoring the existence of the weaker memory model of ARM (versus the stronger x86/x64) is even stranger. And even if x86/x64 models are stronger, they still need some fences and using them all the time would be too slow. So I still don't really understand the arguments of the article.
The article does not ignore architectures like ARM. You do need fences, but not all the time - the compiler can avoid fences in places where there is no danger of violating sequential consistency (e.g. on accesses to provably local objects).