> just by incrementing a shared counter (which cannot be cached)
That's not true, a shared counter (i.e., an atomic integer) is cached – in fact, there's no guarantee that its value is ever written back to system RAM.
You're probably thinking of non-cacheable memory: the kernel can set the MMU attributes of a memory page such that the CPU will avoid the cache when it accesses addresses in that page. This is completely independent of atomic accesses on memory locations [1].
[1] At least typically – there may well be CPUs which disallow atomic accesses on non-cacheable memory.
But then why would it have to be a shared counter? Any write to a non-cacheable memory location is transmitted to system RAM, it doesn't have to be shared with other cores, nor does it have to be a counter.