Yeah, 64 bytes is kind of an unstated x86 thing. It'd be hell for them to change that, a lot of perf conscious code aligns to 64 byte boundaries to combat false sharing.
The Cortex A9 had 32 byte cache lines for one prominent counterexample.
But my point was more that the size is baked into x86 in a pretty deep way these days. You'd be looking at new releases from all software that cares about such things on x86 to support a different cache line size without major perf regressions. So all of the major kernels, probably the JVM and CLR, game engines (and good luck there).
IMO Intel should stick a "please query the size of the cache line if you care about it's length" clause into APX, to push code today to stop #defining CACHE_LINE_SIZE (64) on x86.
> IMO Intel should stick a "please query the size of the cache line if you care about it's length" clause into APX, to push code today to stop #defining CACHE_LINE_SIZE (64) on x86.
CPUID EAX=1, bits 8-15 (i.e., second byte) of EBX in the result tell you the cache line size. It's been there since Pentium 4, apparently.
You can also get line size for each cache level with CPUID EAX=4, along with the set-associativity and other low-level cache parameters.
> IMO Intel should stick a "please query the size of the cache line if you care about it's length" clause into APX, to push code today to stop #defining CACHE_LINE_SIZE (64) on x86.
This doesn't help.
The issue with increasing cache line size is false sharing. And false sharing isn't something that's only dealt with by people who know or care about cache line width. The problem is that mostly, people just write oblivious code, then test it, and if it is slow, possibly do something about it. A way to query cache line width gets the infinitesimally small portion of cases where someone actually consciously padded structs to cache line width, and misses all the cases where things just happen to not fit on the same line with 64B lines and do fit on the same line with 128B lines.
Like it or not, coherency line width = 64B is just a part of the living x86 spec, and won't be changed. If possible, we'd probably wish for it to be larger. But at least we were lucky not to be stuck with something smaller.
M2 processors have 128 byte wide cache lines?? That's a big deal. We've been at 64 bytes since what, the Pentium?