Wait wait wait. M2 processors have 128 byte wide cache lines?? That's a big deal...

CyberDildonics · 2024-06-26T20:08:13 1719432493

In practice, Intel CPUs have, for a very long time, pulled down at least 128 bytes whenever you access memory.

The 64-byte cache lines are still there, and they're part of other alignment boundaries for things like atomics, but accessing memory pulls down two cache lines at a time.

Tuna-Fish · 2024-06-29T11:59:04 1719662344

Memory fetch length and coherency unit size are different things, and the latter matters much more.

monocasa · 2024-06-26T18:08:51 1719425331

Yeah, 64 bytes is kind of an unstated x86 thing. It'd be hell for them to change that: a lot of perf-conscious code aligns to 64-byte boundaries to combat false sharing.
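For readers who haven't seen the pattern, here is a minimal C11 sketch of the 64-byte alignment being described. It assumes a 64-byte coherency unit, and `CACHE_LINE`, `padded_counter`, and `run_padded_counters` are hypothetical names, not from any library. Each per-thread counter is padded to its own line, so threads bumping different counters don't bounce one shared line between cores:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Assumption: 64-byte coherency unit, the de facto x86 value. Illustrative only. */
#define CACHE_LINE 64
#define NTHREADS 4
#define ITERS 1000000L

/* The alignment specifier forces each counter onto its own 64-byte line. */
struct padded_counter {
    _Alignas(CACHE_LINE) volatile long value;
};
static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "each counter should occupy exactly one line");

static struct padded_counter counters[NTHREADS];

/* volatile keeps the compiler from folding the loop into a single add,
   so every iteration really writes to the counter's line. */
static void *bump(void *arg) {
    struct padded_counter *c = arg;
    for (long i = 0; i < ITERS; i++)
        c->value++;
    return NULL;
}

/* Starts NTHREADS threads, each incrementing only its own counter, and
   returns the summed total (or -1 if a thread could not be created). */
long run_padded_counters(void) {
    pthread_t t[NTHREADS];
    int started = 0;
    for (int i = 0; i < NTHREADS; i++) {
        counters[i].value = 0;
        if (pthread_create(&t[i], NULL, bump, &counters[i]) != 0)
            break;
        started++;
    }
    long total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(t[i], NULL);
        total += counters[i].value;
    }
    return started == NTHREADS ? total : -1;
}
```

Without the `_Alignas`, several counters would typically share one line, and the same loop tends to run noticeably slower on a multicore machine (how much depends on the hardware). Note that the sketch hard-codes 64, which is exactly the assumption that would break if the line size changed.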

kllrnohj · 2024-06-26T19:32:34 1719430354

All ARM-designed cores use 64-byte lines too. It's not just an x86 thing.

201984 · 2024-06-26T22:59:17 1719442757

Some Cortex-A53s have 16-byte cachelines, which I found out the hard way recently.

monocasa · 2024-06-26T19:48:29 1719431309

The Cortex-A9 had 32-byte cache lines, for one prominent counterexample.

But my point was more that the size is baked into x86 in a pretty deep way these days. Supporting a different cache line size on x86 without major perf regressions would take new releases of all software that cares about such things: all of the major kernels, probably the JVM and CLR, and game engines (good luck there).

IMO Intel should stick a "please query the size of the cache line if you care about its length" clause into APX, to push code today to stop #defining CACHE_LINE_SIZE (64) on x86.

jcranmer · 2024-06-26T20:07:21 1719432441

> IMO Intel should stick a "please query the size of the cache line if you care about its length" clause into APX, to push code today to stop #defining CACHE_LINE_SIZE (64) on x86.

CPUID EAX=1, bits 8-15 (i.e., the second byte) of EBX in the result tell you the cache line size, strictly the CLFLUSH line size, counted in 8-byte units (a value of 8 means 64 bytes). It's been there since Pentium 4, apparently.
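A minimal sketch of reading that field from C, assuming GCC or Clang, whose `<cpuid.h>` provides `__get_cpuid` (MSVC has `__cpuid` in `<intrin.h>` instead). The function name `clflush_line_size` is made up. Per Intel's documentation the field is only meaningful when the CLFSH feature bit (EDX bit 19) is set:

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helper header; not available on MSVC */
#endif

/* Returns the CLFLUSH line size in bytes from CPUID leaf 1, or 0 when it
   can't be determined (non-x86 build, leaf unsupported, CLFSH not set). */
unsigned clflush_line_size(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(edx & (1u << 19)))      /* EDX bit 19 = CLFSH: field is valid */
        return 0;
    return ((ebx >> 8) & 0xffu) * 8u; /* EBX bits 15:8, in 8-byte units */
#else
    return 0;
#endif
}
```

On current x86 parts this should come back as 64.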

You can also get line size for each cache level with CPUID EAX=4, along with the set-associativity and other low-level cache parameters.
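And a sketch of walking the leaf-4 subleaves, again assuming GCC/Clang's `<cpuid.h>` (`__get_cpuid_max` and the `__cpuid_count` macro); `print_cache_params` is a hypothetical name. The bit fields follow Intel's documentation for leaf 4. AMD CPUs report the equivalent data under leaf 0x8000001D, so on AMD this loop will most likely find nothing:

```c
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Prints level, type, line size, ways, sets and total size for each cache
   listed by CPUID leaf 4. Returns the number of caches found (0 if none). */
int print_cache_params(void) {
    int count = 0;
#if defined(__x86_64__) || defined(__i386__)
    if (__get_cpuid_max(0, NULL) < 4)
        return 0;
    for (unsigned sub = 0; sub < 16; sub++) {
        unsigned eax, ebx, ecx, edx;
        __cpuid_count(4, sub, eax, ebx, ecx, edx);
        unsigned type = eax & 0x1fu;        /* 1=data 2=instr 3=unified, 0=end */
        if (type == 0)
            break;
        unsigned level = (eax >> 5) & 0x7u;
        unsigned line  = (ebx & 0xfffu) + 1;         /* coherency line size */
        unsigned parts = ((ebx >> 12) & 0x3ffu) + 1; /* physical line partitions */
        unsigned ways  = ((ebx >> 22) & 0x3ffu) + 1;
        unsigned sets  = ecx + 1;
        printf("L%u type %u: line %u B, %u ways, %u sets, %u KiB\n",
               level, type, line, ways, sets,
               ways * parts * line * sets / 1024);
        count++;
    }
#endif
    return count;
}
```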

Tuna-Fish · 2024-06-29T11:56:34 1719662194

> IMO Intel should stick a "please query the size of the cache line if you care about its length" clause into APX, to push code today to stop #defining CACHE_LINE_SIZE (64) on x86.

This doesn't help.

The issue with increasing cache line size is false sharing. And false sharing isn't something that's only dealt with by people who know or care about cache line width. The problem is that mostly, people just write oblivious code, then test it, and if it is slow, possibly do something about it. A way to query cache line width gets the infinitesimally small portion of cases where someone actually consciously padded structs to cache line width, and misses all the cases where things just happen to not fit on the same line with 64B lines and do fit on the same line with 128B lines.
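To make the oblivious-layout case concrete, here is a small C sketch with a hypothetical `struct stats` whose two hot fields just happen to sit 64 bytes apart. Assuming the struct starts on a line boundary, the fields land in different lines with 64-byte lines but in the same line with 128-byte lines, and nothing in the code mentions cache lines at all:

```c
#include <stddef.h>

/* Hypothetical struct written with no thought about cache lines. */
struct stats {
    long reader_hits;                 /* hammered by thread A */
    char other[64 - sizeof(long)];    /* unrelated, rarely touched data */
    long writer_hits;                 /* hammered by thread B, offset 64 */
};

/* Assuming the struct starts on a line boundary, two field offsets share a
   coherency unit iff they fall in the same line_size-byte block. */
int same_line(size_t off_a, size_t off_b, size_t line_size) {
    return off_a / line_size == off_b / line_size;
}
```

With `line_size` 64 the two offsets (0 and 64) are on separate lines; with 128 they collide, which is the silent false sharing described above.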

Like it or not, coherency line width = 64B is just a part of the living x86 spec, and won't be changed. If possible, we'd probably wish for it to be larger. But at least we were lucky not to be stuck with something smaller.

kllrnohj · 2024-06-26T21:24:57 1719437097

> The Cortex A9 had 32 byte cache lines for one prominent counterexample.

OK, all ARM-designed cores for the last 15 years, then :)

khrbtxyz · 2024-06-27T07:38:02 1719473882

From a Raspberry Pi 5, where the L2 cache line is 128 bytes:

  Vendor ID:              ARM
  Model name:             Cortex-A76

  $ lscpu -C
  NAME ONE-SIZE ALL-SIZE WAYS TYPE        LEVEL SETS PHY-LINE COHERENCY-SIZE
  L1d       64K     256K    4 Data            1  256                      64
  L1i       64K     256K    4 Instruction     1  256                      64
  L2       512K       2M    4 Unified         2 1024                     128
  L3         2M       2M   16 Unified         3 2048                      64
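On Linux the same numbers are exposed under sysfs, and as far as I know that's where `lscpu` gets its COHERENCY-SIZE column. A small C sketch (the function name is made up; which `indexN` directory is the L2 varies by machine, so check the `level` file next to it):

```c
#include <stdio.h>

/* Reads coherency_line_size for cache `index` of cpu0 from sysfs.
   Returns -1 if the file is missing or unreadable (e.g. some containers). */
long coherency_line_size(int index) {
    char path[128];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu0/cache/index%d/coherency_line_size",
             index);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    long size = -1;
    if (fscanf(f, "%ld", &size) != 1)
        size = -1;
    fclose(f);
    return size;
}
```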