That's an edge case. But I'm mostly talking about RK1 or Rasp. Pi 5.

I'm well aware of "large" ARM chips like the Fujitsu A64FX, which did well on Linpack when it came out thanks to hugely efficient design choices like HBM and 512-bit SVE.

I'm less familiar with what the Apple M1 brings to the table. I know some people are experimenting with it, but it's a far larger chip than the RK1 or Rasp. Pi, though it's probably smaller than the A64FX.

I know that the Apple M1 has only 128-bit vectors though, so that's a big penalty vs AVX512 or even the older Fujitsu A64FX. I'd expect it to be weak at Linpack / matrix multiplication, since this is where SIMD shines exceptionally well... and AVX512 on Xeon is a very good SIMD implementation.

I recognize the Apple M1 has multiple 128-bit pipelines that operate in parallel per core, so it's better than it looks, but there's a huge power-efficiency advantage to 512-bit vector units in Linpack-style matrix-multiplication code.
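
To put rough numbers on that, here's a back-of-the-envelope sketch of peak double-precision FLOPs per cycle per core. The pipe counts are my assumptions, not measurements: 4 FMA-capable 128-bit pipes for an M1-style big core, and 2 FMA-capable 512-bit pipes for an A64FX or AVX512 Xeon-style core. The function names are just made up for illustration.

```python
def peak_flops_per_cycle(vector_bits, fma_pipes, element_bits=64):
    """Peak FLOPs per cycle per core, counting one FMA as 2 FLOPs."""
    lanes = vector_bits // element_bits  # doubles per vector register
    return lanes * fma_pipes * 2


def flops_per_instruction(vector_bits, element_bits=64):
    """FLOPs done by a single vector FMA instruction."""
    return (vector_bits // element_bits) * 2


# (vector width in bits, number of FMA pipes) -- assumed values, see above
cores = {
    "128-bit x 4 pipes (M1-style, assumed)": (128, 4),
    "512-bit x 2 pipes (A64FX / AVX512-style, assumed)": (512, 2),
}

for name, (bits, pipes) in cores.items():
    print(
        f"{name}: {peak_flops_per_cycle(bits, pipes)} FLOPs/cycle, "
        f"{flops_per_instruction(bits)} FLOPs/instruction"
    )
```

With those assumed pipe counts, the 128-bit design gets to 16 FLOPs per cycle by issuing four instructions, while the 512-bit design gets to 32 with only two. Each wide instruction does 4x the work for one trip through decode and scheduling, which is roughly where I'd expect the power-efficiency edge to come from.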