All software will use V once it is available. Any operation done in a loop over ...

limoce · on March 16, 2022

Any software that uses memcpy will use V once glibc is updated.

owlbite · on March 16, 2022

I must admit, I've yet to find a performance-relevant loop that I can't already do in SIMD and that doesn't also depend on previous iterations in a way no magic instruction set can fix, short of time travel.

namibj · on March 16, 2022

LLVM's Polly[0] can often work magic there by renumbering[1] the iteration space. Where variable-vector-length instructions help is in decoupling the chunk size from the machine code: they handle the remainder that doesn't fit into whole vectors/chunks in a width-agnostic way. The point is that you get at least most of the gains from wider vector units without having to change the code.

[0]: https://polly.llvm.org/
[1]: https://en.wikipedia.org/wiki/Polytope_model
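To make "renumbering the iteration space" concrete, here is a minimal sketch of one classic polyhedral transformation, loop skewing (wavefront ordering), applied to a toy 2D recurrence chosen purely for illustration. It is not Polly's actual output, and the function names are hypothetical. In the original order every inner iteration needs the result of the previous one. After skewing, all points on an anti-diagonal depend only on the previous diagonal, so a whole diagonal can be computed at once.

```python
def sweep_rowwise(a):
    """Original loop nest: a[i][j] reads a[i][j-1], which the previous
    inner iteration just wrote, so the inner loop carries a dependency."""
    n, m = len(a), len(a[0])
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def sweep_wavefront(a):
    """Skewed loop nest: iterate over anti-diagonals t = i + j.
    Both inputs of a[i][j] lie on diagonal t - 1, so every point on
    diagonal t is independent of the others."""
    n, m = len(a), len(a[0])
    for t in range(2, (n - 1) + (m - 1) + 1):
        lo = max(1, t - (m - 1))
        hi = min(n - 1, t - 1)
        # Read the whole diagonal's inputs first, then write: this only
        # gives the right answer because the diagonal has no internal
        # dependencies, i.e. it could be one vector operation.
        new = [a[i - 1][t - i] + a[i][t - i - 1] for i in range(lo, hi + 1)]
        for k, i in enumerate(range(lo, hi + 1)):
            a[i][t - i] = new[k]
    return a


def make_grid(n, m):
    # Boundary row and column set to 1, interior zeroed.
    return [[1 if i == 0 or j == 0 else 0 for j in range(m)] for i in range(n)]


x = sweep_rowwise(make_grid(6, 7))
y = sweep_wavefront(make_grid(6, 7))
print(x == y)   # True
print(x[5][6])  # 462
```

With ones on the boundary this recurrence builds Pascal's triangle, so a[i][j] is the binomial coefficient C(i + j, i), which is why the corner value is C(11, 5) = 462.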

Dylan16807 · on March 16, 2022

It's not that you couldn't have used fixed-width SIMD; the point is that the code rescales to the hardware's vector width automatically.
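A rough sketch of what "rescales automatically" means in practice is the strip-mined, vector-length-agnostic loop below. It uses a simplified model of RISC-V V's `vsetvl`, assuming the hardware always grants min(remaining, VLMAX) elements per pass (the real spec allows some latitude in the last two passes). `setvl` and `vector_add` are hypothetical names, and Python stands in for the machine code. The same loop body runs unchanged for any VLMAX, and the final partial chunk needs no separate scalar tail loop.

```python
def setvl(remaining, vlmax):
    """Simplified vsetvl model (assumption): grant min(remaining, VLMAX)."""
    return min(remaining, vlmax)


def vector_add(a, b, vlmax):
    """Strip-mined element-wise add; returns the result and the number
    of 'vector instruction' passes the loop needed."""
    n = len(a)
    out = [0] * n
    i = 0
    passes = 0
    while i < n:
        vl = setvl(n - i, vlmax)
        # One "vector instruction" covering elements i .. i + vl - 1.
        out[i:i + vl] = [x + y for x, y in zip(a[i:i + vl], b[i:i + vl])]
        i += vl
        passes += 1
    return out, passes


a = list(range(10))
b = [10 * x for x in range(10)]
print(vector_add(a, b, 4)[1])   # 3 passes: 4 + 4 + 2
print(vector_add(a, b, 16)[1])  # 1 pass on a wider machine, same code
```

The design choice is that the chunk size is a runtime answer from the hardware rather than a constant baked into the binary, which is what lets one binary use a wider vector unit without recompilation.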

owlbite · on March 16, 2022

Right, but portable code != portable performance. See also: OpenCL.

There's also the observation that "keep SIMD vectors small but many" (e.g. Apple's Arm chips) is superior to "super long vectors" (Intel AVX-512), as it is much more flexible whilst delivering similar performance on tasks that are amenable to longer vectors. Having an architecture push towards the latter seems like a retrograde step to me.