> AVX2 code is faster than the dedicated instruction for input size 512 bytes and larger
The difference is indeed tiny. Still, it's very cool that generic AVX2 code can beat an instruction burned into the silicon!