AVX-512 allows arbitrary shuffles, e.g., shuffling the 64 bytes in zmm0 with indices from zmm1 into zmm2. Simple shuffles like unpacks aren't really an issue.
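
To make "arbitrary" concrete, here is a rough scalar model of that single-source byte shuffle, in Python rather than intrinsics. It assumes the `vpermb`-style behavior where only the low 6 bits of each index byte are used; the name `permute_bytes` is made up for illustration.

```python
def permute_bytes(data: bytes, idx: bytes) -> bytes:
    """Scalar model of a 64-byte single-source shuffle (vpermb-style)."""
    assert len(data) == 64 and len(idx) == 64
    # Every result byte may pick any of the 64 source bytes.
    # Assumption: only the low 6 bits of each index are used.
    return bytes(data[i & 63] for i in idx)


# Example: reverse all 64 bytes with a descending index vector.
src = bytes(range(64))
rev = permute_bytes(src, bytes(range(63, -1, -1)))
```

The point is that nothing constrains the index vector: a reversal, a broadcast, or a random permutation all cost the same single instruction.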



Worse yet (for wiring complexity or required uops, anyway), AVX-512 also has shuffles with two data inputs, i.e. each of the 64 bytes of result can come from any of 128 different input bytes, selected by another 64-byte register.
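
A similar sketch for the two-input form, again as a scalar Python model rather than real intrinsics. It assumes the `vpermi2b`/`vpermt2b` convention where bit 6 of each index picks the source register and the low 6 bits pick the byte; `permute2_bytes` is a hypothetical name.

```python
def permute2_bytes(a: bytes, b: bytes, idx: bytes) -> bytes:
    """Scalar model of a two-source 64-byte shuffle (vpermi2b-style)."""
    assert len(a) == len(b) == len(idx) == 64
    table = a + b  # 128 candidate bytes for every result position
    # Assumption: bit 6 selects a (0) or b (1), bits 5:0 select the byte,
    # which is the same as indexing the concatenation with 7 bits.
    return bytes(table[i & 127] for i in idx)


# Example: interleave the low 32 bytes of a and b.
a = bytes(range(64))
b = bytes(range(100, 164))
idx = bytes(x for i in range(32) for x in (i, 64 + i))
mixed = permute2_bytes(a, b, idx)
```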


Which is also why it's so attractive. :)

Those large shuffles are really powerful for things like lookup tables. Large tables are suddenly way more feasible in-register, letting you replace a costly gather with an in-register permute.
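
As a rough illustration of the lookup-table use, here is a scalar Python model of a 128-entry byte table held in two 64-byte "registers" and indexed 64 keys at a time. The two-source permute semantics (bit 6 picks the register) are assumed to match `vpermi2b`; `lookup128` and the popcount table are just examples I made up.

```python
def lookup128(table: bytes, keys: bytes) -> bytes:
    """Model of a 128-entry byte LUT done with one two-source permute."""
    assert len(table) == 128 and len(keys) == 64
    lo, hi = table[:64], table[64:]  # the two 64-byte table registers
    # Assumption: bit 6 of each key picks the half, bits 5:0 the byte.
    return bytes((hi if k & 64 else lo)[k & 63] for k in keys)


# Example table: popcount of every 7-bit value.
popcount_table = bytes(bin(v).count("1") for v in range(128))
keys = bytes((7 * i) % 128 for i in range(64))
counts = lookup128(popcount_table, keys)
```

With a gather, each of those 64 lookups would be a separate memory access; here the whole table stays in registers.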



