8 instructions seems very solid. I'm guessing AND/PSHUFB for the low nibble, SHIFT/AND/PSHUFB for the high nibble, and OR to combine, plus load/store?
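Here is a minimal sketch of that 8-instruction sequence. It assumes the task is reversing the bits of every byte, which is a hypothetical stand-in because the original task isn't stated in this thread. Any per-byte mapping that splits into an OR of a low-nibble table and a high-nibble table works the same way. The sketch assumes x86-64 with GCC or Clang. `reverse_bits`, `reverse_bits_avx2`, `reverse_bits_scalar`, `kLo` and `kHi` are illustrative names. The runtime `__builtin_cpu_supports("avx2")` check with a scalar fallback is an addition for portability, not something described here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <immintrin.h>

// Hypothetical example task: reverse the bit order of every byte.
// Scalar reference, also used for the tail and for CPUs without AVX2.
inline std::uint8_t reverse_bits_scalar(std::uint8_t b) {
    std::uint8_t r = 0;
    for (int i = 0; i < 8; ++i)
        if (b & (1u << i)) r |= static_cast<std::uint8_t>(0x80u >> i);
    return r;
}

// 16-entry nibble tables: kLo maps the low nibble to its reversed bits in the
// high half of the output byte; kHi maps the high nibble to the low half.
static const std::uint8_t kLo[16] = {
    0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0,
    0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0};
static const std::uint8_t kHi[16] = {
    0x00, 0x08, 0x04, 0x0C, 0x02, 0x0A, 0x06, 0x0E,
    0x01, 0x09, 0x05, 0x0D, 0x03, 0x0B, 0x07, 0x0F};

__attribute__((target("avx2")))
void reverse_bits_avx2(const std::uint8_t* src, std::uint8_t* dst, std::size_t n) {
    // PSHUFB indexes within each 128-bit lane, so copy each table into both lanes.
    const __m256i lo_tbl = _mm256_broadcastsi128_si256(
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(kLo)));
    const __m256i hi_tbl = _mm256_broadcastsi128_si256(
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(kHi)));
    const __m256i nib = _mm256_set1_epi8(0x0F);

    std::size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i v  = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src + i));  // load
        __m256i lo = _mm256_and_si256(v, nib);                                       // AND
        __m256i hi = _mm256_and_si256(_mm256_srli_epi16(v, 4), nib);                 // SHIFT, AND
        __m256i r  = _mm256_or_si256(_mm256_shuffle_epi8(lo_tbl, lo),                // PSHUFB
                                     _mm256_shuffle_epi8(hi_tbl, hi));               // PSHUFB, OR
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst + i), r);                 // store
    }
    for (; i < n; ++i) dst[i] = reverse_bits_scalar(src[i]);  // leftover bytes
}

// Use the AVX2 path only when the CPU reports support for it.
void reverse_bits(const std::uint8_t* src, std::uint8_t* dst, std::size_t n) {
    if (__builtin_cpu_supports("avx2")) {
        reverse_bits_avx2(src, dst, n);
    } else {
        for (std::size_t i = 0; i < n; ++i) dst[i] = reverse_bits_scalar(src[i]);
    }
}
```

The 16-bit shift is there because x86 has no 8-bit vector shift. It drags bits from the neighbouring byte into the top of each byte, and the second AND clears them. Swapping in other tables gives any mapping whose output is the OR of a low-nibble part and a high-nibble part.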

If you have AVX-512, GFNI is faster for this task, but obviously there are many situations where you can't use it.
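Here is a rough sketch of the GFNI alternative, again assuming a hypothetical bit-reversal task. `GF2P8AFFINEQB` multiplies each byte by an 8x8 bit matrix over GF(2). For a bit permutation like reversal, one instruction therefore replaces the AND/SHIFT/PSHUFB/OR chain. The sketch uses the 128-bit form (`_mm_gf2p8affine_epi64_epi8`) so it can run on more machines. With AVX-512 you would use `_mm512_gf2p8affine_epi64_epi8` instead. It assumes x86-64 with GCC or Clang. The function names and the runtime `__builtin_cpu_supports("gfni")` check with a scalar fallback are illustrative additions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <immintrin.h>

// Hypothetical example task: reverse the bit order of every byte.
// Scalar reference, also used for the tail and for CPUs without GFNI.
inline std::uint8_t reverse_bits_scalar(std::uint8_t b) {
    std::uint8_t r = 0;
    for (int i = 0; i < 8; ++i)
        if (b & (1u << i)) r |= static_cast<std::uint8_t>(0x80u >> i);
    return r;
}

// GF2P8AFFINEQB: for each byte x, result bit i = parity(matrix.byte[7 - i] & x)
// XOR bit i of the immediate (0 here). With matrix byte j = 1 << j, result
// bit i is bit 7 - i of x, i.e. the bits come out reversed.
__attribute__((target("gfni")))
void reverse_bits_gfni(const std::uint8_t* src, std::uint8_t* dst, std::size_t n) {
    const __m128i matrix =
        _mm_set1_epi64x(static_cast<long long>(0x8040201008040201ULL));
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        __m128i r = _mm_gf2p8affine_epi64_epi8(v, matrix, 0);  // 16 bytes per instruction
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), r);
    }
    for (; i < n; ++i) dst[i] = reverse_bits_scalar(src[i]);  // leftover bytes
}

// Use the GFNI path only when the CPU reports support for it.
void reverse_bits(const std::uint8_t* src, std::uint8_t* dst, std::size_t n) {
    if (__builtin_cpu_supports("gfni")) {
        reverse_bits_gfni(src, dst, n);
    } else {
        for (std::size_t i = 0; i < n; ++i) dst[i] = reverse_bits_scalar(src[i]);
    }
}
```

The matrix constant is the only task-specific part. A different matrix gives a different bit permutation. An arbitrary byte-to-byte lookup generally isn't affine over GF(2), so it won't fit in a single instruction this way.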

> guessing AND/PSHUFB for low nibble, SHIFT/AND/PSHUFB for high nibble, OR to combine

Yeah, that’s exactly what I did in my C++ code with intrinsics.

As for ISA extensions, I’m lucky to work on professional CAM/CAE software. We list AVX2 in the system requirements, so I’m guaranteed support on our customers’ computers. However, very few of them have AVX-512 CPUs, so we are ignoring it for now.
