
Unfortunately, AVX512 instructions only give a speedup in very specific situations, and for many real-world use cases they actually underperform because of frequency-scaling quirks and switching delays. Profiling for more than a few milliseconds is one place you see phantom gains, so take care not to be deceived.

See

https://blog.cloudflare.com/on-the-dangers-of-intels-frequen...

https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-us...

https://news.ycombinator.com/item?id=21029417

Edit: not saying this isn't a true benefit here, just that claims of speed when using AVX512 need to be treated with fair scepticism for actual use cases.
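A minimal sketch of one way to check for this: record throughput per short time window instead of reporting a single aggregate number, so that any difference between the start of a run and its steady state is visible. `throughput_by_window` and `placeholder_kernel` are hypothetical names, and the Python workload is only a stand-in; a real test would call the vectorized kernel being evaluated.

```python
import time


def throughput_by_window(work, total_seconds=0.2, window_seconds=0.02):
    """Call work() repeatedly; return calls-per-second for each time window."""
    rates = []
    end = time.perf_counter() + total_seconds
    while time.perf_counter() < end:
        start = time.perf_counter()
        calls = 0
        # Run the workload until this window's time budget is used up.
        while time.perf_counter() - start < window_seconds:
            work()
            calls += 1
        elapsed = time.perf_counter() - start
        rates.append(calls / elapsed)
    return rates


def placeholder_kernel():
    # Hypothetical stand-in for the code under test (not AVX512 itself).
    sum(range(1000))


if __name__ == "__main__":
    for i, rate in enumerate(throughput_by_window(placeholder_kernel)):
        print(f"window {i}: {rate:,.0f} calls/s")
```

If the early windows look very different from the later ones, a single long-run average is probably hiding something. Interleaving the kernel with ordinary scalar work would be closer to an actual use case than a tight loop.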




> https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-us...

Considering that the author of that blog is one of the authors of this paper, I think he's very aware of the benchmarking issues.


I didn't realise that. My original post was just to warn that AVX512 benchmarks can be highly misleading. Everyone has trouble measuring AVX512 performance:

"In GROMACS, transitions in and out of AVX-512 code can lead to differences in boost clocks which can impact performance. We are just going to point out the delta here." - from https://www.servethehome.com/intel-performance-strategy-team...


He is aware, but sidestepped these issues. So this code is only recommended on the newest Cannon Lake processors, but what we really want to know is which method is best for which CPU. What about AMD Rome, for example?


Since AVX-512 does not exist on AMD Rome, that question answers itself.



