
It isn't always that simple. FMA instructions are tricky to use in a way that actually improves performance; LLVM may be doing it right, while doing it manually that way may not.
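One reason FMA isn't a free swap: a fused multiply-add rounds once, while `a * b + c` rounds twice, so the two can give different answers. My understanding is that rustc does not let LLVM contract `a * b + c` into an FMA on its own, which means `mul_add` is an explicit request. A minimal sketch (the function names `separate` and `fused` are just for illustration). Also worth knowing: on x86-64 builds without FMA enabled (e.g. no `-C target-cpu=native` or `-C target-feature=+fma`), `mul_add` typically becomes a call to the C library's `fma` routine, which can be slower than the plain expression.

```rust
/// Unfused: the product is rounded to f64 before the addition,
/// so there are two roundings.
fn separate(a: f64, b: f64, c: f64) -> f64 {
    a * b + c
}

/// Fused: `f64::mul_add` computes a * b + c with a single rounding.
fn fused(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, c)
}

fn main() {
    // 0.1 is not exactly representable; the stored value is slightly above 0.1.
    let (a, b, c) = (0.1_f64, 10.0_f64, -1.0_f64);
    // a * b rounds to exactly 1.0, so the tiny excess is lost and the result is 0.
    println!("separate: {:e}", separate(a, b, c));
    // The fused version keeps the excess: exactly 2^-54 (about 5.55e-17).
    println!("fused:    {:e}", fused(a, b, c));
}
```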

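Also, sometimes a SIMD instruction is used, but only on one lane at a time. This is actually common with floating-point code: on x86-64, ordinary scalar f64 math uses SSE2 instructions like `addsd` and `mulsd`, which operate only on the low lane of an XMM register.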
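A common case is a plain floating-point sum. Because f64 addition isn't associative, the compiler has to keep the exact left-to-right order, so the loop usually stays scalar. Splitting it into independent partial sums changes the order, which gives LLVM room to use more lanes. This is a sketch; whether it actually vectorizes depends on the target and optimization level, so check the generated assembly. The function names are hypothetical.

```rust
/// Straightforward sum: each addition depends on the previous one, and
/// the compiler must preserve this order, so it typically stays scalar.
fn sum_sequential(xs: &[f64]) -> f64 {
    let mut total = 0.0;
    for &x in xs {
        total += x;
    }
    total
}

/// Four independent partial sums. The order of additions differs from
/// `sum_sequential` (results may differ in the last bits for general input),
/// which lets LLVM keep the accumulators in a vector register.
fn sum_four_lanes(xs: &[f64]) -> f64 {
    let mut acc = [0.0_f64; 4];
    let chunks = xs.chunks_exact(4);
    // Elements left over when the length isn't a multiple of 4.
    let rest = chunks.remainder();
    for chunk in chunks {
        for i in 0..4 {
            acc[i] += chunk[i];
        }
    }
    // Combine the partial sums, then add the leftovers.
    let mut total = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for &x in rest {
        total += x;
    }
    total
}

fn main() {
    let xs: Vec<f64> = (1..=10).map(|i| i as f64).collect();
    println!("{} {}", sum_sequential(&xs), sum_four_lanes(&xs));
}
```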




Something I found surprising: some AVX2 and AVX-512 instructions consume so much power that Intel chose to have its chips dynamically lower their clock frequency when those instructions execute. So naively switching to SIMD instructions can not only fail to improve performance but also hurt the performance of unaltered code executed afterward, even unrelated code running on other cores.

https://blog.cloudflare.com/on-the-dangers-of-intels-frequen...


What do you mean by "manually"? `mul_add` is a Rust method that operates on a single f64; it's still up to LLVM to choose which instructions to use and whether to vectorize.
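To illustrate: calling `mul_add` element by element in a loop is still scalar at the source level. A sketch (the name `axpy` is borrowed from BLAS naming for illustration, not a real API here): with FMA available to the compiler (e.g. `-C target-cpu=native` on a CPU that has it), LLVM may turn this loop into packed FMA instructions; without it, it may stay scalar or call into the C library's `fma`.

```rust
/// y[i] = a * x[i] + y[i], each element computed with `f64::mul_add`.
/// `mul_add` only ever sees one f64 at a time; any vectorization of this
/// loop is decided by LLVM based on the enabled target features.
fn axpy(a: f64, x: &[f64], y: &mut [f64]) {
    for (yi, &xi) in y.iter_mut().zip(x) {
        *yi = a.mul_add(xi, *yi);
    }
}

fn main() {
    let x = [1.0, 2.0, 3.0];
    let mut y = [10.0, 20.0, 30.0];
    axpy(2.0, &x, &mut y);
    println!("{:?}", y);
}
```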



