I think the key issue with the optimizations that ICC is performing for C++ but Rust is not doing in this case is just FP-contraction, which is related to, but not the same as, assuming associativity.
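To make that concrete: contraction just means a multiply-add like `a*b + c` may be fused into a single `fma` with one rounding instead of two. A minimal C++ illustration (values chosen by me so the difference is visible):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    double a = 1.0 + 0x1p-30, b = 1.0 - 0x1p-30, c = -1.0;
    double separate = a * b + c;          // round(round(a*b) + c) -> exactly 0.0
    double fused    = std::fma(a, b, c);  // round(a*b + c), one rounding -> -2^-60
    std::printf("%a vs %a\n", separate, fused);
}
```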
The RFC about that is https://github.com/rust-lang/rfcs/pull/2686, where you can see users roughly split into the "I want faster binaries" and "I want more deterministic execution" camps. Neither is wrong TBH.
Some people have tried to show there that enabling FP-contraction by default isn't always better / more precise, but I'm not sure if they succeeded.
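For reference, the kind of counterexample that comes up (this is the commonly cited case, not necessarily the exact one from the RFC thread): contracting `x*x - x*x` turns an expression that is exactly 0 into one that returns the rounding error of `x*x`.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    double x = 1.0 + 0x1p-29;
    // One possible contraction of x*x - x*x under something like -ffp-contract=fast:
    double contracted   = std::fma(x, x, -(x * x)); // exposes the rounding error of x*x
    double uncontracted = x * x - x * x;            // exactly 0.0
    std::printf("contracted = %a, uncontracted = %a\n", contracted, uncontracted);
}
```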
> I think the key issue with the optimizations that ICC is performing for C++ but Rust is not doing in this case is just FP-contraction, which is related to, but not the same as, assuming associativity.
I think associativity is necessary to vectorize reduction operations like:
```cpp
r += (c[i] - a[i]*b[i]) * a[i] * (c[i] - a[i]*b[i]);
```
I haven't looked at the code generated by ICC, but I would expect it to vectorize this by computing tuples of "partial sums", roughly as follows:
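Since the snippet this refers to isn't in the post, here is a rough sketch of that shape (my own illustration, assuming 4 independent accumulators for a 4-wide vector; not actual ICC output):

```cpp
#include <cstddef>

// Reassociated reduction: four partial sums the compiler can keep in the
// lanes of one vector register, combined only at the end.
double reduce4(const double* a, const double* b, const double* c, std::size_t n) {
    double r0 = 0, r1 = 0, r2 = 0, r3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        double t0 = c[i + 0] - a[i + 0] * b[i + 0];
        double t1 = c[i + 1] - a[i + 1] * b[i + 1];
        double t2 = c[i + 2] - a[i + 2] * b[i + 2];
        double t3 = c[i + 3] - a[i + 3] * b[i + 3];
        r0 += t0 * a[i + 0] * t0;
        r1 += t1 * a[i + 1] * t1;
        r2 += t2 * a[i + 2] * t2;
        r3 += t3 * a[i + 3] * t3;
    }
    double r = (r0 + r1) + (r2 + r3); // horizontal reduction
    for (; i < n; ++i) {              // scalar tail
        double t = c[i] - a[i] * b[i];
        r += t * a[i] * t;
    }
    return r;
}
```

The per-lane accumulators and the final horizontal reduction impose a different summation order than the scalar loop, which is exactly the reassociation the compiler has to be allowed to perform.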
For the error-free transform, the important part is associativity, not fusion (which would not degrade the operation).
As illustrated on the Wikipedia page, if you rearrange the terms according to associativity rules you can show that one of the terms is always zero, conclude that it is useless, and drop it from the computation.
In practice, however, floating-point operations are not associative, and that term contains the numerical error of the previous operation.
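A minimal Fast2Sum sketch of that term (my own illustration of the idea from the Wikipedia article, not code taken from there):

```cpp
#include <cstdio>

// Fast2Sum (assuming |a| >= |b|): algebraically, (s - a) equals b, so err
// would always be 0 if floating-point addition were associative. In actual
// floating-point arithmetic, err recovers the rounding error of a + b.
static void fast_two_sum(double a, double b, double& s, double& err) {
    s = a + b;
    err = b - (s - a); // "always zero" under reassociation, but not in practice
}

int main() {
    double s, err;
    fast_two_sum(1.0, 0x1p-60, s, err);
    std::printf("s = %a, err = %a\n", s, err); // s = 1.0, err = 2^-60 (the lost part)
}
```

If the compiler is allowed to reassociate, `b - ((a + b) - a)` folds to 0 and the whole error-recovery step disappears.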