
Some algorithms guarantee that some arithmetic operation(s) applied to 2+ floating-point inputs will produce a list of floating-point outputs which, when summed, give exactly the correct result. This all gets broken if you change the order of operations or the rounding of intermediate results.

e.g. https://www.cs.cmu.edu/~quake/robust.html

Some keywords to look for: “compensated arithmetic”, “error-free transformations”.

FMAs generally speed up these tools, but you need to be careful and deliberate about how they are used.
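
For a concrete flavor, a minimal C sketch of the two textbook error-free transformations: TwoSum recovers the exact rounding error of an addition, and with a hardware FMA the exact error of a product is one extra instruction, which is the main way FMAs speed these algorithms up.

    #include <math.h>

    /* TwoSum (Knuth): returns s = fl(a + b) and stores in *e the exact
       rounding error, so that a + b == s + *e holds exactly.
       Reordering or "simplifying" these operations breaks the guarantee. */
    double two_sum(double a, double b, double *e) {
        double s  = a + b;
        double bp = s - a;            /* the part of b that made it into s */
        double ap = s - bp;           /* the part of a that made it into s */
        *e = (a - ap) + (b - bp);     /* whatever was rounded away */
        return s;
    }

    /* TwoProd: with an FMA, the exact error of a product is one instruction
       (assuming no overflow/underflow). Without FMA it takes many more
       operations via Dekker/Veltkamp splitting. */
    double two_prod(double a, double b, double *e) {
        double p = a * b;
        *e = fma(a, b, -p);           /* exact: a*b - fl(a*b) */
        return p;
    }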

(Disclaimer: I am not an expert on this, just some guy on the internet.)




Fair. I think it would be very helpful for Rust if an expert knew of a specific example where this is the case. I think there is an RFC about enabling floating-point contraction by default, which would "silently" be able to do some of these transformations depending on the optimization level.


The one very important thing that often gets destroyed by compilers assuming associativity is the TwoSum error-free transform, which is a vital component of several algorithms that deal with numerical error (most notably Kahan summation).

The problem is mentioned on the Wikipedia page for Kahan summation (and I have been able to reproduce it with gcc): https://en.wikipedia.org/wiki/Kahan_summation_algorithm#Poss...
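
For reference, a C sketch of the textbook Kahan loop (following the Wikipedia pseudocode); the compensation c = (t - sum) - y is exactly the expression a compiler assuming associativity can "prove" to be zero and delete:

    #include <stddef.h>

    double kahan_sum(const double *x, size_t n) {
        double sum = 0.0;
        double c   = 0.0;                 /* running compensation */
        for (size_t i = 0; i < n; i++) {
            double y = x[i] - c;          /* re-inject the previously lost bits */
            double t = sum + y;           /* big + small: low-order bits of y are lost */
            c = (t - sum) - y;            /* algebraically 0, numerically the lost bits */
            sum = t;
        }
        return sum;
    }

Under real-number associativity, (t - sum) - y rewrites to ((sum + y) - sum) - y = 0, so an "unsafe math" compiler silently degrades the whole loop to naive summation.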

This is actually my area of research, I could contribute if you point me to an RFC.


I think the key issue with the optimizations that ICC is performing for C++ but Rust is not doing in this case is just FP-contraction, which is related to, but not the same as, assuming associativity.

The RFC about that is https://github.com/rust-lang/rfcs/pull/2686 , where you see users roughly split into the "I want faster binaries" and "I want more deterministic execution" camps. Neither is wrong, TBH.

Some people have tried to show there that enabling FP-contraction by default isn't always better / more precise, but I'm not sure if they succeeded.
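
One hedged illustration (my own example, not taken from the RFC thread) of why contraction isn't automatically more precise: with contraction, an expression that is exactly zero when evaluated as written can come out slightly nonzero, because only one of the two products keeps its full precision.

    #include <math.h>
    #include <stdio.h>

    /* Mathematically x*x - y*y == 0 when x == y. Without contraction both
       products round identically and the difference is exactly 0. If the
       compiler contracts this to fma(x, x, -(y*y)), the result is instead
       the rounding error of y*y (with some sign); when it comes out
       negative, the sqrt returns NaN. */
    double leg(double x, double y) {
        return sqrt(x * x - y * y);
    }

    int main(void) {
        volatile double v = 1e16 + 1.0;   /* volatile: prevent constant folding */
        printf("%g\n", leg(v, v));        /* 0 without contraction, possibly NaN with it */
        return 0;
    }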


> I think the key issue with the optimizations that ICC is performing for C++ but Rust is not doing in this case is just FP-contraction, which is related to, but not the same as, assuming associativity.

I think associativity is necessary to vectorize reduction operations like:

    r+=(c[i]-a[i]*b[i])*a[i]*(c[i]-a[i]*b[i]);
I haven't looked at the code generated by ICC, but I would expect it to vectorize this by computing tuples of "partial sums", roughly as follows:

    r0 += (c[i+0]-a[i+0]*b[i+0])*a[i+0]*(c[i+0]-a[i+0]*b[i+0]);
    r1 += (c[i+1]-a[i+1]*b[i+1])*a[i+1]*(c[i+1]-a[i+1]*b[i+1]);
    r2 += (c[i+2]-a[i+2]*b[i+2])*a[i+2]*(c[i+2]-a[i+2]*b[i+2]);
    ...
and then doing a horizontal sum r = r0 + r1 + r2 + ... in the end. But this requires associativity. (And commutativity, but that's a given.)


Yes, you are right, for those reduce operations you need associativity.


For the error-free transform, the important part is associativity, not fusion (which by itself would not degrade the operation).

As illustrated on the Wikipedia page, if you move terms according to associativity rules, you can show that one of the terms is always zero, conclude that it is useless, and drop it from the computation.

However, in practice, floating-point operations are not associative, and that term contains the numerical error of the previous operation.
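
Concretely, the three-operation variant (Dekker's Fast2Sum, valid when |a| >= |b|) makes the dropped term easy to see:

    /* Fast2Sum (Dekker): requires |a| >= |b|.
       Returns s = fl(a + b); *e receives the exact rounding error,
       so that a + b == s + *e. */
    double fast_two_sum(double a, double b, double *e) {
        double s = a + b;
        double z = s - a;   /* under real-number associativity, z would equal b ... */
        *e = b - z;         /* ... so this term "is always zero" and gets dropped;
                               in floating point it is exactly the error of a + b */
        return s;
    }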


It isn't a matter of being a Rust expert; fused multiply-add is an instruction on CPUs.



