
> they used insufficient precision for adding too many numbers.

80-bit, 128-bit, or a sum of 64-bit floats: it doesn't matter. Summing an arbitrarily long list of floating point numbers while maximizing precision is actually a hard problem, because the order in which you add the numbers really matters due to accumulated rounding error. Even when this problem is recognized, it turns out the most precise algorithm escaped us for decades; it's not as simple as adding them up from low to high. It's only been within the past 10 years that this optimal algorithm has been discovered.
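
To make the order-dependence concrete (my example, not from the article): the same three doubles summed in two different orders give two different answers in Python, while math.fsum returns the correctly rounded result either way.

    import math

    xs = [1.0, 1e16, -1e16]      # exact sum is 1.0

    print(sum(xs))               # 0.0 -- the 1.0 is absorbed by 1e16 before the cancellation
    print(sum(reversed(xs)))     # 1.0 -- same numbers, opposite order
    print(math.fsum(xs))         # 1.0 -- correctly rounded regardless of order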

80-bit is a non-portable abomination. It has caused us "average" programmers decades of headaches, and it generally hasn't even given experts better numerical precision.

Good riddance IMHO!




> Even when this problem is recognized, it turns out the most precise algorithm escaped us for decades; it's not as simple as adding them up from low to high. It's only been within the past 10 years that this optimal algorithm has been discovered.

Interesting, do you have a link for us lazy but curious folk?


> this optimal algorithm

I'm curious - link please?


There is some good background starting from the Kahan algorithm:

https://en.wikipedia.org/wiki/Kahan_summation_algorithm
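
For the lazy, here is a short Python sketch of what that page describes (compensated summation); the test data is just my own illustration:

    def kahan_sum(values):
        total = 0.0
        c = 0.0                  # running compensation for lost low-order bits
        for x in values:
            y = x - c            # fold the previous step's error back in
            t = total + y        # low-order bits of y may be lost here...
            c = (t - total) - y  # ...but can be recovered algebraically
            total = t
        return total

    vals = [1.0e16] + [1.0] * 1_000_000
    print(sum(vals) - 1.0e16)         # 0.0: each 1.0 is rounded away by the naive sum
    print(kahan_sum(vals) - 1.0e16)   # the compensated sum recovers the small terms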

This paper from 1993 analyzes five algorithms:

https://www.semanticscholar.org/paper/The-Accuracy-of-Floati...

Sorry, I don't have a reference for the "optimal" algorithm--I had it explained to me in person by someone working in the field, and they didn't give me a specific place to go track it down. The gist of it is that you "bin" the numbers by their approximate order of magnitude, and then within each bin you keep a separate running estimate of the error. It's a kind of divide-and-conquer approach. AFAIR it is "optimal" with respect to minimizing error, but not optimally efficient.
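
Purely as an illustration of that description, here is a rough Python sketch of the idea: bin the inputs by (coarsened) binary exponent, keep a Kahan-style running sum and error estimate per bin, then fold the bins together from small to large. The bin width and the final combining pass are my own guesses, not a reconstruction of the actual algorithm.

    import math
    from collections import defaultdict

    def binned_sum(values, bin_width=32):
        # key: coarsened binary exponent -> (running sum, running error) for that bin
        bins = defaultdict(lambda: (0.0, 0.0))
        for x in values:
            if x == 0.0:
                continue
            _, exp = math.frexp(x)           # x == mantissa * 2**exp
            key = exp // bin_width
            s, c = bins[key]
            y = x - c                        # compensated addition within the bin
            t = s + y
            bins[key] = (t, (t - s) - y)
        # Fold bins together starting with the smallest magnitudes, feeding each
        # bin's error term back in, so small contributions are not swallowed early.
        total, corr = 0.0, 0.0
        for key in sorted(bins):
            s, c = bins[key]
            for part in (s, -c):             # true bin sum is approximately s - c
                y = part - corr
                t = total + y
                corr = (t - total) - y
                total = t
        return total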

It's apparently a hard problem!


I did find this:

http://code.activestate.com/recipes/393090/

but according to the comments on it, it isn't entirely accurate either. It is indeed a hard problem, and just having more bits is the easy part :-)
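
For anyone who doesn't want to click through: the recipe maintains a list of exact, non-overlapping partial sums (the same idea Python's math.fsum later adopted). Roughly:

    def msum(values):
        partials = []                  # non-overlapping partial sums, small to large
        for x in values:
            i = 0
            for y in partials:
                if abs(x) < abs(y):
                    x, y = y, x
                hi = x + y             # rounded sum of the pair
                lo = y - (hi - x)      # exact rounding error of that addition
                if lo:
                    partials[i] = lo
                    i += 1
                x = hi
            partials[i:] = [x]
        return sum(partials, 0.0)

If I remember right, the inaccuracy the commenters point out is in that final sum(partials) call, which can round twice; math.fsum does that last step more carefully.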


The non-portability of long double is just bonkers, though. Floating point is super-tricky to get right at all levels, and programmers shouldn't be constantly surprised, and often infuriated, when their compiler introduces crazy amounts of nondeterminism by "helpfully" using 80 bits underneath.
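
You can see that non-portability without even leaving a scripting language; numpy's longdouble just wraps whatever the platform's C long double happens to be:

    import numpy as np

    # On typical x86 Linux builds long double is the 80-bit x87 format
    # (64-bit significand); on MSVC and most ARM targets it is plain binary64.
    info = np.finfo(np.longdouble)
    print(info.nmant, info.eps)   # e.g. 63 / ~1.08e-19 vs. 52 / ~2.22e-16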

I am actually pretty OK with just having an 80-bit float type in the language, as long as it is always 80 bits, and no 64-bit floats get silently promoted. But C didn't have that because it's not what x87 does the fastest. C just has whatever crap is the fastest. I hate that! I imagine that you did a better job of it in D.

In designing WebAssembly we were very careful to go for as much determinism as possible and to stick closely to IEEE 754. There were loud voices calling for nondeterminism (some even pushed for it to be nondeterministic whether flush-to-zero (FTZ) mode was enabled by default!), but in the end the portability considerations won out. I was very heavily on the determinism side.



