How to average floating point numbers (2015) (nu42.com)
145 points by dolmen on June 11, 2021 | 84 comments



I still think the best way to deal with this stuff is to use interval arithmetic. Calculate the numbers with rounding set to round up, and separately to round down and compare the results. See PyInterval https://pyinterval.readthedocs.io/ for an implementation.

  >>> from interval import interval
  >>> data = (interval(1_000_000_000.1), interval(1.1)) * 50_000
  >>> sum(data) / len(data)
  interval([500000000.5971994, 500000000.601347])

The result is definitely between the 2 values. If you try another method to do the calculation (e.g. sorting first), you will know which one was better without needing the answer to be obvious, as it is in this case.

You might find one method is better at the lower bound, the other is better at the higher bound, which means you can narrow the value further than using either individually. It's a really powerful system that I think should be much better known.
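The narrowing step is just interval intersection. A minimal sketch in plain Python (not PyInterval's API, just illustrating the idea):

  # If two methods each produce a valid enclosure [lo, hi] of the true mean,
  # their intersection is a valid (and possibly tighter) enclosure too.
  def intersect(lo1, hi1, lo2, hi2):
      lo, hi = max(lo1, lo2), min(hi1, hi2)
      assert lo <= hi  # both enclosures contain the true value, so they overlap
      return lo, hi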


You should be okay with a sum but interval arithmetic can blow up in a way that is not correlated with the numerical accuracy of the algorithm. Newton's method, for example, is strongly convergent but produces diverging intervals.


You can use other algorithms in these cases, and you can do things like converge your interval on a result by testing different values and seeing if the result is above the root or below the root (or if it's within the error, so not possible to determine).


Why does that happen? Is it because the errors are in opposite directions and will cancel each other out?


That (and, more generally, the inability to take correlation between errors into account), plus the fact that the propagated intervals are only upper bounds for some operations like division.


Can't we just use Kahan summation [0] then divide by n?

edit Someone on the reddit thread [1] had the same thought.

[0] https://en.wikipedia.org/wiki/Kahan_summation_algorithm

[1] https://old.reddit.com/r/perl/comments/2zneat/how_you_averag...


Kahan summation is better than naive summation, but not perfect. The stable_mean algorithm in the article is not perfect either. It just has more precision than the naive one, roughly the same precision you would get from twice-as-precise floats.
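For reference, Kahan summation itself is only a few lines; a minimal Python sketch (assuming a non-empty sequence of floats):

  def kahan_mean(xs):
      total = 0.0
      c = 0.0                  # running compensation for lost low-order bits
      for x in xs:
          y = x - c            # apply the correction carried from the previous step
          t = total + y        # low-order bits of y may be lost here...
          c = (t - total) - y  # ...and are recovered into c
          total = t
      return total / len(xs)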

  >>> data = (1_000_000.1, 1.1, -1_000_000.1, -1.1) * 25_000
  >>> sum(data) / len(data) == -2.3283153183228934e-16
  >>> sum(sorted(data)) / len(data) == 2.7122441679239272e-11
  >>> stable_mean(data) == -2.571225228716577e-13
  >>> float(sum(numpy.array(data, dtype=numpy.float128)) / len(data)) == 2.2648549702353195e-19

Note that in Python there is a math.fsum function which returns a perfectly rounded summation of floats. There are multiple different techniques for that, but they are more complex than Kahan summation.

  >>> math.fsum(data) / len(data) == 0.0


> Note that in python there is a math.fsum function which return a perfectly rounded summation of floats.

Quirks of the underlying data can yield surprising results.

I was playing around with the "well-known" floating point error result of 0.3 - 0.2 - 0.1 = some small number (instead of 0), and found:

    >>> sum((0.3, -0.2, -0.1) * 1_000_000) / 3e6
    -2.775557561562891e-23
    >>> math.fsum((0.3, -0.2, -0.1) * 1_000_000) / 3e6
    -9.25185853854297e-18
    >>> stable_mean((0.3, -0.2, -0.1) * 1_000_000)
    -1.387894586067203e-17
Namely, the "naive" mean yielded the smallest error (due to cancellation lining up just right such that it's capped after two rounds), fsum yielded a nominally larger one, and stable_mean yielded the biggest error.

Of course, if you sort the values (such that we're accumulating -0.2 + ... + -0.2 + -0.1 + ... + -0.1 + 0.3 + ... + 0.3), then sum performs much worse, stable_mean is a little worse, and fsum performs exactly the same.

    >>> sum(sorted((0.3, -0.2, -0.1) * 1_000_000)) / 3e6
    -1.042215244884126e-12
    >>> math.fsum(sorted((0.3, -0.2, -0.1) * 1_000_000)) / 3e6
    -9.25185853854297e-18
    >>> stable_mean(sorted((0.3, -0.2, -0.1) * 1_000_000))
    -3.0046468865706775e-16
edit: Naturally, if you want accurate decimal calculations, you should consider using decimal.Decimal or fractions.Fraction (albeit at the cost of significant performance). Using numpy.float128 will help, but it's still subject to rounding errors due to imprecise representation.
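For instance, redoing the 0.3 - 0.2 - 0.1 exercise with exact rationals (a quick sketch; it only stays exact because the inputs are constructed as exact tenths rather than converted from already-rounded floats):

  >>> from fractions import Fraction
  >>> vals = (Fraction(3, 10), Fraction(-2, 10), Fraction(-1, 10)) * 1_000_000
  >>> sum(vals) / len(vals)
  Fraction(0, 1)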


Some of these results are indeed very surprising. The "correct" result of (double(0.3) - double(0.2) - double(0.1)) is -1 * 2^-54 = -5.5511151231257827e-17, and dividing by three comes out to -1.8503717077085942e-17.

Hence, as far as those functions are concerned, the naive mean yielded the worst error in both tests, but stable_mean coincidentally happens to be the best in the first test. I'm slightly surprised that none of them got the correct result.


I think you might be off by one for the "correct" result (i.e. 2^-55), which is what fsum arrives at.

    >>> 0.3 - 0.2 - 0.1
    -2.7755575615628914e-17
    >>> (0.3 - 0.2 - 0.1) / 3
    -9.25185853854297e-18
But either way, the fundamental problem in this case (as you noted) is actually due to loss of precision from numeric representation.


Heh, I reasoned about the exponent by hand and got -55 but then wasn't sure whether it was correct so I searched for discussions of 0.3 - 0.2 - 0.1 and took the value they quoted (2^-54). That being an off-by-one error does resolve my question.



In this case, it seems as if stable_mean is less accurate than the naive sum?


The two behave differently, so you cannot strictly compare them. Most of the time stable_mean will give more accurate results, but this is not always the case. For example, you can create test cases where the naive approach will return an exact result but stable_mean will have some rounding error.

Kahan summation however is always more precise than naive summation.
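For context, the stable_mean in the article is (as I understand it) an incremental running mean along these lines; a sketch:

  def stable_mean(xs):
      mean = 0.0
      for n, x in enumerate(xs, start=1):
          mean += (x - mean) / n  # incremental update keeps intermediates near the data's magnitude
      return mean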


Interestingly enough I was given this problem (well, numerically stable online variance, but close enough) a while ago as a take-home job interview and I ended up researching this quite a bit.

It has been a while, but if I remember correctly pairwise summation was in practice as good as or better than Kahan summation (and both were more stable than Welford's), but faster.
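(For anyone curious, Welford's online variance is roughly the following; a single-pass sketch:)

  def online_variance(xs):
      n, mean, m2 = 0, 0.0, 0.0
      for x in xs:
          n += 1
          delta = x - mean
          mean += delta / n
          m2 += delta * (x - mean)  # note: uses the updated mean
      return m2 / (n - 1) if n > 1 else 0.0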

But by far the fastest and still very stable solution was to use the legacy 80 bit x87 fpu as accumulator. Which is not surprising as Kahan explicitly added 80 bit precision to x87 for intermediate computations.


How did you force using the x87 fpu?


With GCC on x86, long double is an 80-bit extended-precision type.


Or can we sum more accurately by taking the stable_mean and multiplying by n?


Switching to Python instead of Perl, use the 'statistics' module, https://docs.python.org/3/library/statistics.html :

  >>> data = (1_000_000_000.1, 1.1) * 50_000
  >>> sum(data) / len(data)
  500000000.60091573
  >>> sum(sorted(data)) / len(data)
  500000000.6004578
  >>> sum(sorted(data, reverse=True)) / len(data)
  500000000.6012391
  >>> import statistics
  >>> statistics.mean(data)
  500000000.6
and

  >>> X = [10_000_000_000 + k for k in (4, 7, 13, 16)]
  >>> statistics.variance(X)
  30


Ruby seems to handle this out of the box:

  >> RUBY_VERSION
  ⇒  "2.7.2"
  >> data = [1_000_000_000.1, 1.1] * 50_000 ; nil
  ⇒  nil
  >> data.sum
  ⇒  50000000060000.0
  >> data.sum / data.length
  ⇒  500000000.6
  >> data.sort.sum / data.length
  ⇒  500000000.6
  >> data.sort.reverse.sum / data.length
  ⇒  500000000.6
I know that ruby will auto-promote an Integer into a Bignum. It seems like ruby will also auto-promote other types? I should research this...


Nope. The sum method is implemented in C, and as soon as it sees a float value in the array it switches to Kahan-Babuska summation.
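(For reference, Kahan-Babuska/Neumaier summation looks roughly like this; a Python sketch of the idea, not Ruby's actual C code:)

  def neumaier_sum(xs):
      s, c = 0.0, 0.0           # running sum and compensation
      for x in xs:
          t = s + x
          if abs(s) >= abs(x):
              c += (s - t) + x  # low-order bits of x were lost
          else:
              c += (x - t) + s  # low-order bits of s were lost
          s = t
      return s + c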

We don’t do that in TruffleRuby and so don’t produce quite such an accurate result.


FWIW, Ruby the language doesn't have Bignum any more, though, just Integer which handles arbitrarily large numbers. Under the hood, there are some optimizations for small integers.

https://bugs.ruby-lang.org/issues/12005


Bignum is still there, working in almost exactly the same way; it's just not exposed to the user anymore.


This is a nice thing about Common Lisp, not just integer promotion to bignums, but ratios too (works in complex, also).


I guess the question is: does it still use floats internally, or does it use something like Bignum and handle the precision issue (at the cost of more memory/CPU)?


It looks like the latter. For many of its operations, the statistics module first converts floats to integer ratios (integers are bignums in Python), does intermediate operations using integer arithmetic, and converts back to floats at the end.

Here's the internal _sum function for example, which you can see maps _exact_ratio over the data before summing: https://github.com/python/cpython/blob/3.9/Lib/statistics.py...
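A simplified sketch of that idea (not the module's actual code):

  from fractions import Fraction

  def exact_float_mean(xs):
      # each float is an exact ratio of integers, so the sum below is exact;
      # only the final division and conversion back to float can round
      total = sum(Fraction(*x.as_integer_ratio()) for x in xs)
      return float(total / len(xs))

  data = (1_000_000_000.1, 1.1) * 50_000
  print(exact_float_mean(data))  # 500000000.6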


There's also fmean() which is for floats:

  >>> data = (1_000_000_000.1, 1.1) * 50_000
  >>> statistics.fmean(data)
  500000000.6
Implementation at https://github.com/python/cpython/blob/6df926f1c46eb6db7b5dc... .


I just found that boltons supports a mean, in boltons.statsutils.mean . It appears to use the naive summation:

  >>> import boltons.statsutils
  >>> data = (1_000_000_000.1, 1.1) * 50_000
  >>> boltons.statsutils.mean(data)
  500000000.60091573
  >>> sum(data) / len(data)
  500000000.60091573


According to Radford Neal, it is possible to sum a sequence of n-bit floating point numbers exactly, i.e. rounding to the n-bit floating point value closest to the exact result.

The algorithm uses 2n bits for the sum. https://radfordneal.wordpress.com/2015/05/21/exact-computati...


I have a special hatred of floating point numbers. I wish and advocate for a better standard.

My issue is that I've come across errors due to floating point in weird places. One example: searching a list of strings for another string. If the sort function decides those strings look like numbers, it may coerce them into doubles (ints being too small to hold them). Then sort won't match correctly due to the least significant bits being rounded off.


Not sure what this could possibly have to do with floating point numbers, nor am I aware of any function that does what you're describing. Is this some kind of quirky JavaScript behavior? Nothing about floating point numbers has anything to do with coercing strings.


I've definitely been bitten by similar (not identical) stuff. I've seen data serialized and sent over the wire by I-don't-know-what, where several parsers can't correctly parse/serialize and end up with the same thing written back. Be it 32 bit float/64 bit float/string parsed as 32 bit float/string parsed as 64 bit float/string parsed as big decimals: I don't care, the fact is there are buggy serializers, buggy parsers, and even sometimes buggy parsers going into infinite loops (the JVM is famous for that: there was a bug where Double.parseDouble could be sent into an infinite loop by feeding it a carefully crafted string).

The fact is: such kind of SNAFU does happen.


I think the lesson is rather: don't use text based protocols for data serialization if you care about precision. Such things should never happen with a binary protocol.


I've gone back and forth over the years. Having implemented some of the lowest-level floating point routines, taking care to handle all the corner cases, it definitely is a chore. But is there an alternative?

After all of this, I still think that IEEE 754 is pretty darn good. The fact that it is available and fast in pretty much all hardware is a major selling point.

That said, I absolutely hate -0 (minus zero). It's a stupid design wart that I've never seen any good use case for. But it is observable and therefore cannot be ignored, leading to even more icky special cases.


How about determining the quadrant of an angle? That's how atan2() uses signed zero and it's the greatest of all libm functions.
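Concretely (Python's math.atan2 follows the same C/IEEE convention):

  >>> import math
  >>> math.atan2(0.0, -1.0)   # approaching the negative x-axis from above
  3.141592653589793
  >>> math.atan2(-0.0, -1.0)  # approaching it from below
  -3.141592653589793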


After staring at the MATLAB [https://www.mathworks.com/help/matlab/ref/atan2.html#buct8h0...] explanation, I have to admit I don't really get that. Minus zero is an encoding artifact. Values of [x,y] with either `x == 0` or `y == 0` don't have, in the mathematical sense, a quadrant. The fact that a convention preserves an encoding artifact doesn't make it useful.

E.g. I've seen it argued that +0 and -0 represent limits of functions that approach zero from the negative side or the positive side. But that makes no sense, because no other floating point number represents a limit going from above or below the number, so why should 0? Also, all this walking on eggshells around a possible -0 (because that represents the limit of 0 coming from below 0) is just lost when someone writes 0. Admittedly, I've not seen the guts of most numerical code, but the only time I have ever seen a minus 0 written into code was to test the minus-zero handling of something.

/shrug


Don't blame the floating point numbers.

Just use a programming language that doesn't coerce strings into numbers. Or use a comparison function/operator that explicitly compares arguments as strings and not as numbers.


That isn't floating point's fault, but that of whoever wrote the buggy parser.

Floating point is overall pretty good. Sometimes I wish there was a version without so many ways to store NaN (for 16-bit floats it is so wasteful), and maybe versions of sqrt/div/rcp/rsqrt that would clamp invalid inputs to 0, but other than that it is hard to see how you make any major improvements given the fixed number of bits.

Actually, one other possible improvement would be a way to request the hardware to dither when rounding down to the fixed bit count, although I'm not sure how it could do anything besides pure white noise, but better than nothing...


> If the sort function decides those strings look like numbers it may coerce them into doubles (ints being too small to hold them).

You sometimes even see that with APIs inconsistently wrapping some values in quotes (seemingly at random). And then it gets trickier, because you can have the parser correctly parse some of the values into, say, "big decimals", while others may be automagically, imprecisely, parsed into floating-point numbers.

The problem is of course made worse by people overall overusing decimal numbers where integers would be totally fine.

One of the most beautiful floating-point bugs ever is this one (from about ten years ago as I recall it): language preference in browsers could be set as a decimal number between 0 and 1. Java (and other languages) had a broken floating-point implementation where parsing a certain floating-point number would lead to an infinite loop. So you could peg one core of any webserver with that bug by simply requesting a page with the language preference set to that floating-point number. Rinse and repeat a few times and all cores were spinning at 100%, effectively DDoS'ing the webserver.


> I have a special hatred of floating point numbers. I wish and advocate for a better standard.

Like what? Packing the real number line in 32 (or 64) bits is an incredible trick. The roughly scale-invariant distribution of represented values means you can do math without regard to choice of units, which would be impossible with any fixed point scheme.


There are "better" standards they are just way way way slower. Which sometimes is what you need.


I suppose if you're not bothered with wasting lots of time, do the following for a precise sum: Select the two smallest numbers from the list, remove them and insert their sum. Repeat until only one number is left.

[Ed. smaller meaning closer to zero]
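One way to write that down is with a heap keyed on magnitude (a sketch, O(n log n)):

  import heapq

  def smallest_pair_sum(xs):
      # repeatedly pop the two values closest to zero and push back their sum
      heap = [(abs(x), x) for x in xs]
      heapq.heapify(heap)
      while len(heap) > 1:
          _, a = heapq.heappop(heap)
          _, b = heapq.heappop(heap)
          heapq.heappush(heap, (abs(a + b), a + b))
      return heap[0][1] if heap else 0.0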


This is exactly what happens when you sort the numbers ascending and compute the sum starting from the smallest, like in the article.

Except this doesn't work if you must handle negative numbers. So what matters is not the smallest numbers but the numbers nearest to zero.


No, that's not the same.

Summing [4, 5, 6, 7] would yield the series (4, )9, 15, 22 with "sort then add", but what the previous poster described would first sum 4+5 (leaving the list [9, 6, 7]), then 6+7 (leaving the list [9, 13]) and finally arriving at 22.


OK.

That algorithm could be improved (for accuracy, not speed) by replacing repeated additions of an identical smallest value with a multiplication, as I suppose that multiplying a value by a count of 3 or more loses less precision than the equivalent chain of additions.

For example:

[4, 3, 7, 7, 8] could be calculated as:

  (3+4) => [7, 7, 7, 8]
  (7*3) => [21, 8]
  (8+21) => 29


YAGNI.

If you do, in fact, need to worry about floating point error, then there is not some magical "it will never happen if you have this much precision and use this neat algorithm". Heck, even fixed/decimal/rational numbers aren't perfect solutions, they just make it easier to tell if you're staying within your precision.

Without infinite storage, you can't retain infinite precision, so the only real question is how much error you are willing to tolerate. And that depends very much on what you're doing - I can tolerate a lot more error in rolls on a MMO than in my bank account balance...


> the only real question is how much error you are willing to tolerate

So, you are, in fact, gonna need an algorithm for summation that controls that error, if you're not willing to tolerate very much.


Yes. Similarly, if you design something with total disregard to optimization, it will likely need optimization.

YAGNI is a prediction, not a rule. Just like premature optimization, alternatives to floating point - or using longer floating point types - have real costs, which must be balanced against the needed precision.


Well, his final example suggests otherwise, but I kind of suspect that there must be something else going on there (that I'm too lazy to try to track down). Still, while fixing most floating point errors seems like rearranging the deck chairs on the Titanic, if there's one algorithm that's 99.999% accurate and another that's only 99.9% accurate, might as well use and be aware of the more accurate one.


The error in the 'Variance calculated using Statistics::Descriptive' is ~0.0002% of the mean (and of each value).

As the OP says: "One saving grace of the real world is the fact that a given variable is unlikely to contain values with such an extreme range"

My point is that 99.999% algorithm is going to have a cost over the 99.9% algorithm (more complex code, more resource usage), and depending on what you're doing, you may have to consider that cost.


Not in this case. Consider naive sum:

    double fsum(const double *p, size_t n) {
      size_t i;
      double s;
      for (s = i = 0; i < n; ++i) s += p[i];
      return s;
    }
Add one line of code and it not only becomes numerically stable but benchmarks faster too.

    double fsum(const double *p, size_t n) {
      size_t i;
      double s;
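      /* divide and conquer: recursively sum each half, then add the partial sums */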
      if (n > 8) return fsum(p, n / 2) + fsum(p + n / 2, n - n / 2);
      for (s = i = 0; i < n; ++i) s += p[i];
      return s;
    }
Losers always whine about how their laziness and ignorance is a calculated tradeoff.

What we need to consider is that sometimes there is no tradeoff.


That depends to an extreme degree on your environment, your language, your compiler... etc. What you need to consider is that tradeoffs have two sides.


I understand why the second version might benchmark faster, but why is it more numerically stable?


Because it reduces worst case roundoff errors from being linear to logarithmic. Why is that the case? Best intuitive explanation I can think of is that floating point behaves more like VHS than DVD. If you wanted to bootleg a thousand copies of Titanic, then would you have each copy be made from the previous copy? No. It'd look terrible. You would instead have a stack of VCRs where the copies could be made in a hierarchical way that minimizes the critical path of copies. The overall number of operations may be the same, but if we consider that errors get worse the more errorful something becomes then it should be simple to understand why divide and conquer improves numerical stability.
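A quick way to see the effect on the thread's example data; a Python sketch of the same split:

  def pairwise_sum(xs):
      if len(xs) <= 8:
          return sum(xs)
      mid = len(xs) // 2
      return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

  data = (1_000_000_000.1, 1.1) * 50_000
  print(sum(data) / len(data))           # naive: error grows roughly linearly
  print(pairwise_sum(data) / len(data))  # pairwise: error grows roughly logarithmically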


If you are looking for such accuracy (where 50_000_000.00006 is different from 50_000_000.06) then you need a specialised library and do very careful coding.

If you are averaging bazillions of numbers and they range from weeny to huge, and you think 50_000_000.000006 != 50_000_000.06 for your purposes, then it is hard for me to see that you are doing anything sensible.

This is, except in very specialised cases that I cannot imagine, a non issue.

Just add them up and divide (and check for overflow, that could ruin your day!).


The "Statistics::Descriptive" example (near the end of the article) has to be a bug in the relevant library, right?

Even the naive floating implementation can't possibly get it this wrong for just 4 numbers.


That last example looks more like an outright bug than a floating point error.


This is useful for people working in critical applications. For the majority of people the 0.000001% difference doesn't matter.


That's the wrong way to think about floating point. Double gives you ~15 digits of precision. If the answer is 500000000.600000 but your algorithm says 500000000.601239 then 25% of the bits you've got are kaput. It's also a question of scalability. Increase the author's example array size to 1 gigabyte and now 40% of your double is garbage data. Increase it to web scale, and you're toast. What's the solution? Divide and conquer. Not only will that let you farm out the summation across machines, but it also eliminates the linear growth in errors.


Does numpy.mean() do this?


Simple testing shows it does not use the naive sum.

  >>> import numpy as np
  >>> arr = np.array( (1_000_000_000.1, 1.1) * 50_000)
  >>> arr.mean()
  500000000.6000003


tl;dr: sort first

Better: recursive bucket accumulators using the exponent
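One simplified, non-recursive way to read the bucket idea (a sketch, not necessarily what the comment above has in mind): group values by binary exponent so each bucket mostly adds numbers of similar magnitude, then combine the partial sums from smallest to largest.

  import math

  def bucketed_sum(xs):
      buckets = {}
      for x in xs:
          _, e = math.frexp(x)  # binary exponent of x
          buckets[e] = buckets.get(e, 0.0) + x
      # combine per-exponent partial sums from smallest magnitude to largest
      return sum(sorted(buckets.values(), key=abs))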


Wow, Perl. Now, that's a name I've not heard in a long time.


Give me an (real life) example where these errors matter. Any real data will be noisy anyway and relying on anything at the level of ppb (parts per billion) looks suspicious.

Clarification: I do not say that treating floats carefully is not important, and I know doing stuff like subtracting large numbers can lead to inaccurate leading digits. This is clear and part and parcel of any numerical work. But this is not what the article is about. I am claiming instead that the article is looking for a problem where there is none (or let's say one that I fail to see).


If you're developing floating point code, you're in for a huge surprise, eventually. What happens for example when you subtract two close numbers with an error like this? There are a lot of mechanisms for small errors to become significant.

Many of us have had their attitudes readjusted by a bug where these "insignificant" errors became anything but.

We're often wrong in ways we can't imagine, so it is wise to remain humble and try to strive for correctness over our potentially misguided ideas how things work.

For example, you'd better still have multiple layers of security even though we've already made some other layer "impenetrable". And still check for those "impossible" error cases before they create nearly undebuggable issues.


Yes, one has to be careful about floating point, but honestly the article is not about that. Actually it is more about something where you do not have to be careful.


The article explicitly says while you might get away with it with some luck due to characteristics of the real world data, you do have to be careful.

From TFA:

"Now, in the real world, you have programs that ingest untold amounts of data. They sum numbers, divide them, multiply them, do unspeakable things to them in the name of “big data”. Very few of the people who consider themselves C++ wizards, or F# philosophers, or C# ninjas actually know that one needs to pay attention to how you torture the data. Otherwise, by the time you add, divide, multiply, subtract, and raise to the nth power you might be reporting mush and not data.

One saving grace of the real world is the fact that a given variable is unlikely to contain values with such an extreme range. On the other hand, in the real world, one hardly ever works with just a single variable, and one can hardly every verify the results of individual summations independently.

Anyway, the point of this post was not to make a grand statement, but to illustrate with a simple example that when using floating point, the way numbers are added up, and divided, matters."


I work with avionics and it does matter.

Yes, real data is noisy but testing needs to be precise and repeatable. For example, if we need to test that a value is <=100, then it must pass at 100 and fail at 100.00000001. And yes, we use margins and fixed point numbers too, but sometimes we need to be precise with floating point numbers.

It also matters in some calculations, for example when doing GCD/LCM with periods and frequencies. For that, we eventually switched to rational numbers because precision of source data was all over the place.

We all know that parts per billion rarely make sense but where do we draw the line? Sometimes, it really matters (ex: GPS clocks), sometimes, PI=3 is fine. So in doubt, use the highest precision possible, you can relax later if you can characterize your error.
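(For the curious: once periods are exact rationals, LCM/GCD are well defined, e.g. for fractions in lowest terms lcm(a/b, c/d) = lcm(a, c) / gcd(b, d). A sketch:)

  import math
  from fractions import Fraction

  def rational_lcm(p: Fraction, q: Fraction) -> Fraction:
      # Fraction normalizes to lowest terms, which the formula relies on
      return Fraction(math.lcm(p.numerator, q.numerator),
                      math.gcd(p.denominator, q.denominator))

  print(rational_lcm(Fraction(1, 4), Fraction(1, 6)))  # 1/2 (math.lcm needs Python 3.9+)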


I didn't work in avionics, but in a field that used a lot of GPS data. Depending on where your error is, it can mean +/- 100 feet or more, depending on where the cut-off is. Like you say, you need to know what your target is.


Of course, the fact that the threshold has to be so precise is because that's the rules, not because of anything related to aviation. That's a human-enforced quality. A threshold at 100 probably has about one significant figure, sometimes less.


Well, not necessarily. The "and repeatable" can be quite a constraint.

Imagine three redundant systems on a plane that want to compute the same from the same input array (and then check whether they all agree). You want to implement this, and figure that (since there's a lot of data) it makes sense to parallelize the sum. Each thread sums some part and then in the end you accumulate the per-thread results. No race conditions, clean parallelization. And suddenly, warning lights go off, because the redundant systems computed different results. Why? Different thread team assignments may lead to different summation orders and different results, because floating point addition is not associative.

More generally, whenever you want fully reproducible results (like "the bits hash to the same value"), you need to take care of these kinds of "irrelevant" problems.
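A tiny illustration of the non-associativity that causes this (the classic absorption case):

  >>> a, b, c = 1e16, 1.0, 1.0
  >>> (a + b) + c   # each 1.0 is absorbed by the huge intermediate sum
  1e+16
  >>> a + (b + c)   # grouping the small terms first preserves them
  1.0000000000000002e+16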


Aviation is all about rules, and sometimes a bit of flying.


And most of the rules are written in blood.


Then use more bits for floats and use naive average. This article is more like "how to squeeze the last bits from a floating point operation."

And even if you use these techniques butterfly effect will kick in eventually.


I was accumulating deltas in a PDE simulation on photographic images, and when the calculation was tiled, different tiles had noticeable differences in brightness.

Real image data, roughly 1 megapixel per tile at that time, and 32-bit floats.


A field where floating point errors accumulate quickly and can totally ruin your results is computational geometry. See this paper https://people.mpi-inf.mpg.de/~mehlhorn/ftp/classroomExample... for examples.



In general, floating point error can accumulate in iterative algorithms where you are feeding the result of one calculation into the next step.


Yes. If you have a positive Lyapunov coefficient, that will happen. But then what the article does won't help you either.


If you know about Lyapunov coefficients then you should also know about the concept of order of accuracy. If I'm developing a code that is supposed to be second order in time and space, then when I'm running my convergence study, I better be able to control the floating point error in my computation of the total energy, so that I can properly attribute the error I do observe to power series truncation, or what have you.


The article is not saying that this is even all that useful or revelatory -- but it's just something to keep in mind.


Rounding errors might be small, but catastrophic cancellation can make errors much larger than parts per billion. Maybe for your application it doesn't matter, but sometimes it does, and you need to be aware of these cases.

You wouldn't want those kind of rounding errors in financial applications, in rocket science applications, etc ...


And how does the mentioned averages procedure help?


Sure. Here's an example of blatantly incorrect font rendering due to (what I assume is) floating point error in macOS: https://twitter.com/pcwalton/status/971825249464475648



