If 1+x is 1, how much is 1-x? (gus-massa.blogspot.com)
203 points by gus_massa on April 21, 2015 | 51 comments



Nice post.

When people wonder why floating point calculations sometimes don't match the "correct" answer they expect[1], someone will inevitably respond with the famous paper, "What Every Computer Scientist Should Know About Floating-Point Arithmetic"[2].

Unfortunately, the people who suggest that paper don't realize that it's a very technical treatise and not an appropriate introduction for the types of programmers who ask the question.

I've always thought a better answer was to have the programmer "construct" a toy floating point format (e.g. 8 bits instead of 32 or 64). The programmer would then notice that he has a finite representation of 256 possible "states" to cover everything from -inf through 0 to +inf. He would have to pick a range and a precision to "squeeze" the Real Numbers into those 8 bits.

Design choice 1: I'll have my 256 possible states represent -1 billion to +1 billion. Well, since you can't spread 256 states/values across 2 billion unique values, you will have huge "gaps" between representable numbers.

Design choice 2: I'll have my 256 possible states represent -10 to +10. With a smaller range, you can now increase the precision and represent fractional numbers with digits after the decimal point. But the range is very small. Also, you still have "gaps" where you can't represent most[3] fractional numbers.

The programmer would quickly notice that he'd run into some contradictions. No matter what he does, there will always be gaps. The lightbulb would then go on, and he'd immediately scale the limitations inherent in 8-bit floating point up to 64-bit floating point and know exactly why 0.1 + 0.2 does not exactly equal 0.3.

It had been on my vague "todo" list to write such a tutorial, but your blog post already gets that "construction of floating point" idea across nicely. The programmer can then explore the more advanced topics of "normalization" and "bias" in designing a floating point format.

[1] http://stackoverflow.com/questions/588004/is-floating-point-...

[2] http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.ht...

[3] http://math.stackexchange.com/questions/474415/intuitive-exp...
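For anyone who wants to actually try the exercise, here's a rough sketch in Python of one possible toy layout (1 sign bit, 4 exponent bits with a bias of 7, 3 fraction bits; the layout is my own choice for illustration, not any standard format):

    import math

    def decode(byte):
        """Decode one byte as a toy float: 1 sign, 4 exponent (bias 7), 3 fraction bits."""
        sign = -1.0 if (byte >> 7) & 1 else 1.0
        exp = (byte >> 3) & 0b1111
        frac = byte & 0b111
        if exp == 0b1111:                                # all-ones exponent: infinity or NaN
            return sign * math.inf if frac == 0 else math.nan
        if exp == 0:                                     # zero exponent: subnormal, no hidden bit
            return sign * (frac / 8) * 2.0 ** -6
        return sign * (1 + frac / 8) * 2.0 ** (exp - 7)  # normal: hidden leading 1

    values = sorted({decode(b) for b in range(256) if math.isfinite(decode(b))})
    print(len(values), "distinct finite values")  # only a couple hundred points for the whole number line
    middle = len(values) // 2
    print(values[middle - 3:middle + 3])          # near zero the representable values are 1/512 apart...
    print(values[-6:])                            # ...while the largest ones are 16 apart

Scale the same layout up to 11 exponent bits and 52 fraction bits and you have IEEE 754 doubles, with exactly the same kind of gaps, just much smaller.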


> Unfortunately, the people who suggest that paper don't realize that it's a very technical treatise and not an appropriate introduction for the types of programmers who ask the question.

Programmers of the 'types' (whatever those are) that ask such questions are exactly the target audience of that paper; a technical treatment of a technical subject is a-ok.

"If you need floating point, you don't actually understand your problem" used to be a common mantra back when floating point computation still incurred a significant speed penalty. We have come a long way since then, but every computer programmer should read that paper and understand it if they want their programs to work properly when using floating point.

You can't really use a tool that you do not understand and using floating point has plenty of pitfalls.

Programmers tend to use floating point casually: because their language of choice will switch to it when not told otherwise, because they think it is a magic solution to some kind of problem that they have (say, counting beyond some arbitrary number), because they are lazy, or because they need a fractional representation of some number in a well defined range.

If the chain of computation is long enough, if the number of input variables is large enough and if the operations are varied enough, this will lead to hard-to-diagnose bugs and problems unless the code is implemented with enough understanding. FP is a powerful tool, but like any powerful tool it can hurt you if you don't understand it.
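A tiny, contrived illustration of that kind of accumulation (my example, not anything from a real codebase):

    # Sum 0.1 ten thousand times; every addition rounds, and the roundings add up.
    total = 0.0
    for _ in range(10000):
        total += 0.1

    print(total)            # not exactly 1000.0
    print(total == 1000.0)  # False

Each individual rounding is tiny, but nothing guarantees they cancel.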


I recently came across this blog post [1] with the same title as that famous paper, except that "scientist" is switched out for "programmer", and it takes a more graphical, less technical approach. It's probably better for this purpose.

EDIT: To clarify, I mean "better than the famous paper". The blog post linked in this story is also very good; I'd say the two posts complement each other nicely.

[1] http://blog.reverberate.org/2014/09/what-every-computer-prog...


> Unfortunately, the people who suggest that paper don't realize that it's a very technical treatise and not an appropriate introduction for the types of programmers who ask the question.

I think the main value of that paper for newbie programmers is not to learn all the nitty-gritty details about floats (which would be futile), but to get the idea that floats can, and will, behave unintuitively, and that they are not just decimal numbers as many beginner materials suggest.


That would be great if they did, but I suspect that many programmers would look at the style and wave it away as boffin stuff they don't really need to worry about.


"No matter what he does, there will always be gaps."

Here is how to do this [1]: keep a fixed-size array of every number seen, and store numbers as indexes into the array. When or if [2] the array fills up, garbage collect by finding a number in the array that is fairly close and hasn't been used recently, and updating it to contain the value you need _now_. Yes, that may change values computed some time ago, but who cares? Your unit tests will run fine, it teaches you to defend your code against bits flipping in RAM, and it is not as if you remember whether that fish you caught months ago was 1.8 meters long or sqrt(3) meters.

[1] if you like designing esoteric languages.

[2] it wouldn't terribly surprise me to learn that many programs do not need that many distinct float values over their lifetime.


I'm missing something here. Is the point of this system to use the same floating point value in several different places without storing it in several different places? Why not just use a pointer? If the array is fixed-size, where are you storing e.g. the fact that 01110010 is actually 3.79017423523459?


This is a big implicit FAQ for Octave (actually, a frequent incorrect bug report), so this is our response, mostly written by me:

http://wiki.octave.org/FAQ#Why_is_this_floating_point_comput...

I think people just forget that when they see "0.1", it can't be represented exactly in binary.
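A quick way to see it from Python (just an illustration, not part of the FAQ):

    from decimal import Decimal
    from fractions import Fraction

    # the double nearest to 0.1, written out exactly
    print(Decimal(0.1))   # 0.1000000000000000055511151231257827021181583404541015625
    # the same value as an exact fraction: a power-of-two denominator, so it can't be 1/10
    print(Fraction(0.1))  # 3602879701896397/36028797018963968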

I also link to the following long-winded explanation of how floats work:

http://floating-point-gui.de/


I always sort of understood this in some form or another; however, it was never laid out for me in such a clear and easy-to-grok way.

Thank you. An upvote just doesn't really cover it this time.


I found that for more novice programmers this video has a great explanation. https://www.youtube.com/watch?v=PZRI1IfStY0



The best introduction I had was the back of my ZX Spectrum manual, which offered a very lucid and correct explanation of how maths really works on a binary chip.


Here is a nice challenge: Just like we have arbitrary precision integers (allocating more memory as needed) and arbitrary precision "fractional numbers", design a library providing arbitrary precision "computable numbers".


An obvious representation is a convergent stream of fractions. This will also quickly drive home why you cannot implement equality.
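A minimal sketch of that representation (the names and the 2^-n convention are my own choices): a computable real is a function that, given n, returns a rational within 2^-n of the true value.

    from fractions import Fraction

    def const(q):                    # an exact rational as a computable real
        return lambda n: Fraction(q)

    def add(x, y):                   # ask each operand for one extra bit of accuracy
        return lambda n: x(n + 1) + y(n + 1)

    def sqrt2(n):                    # approximate sqrt(2) to within 2**-n by bisection
        lo, hi = Fraction(1), Fraction(2)
        while hi - lo > Fraction(1, 2 ** n):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if mid * mid <= 2 else (lo, mid)
        return lo

    x = add(sqrt2, const(-1))
    print(float(x(30)))              # roughly 0.41421356...
    # Equality is the catch: deciding x == const(0) could require inspecting
    # ever-better approximations forever, so it cannot be implemented.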

The other obvious representation is that of first order formulae over the theory of real fields, but then you need to check equivalence of arbitrary formulae. You ultimately need a limit operation to be included and now you're sunk. You'll also probably wish you were implementing the complex numbers instead pretty quickly.


> design a library providing arbitrary precision "computable numbers"

I believe they call those "programming languages"


Naive question:

Programs written in programming languages aren't guaranteed to terminate. Computable numbers, however, are exactly the real numbers that can be approximated to arbitrary precision by a computation that terminates. Wouldn't that mean we don't need a fully fledged programming language to have the capability to represent all computable numbers?

Somewhere in there is the idea we just need a program that determines whether an arbitrary program terminates or not...


I can't tell how much of this is a joke, but, yes, it turns out you can't write such a programming language, for exactly the reason you allude to: you can't tell in finite time whether a given program computes a number, as you don't know whether it will terminate.


This is a neat idea! Although I'd imagine that it'd quickly approach the power of a full computer algebra system.


Floating point numbers can be justified by two criteria:

1) The distribution of typical numbers

2) The desired precision across a distribution

1: Floating point numbers suggest an exponential distribution, which comes up very often in science, engineering, etc. Very rarely do we have real data neatly packed into a small [-a, a] range.

2: Floating point numbers satisfy the following error metric approximately uniformly: for any -max < x < max, the relative error |float(x) - x| / |x| is small. This again agrees with real-world requirements for data, where we tolerate larger absolute errors for larger numbers.
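A quick way to see that property in IEEE doubles (my illustration; math.ulp needs Python 3.9+):

    import math

    for x in (1e-10, 1.0, 1e10):
        gap = math.ulp(x)       # distance from x to the next representable double above it
        print(x, gap, gap / x)  # the absolute gap varies wildly, the relative gap stays near 2**-52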


I think the easiest answer to that question is that rounding happens in almost every step where you do something with them. That includes parsing decimal floats from their textual representation and printing them back out as decimal text.

The funny thing is that if you work with decimal floating point you usually don't get any of those effects, because you often stay inside the accuracy/precision that the float can hold. Yet it's seen as useless.
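For example, with Python's decimal module (my sketch of the point being made):

    from decimal import Decimal

    print(0.1 + 0.2 == 0.3)                                   # False in binary floating point
    print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))  # True: these inputs are exact in decimal
    print(Decimal(1) / Decimal(3) * 3)                        # 0.9999999999999999999999999999 -- decimal still rounds thirds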


This is an excellent explanation - clear, concise, easy to work through, and enlightening.

Thank you.

I think I've sent a collection of suggested corrections via the web site; I include just the technical one here:

You wrote:

    Another example: -|2|0011 represents the number
    X = - 2^2 * 1.0011_{2}
      = - 4 * (1 + 0*1/2 + 0*1/4 + 1*1/8 + 1*1/16)
      = - 4 * 19/16
      = - 19/4
      = 4.75
This is missing a minus sign at the very end.


Thanks, fixed.

I'll probably fix all the other errors/typos you sent by mail, but it will take a few minutes.


Cool - and again, lovely post, nicely done.


This is one of the reasons I would be happier if there were more emphasis in real math/geometry algorithms on "range" processing, i.e. treating each "number" as a scaled "range" and figuring out how to combine these ranges at every step to get some stable solution. Most often I see programmers just bashing out analytic formulas as if they were computing something precisely, and then being surprised that the results are way off. Imagine computing curve intersections by a combination of analytical and iterative methods and then figuring out whether such an intersection is unique or whether it is some existing endpoint...


Interval arithmetic [1] has been around since the 1950s, I believe, but for some reason it never caught on.

In a nutshell, interval arithmetic is about storing each imprecise number as a lower bound and an upper bound. In each calculation, the lower bound is rounded down and the upper bound is rounded up. The existing libraries are well developed, and the performance hit is often surprisingly close to a factor of 2.

[1] http://en.m.wikipedia.org/wiki/Interval_arithmetic
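A bare-bones sketch of the idea in Python (my own toy version: it widens each bound by a whole ulp where a real library would switch the FPU rounding mode; math.nextafter needs Python 3.9+):

    import math

    class Interval:
        """A closed interval [lo, hi] guaranteed to contain the true value."""
        def __init__(self, lo, hi=None):
            self.lo, self.hi = lo, lo if hi is None else hi

        def __add__(self, other):
            # round the lower bound down and the upper bound up
            return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                            math.nextafter(self.hi + other.hi, math.inf))

        def __mul__(self, other):
            p = [self.lo * other.lo, self.lo * other.hi,
                 self.hi * other.lo, self.hi * other.hi]
            return Interval(math.nextafter(min(p), -math.inf),
                            math.nextafter(max(p), math.inf))

        def __repr__(self):
            return "[{!r}, {!r}]".format(self.lo, self.hi)

    x = Interval(math.nextafter(0.1, 0.0), math.nextafter(0.1, 1.0))  # brackets the real 0.1
    print(x + x + x)   # the printed interval is guaranteed to contain 0.3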


That's because intervals tend to just keep growing. Without some function to make the interval smaller or otherwise keep it bounded, cumulative interval arithmetic tends to be useless.

Using the rounding mode to calculate once with rounding down, once with standard rounding, and once with rounding up will give a much better indication of whether or not an algorithm is sensitive to roundoff error.


The problem is that the range will increase with each operation. E.g. the range of x*y is larger than the range of x, so you need to keep track of those ranges. But the ranges themselves will be inaccurate if stored as floating point numbers.

Or just store them as a fraction if you really care about the value not getting corrupted.


This is about floating point arithmetic in computers.


This is specifically about IEEE 754.

Arbitrary-precision floating point arithmetic, such as what GMP offers, doesn't suffer from these issues.


MPFR is sensitive in exactly the same way. Just because you can arbitrarily increase the precision of results with that library doesn't mean that it isn't still sensitive to roundoff error.


Yes, you are right.

However, it's possible to use MPFR in a way that avoids these issues. One could increase the precision every time there would otherwise be rounding.


The problem is that the second you multiply by pi or e, which is fairly common, your required precision for a perfect result becomes infinite.

Even if you don't, your required precision blows up very quickly, and your performance drops.
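A rough illustration of that blow-up, using exact rationals instead of MPFR (the recurrence is made up):

    from fractions import Fraction

    x = Fraction(1, 3)
    for i in range(6):
        x = x * x + Fraction(1, 7)   # each squaring roughly doubles the number of digits
        print(i, len(str(x.numerator)), "digits in the numerator")

Rounding to some fixed precision keeps this bounded, but then you are back to the roundoff error being discussed.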


In finite precision arithmetic "multiplying by pi or e" makes no sense. And all arithmetic is finite precision. You could multiply by an approximation, and accordingly adjust the precision to avoid rounding.

In a more general context, you could choose e as the base of your number system, and then multiplying by e is a trivial shift. Then, however, multiplying by 2 would make no arithmetic sense. The natural question that arises is whether it's possible to represent all computable numbers losslessly in one system. The answer is yes.

Besides that, I think you should re-evaluate your decision-making for votes.


> In finite precision arithmetic "multiplying by pi or e" makes no sense. And all arithmetic is finite precision.

Yes, that's kind of my point: either you have a finite precision cutoff and lose accuracy, or the memory used for your arbitrary precision representation blows up to incredibly large values (transcendentals will do this to you quickly), or a combination of the two.


> This is about floating point arithmetic in computers.

It's about binary floating point arithmetic in computers, though one could do a similar thing for, e.g., decimal floating point. (Though decimal floating point rounding issues will show up much more rarely in many real-world classes of application.)


That's the very interesting border between numbers as abstractions of quantities and numbers as elements of a set (N, Z, R, C, etc.). The question of 0 is particularly puzzling to me.


Most numerical sets you're likely to run into (including N, Z, R, and C) are groups under addition, which (among other things) means `x + a = a ⇔ x = 0`.


N is not a group under addition (lacks inverses).


StackOverflow user tmyklebu provided an efficient algorithm to solve a more general floating-point equation, C1 + x = C2, using only computations in the very floating-point system that the equation is written in:

http://stackoverflow.com/a/24223668/139746


>>Mathematically, if we had infinite precision

Or just fixed precision? For example, the calculation is intuitive for `int`s.


Mathematica gives 1.

    In[1]:= 1 - x /. Solve[1 + x == 1, x]
    Out[1]= {1}


One.


If 1+x is 1, then logically, 1-x should also be 1. Period.

If this is not the case, then you have crappy hardware or a crappy software math implementation that doesn't follow logic. Duh.


I'd be amazed if the hardware you're using and the software you're running don't have the "feature" that for some value of x, 1+x == 1 and yet 1-x != 1. If you read the article you might understand why.

If you don't understand why, perhaps you could come back and ask specific questions.

For the concrete and practical minded who don't want to bother with understanding the theoretical arguments, here's some Python:

    #!/usr/bin/python

    a = 1.
    x = 1.

    while a+x!=1.:   # halve x until adding it to a no longer changes the result
        x = x/2

    print 'x =', x
    print 'a+x =', '%.18f' % (a+x)
    print 'a-x =', '%.18f' % (a-x)
Output:

    x = 1.11022302463e-16
    a+x = 1.000000000000000000
    a-x = 0.999999999999999889


No, no! If your program can't do things that even kids can do, you are writing rubbish.


Performance optimizations that affect mathematical accuracy are not necessarily "rubbish"; there are certainly cases where they are necessary and appropriate.

OTOH, it is a problem, IMO, that many popular languages make it much easier to use fixed-precision binary floating point representations -- which, while the most performance-optimized way to work with non-integer numbers, are also the most prone to manifesting incorrect results on real-world inputs -- than any other representation of non-integer numbers.

Use of fixed-precision binary floating point is a performance optimization that, like all such optimizations, should be driven by a real need in the particular application, not just be the default choice because that's what the language makes easiest.

Unfortunately, very few languages make this natural and idiomatic (Scheme and its numeric tower get numbers right, but very few popular languages do).


I see you posted this on Facebook:

    There was a post on hacker news about floating numbers.
    It said "if 1+x is 1", then "1-x is not."  There is a
    long explanation of how this is true.  They are very
    proud of this outcome. They believe this wired outcome
    is "perfect logical". I just want to say "F**k you!"
    If your program can't handle those even kids can do,
    you are writing rubbish. F**k you!
It's not a case of being proud of it, it's an unavoidable consequence of using a small number of bits (32, 64, 128, 65536, all these numbers are small) to try to represent a very large range of numbers. In other words, it's an unavoidable consequence of using floating point.

The point is that floating point numbers are not mathematics, and they don't obey all the mathematical laws you've been taught. They're an excellent model, provided you stay away from the edges. But if you do go close to the edges, the inaccuracies of the model get exposed. To understand the difference is of real value.

Let's use mathematical reasoning to show why it's true.

If we use 64-bit (say) numbers to represent a range larger than 0 to 2^64-1, we must choose either that the numbers we can represent are equally spaced, or that they are not.

If we choose them to be equally spaced then we either cannot represent 0, or we cannot represent 1. To see this, suppose we can represent both 0 and 1. Being equally spaced and including both 0 and 1 forces the spacing to be at most 1, so even in the best case we can only represent 0, 1, 2, 3, and so on up to 2^64-1. We have nothing left to go beyond that, and thus we are not representing a range larger than 0 to 2^64-1. So if the representable numbers are equally spaced, then we cannot represent both 0 and 1, which seems sub-optimal.

If we choose them not to be equally spaced then there will be consecutive representable numbers r0<r1<r2, such that mathematically r1-r0 is not equal to r2-r1. Now set:

    d0 = r1-r0 so that r0 + d0 = r1
    d1 = r2-r1 so that r1 + d1 = r2
Suppose that d0 < d1, and let h = (d0+d1)/4, the midpoint of d0/2 and d1/2. Hence (d0/2) < h < (d1/2).

Note that d0, d1, and h might not be representable.

Now r0 and r1 are consecutive representable numbers. The number h is more than half the gap between r0 and r1, so we would want r1-h to round down to r0. But the number h is less than half the gap from r1 to r2, so we would want r1+h to round back to r1.

Therefore:

    r1 - h = r0
    r1 + h = r1
Thus it is unavoidable that if the representable numbers are not equally spaced then we can find a and x (take a = r1 and x = h) such that:

    a + x == a
    a - x != a
The above argument still goes through even if d0>d1, or if we want things to round down, or if we want things to round up, and we leave it as an exercise for the interested reader to check the details in those cases.
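For a concrete instance in IEEE double precision (my numbers, plugging into the argument above): take a = r1 = 1, so d0 = 2^-53, d1 = 2^-52, and h = (d0+d1)/4 = 3*2^-55.

    h = 3 * 2.0 ** -55     # strictly between d0/2 = 2**-54 and d1/2 = 2**-53
    print(1.0 + h == 1.0)  # True : 1 + h rounds back to 1
    print(1.0 - h == 1.0)  # False: 1 - h rounds to the next double below 1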

Now you have three choices:

* Reject this, because it contradicts your belief that floating point numbers must behave in the same way as real, mathematical numbers;

* Demonstrate a logical flaw in the argument; or

* Accept the conclusion, even though it contradicts your belief that floating point numbers behave in the same way as real, mathematical numbers, and accept that floating point numbers are just a model of mathematical numbers.


Thank you. I hope one day I will not complain about this, because I will have found a better approach.


Is this your first time seeing floating point math?


Is this your second day seeing floating point math?


Even leaving aside hardware and software implementations, have you never been in a situation where you had to put a number into a fixed-size box, perhaps three digits?

1.002 has to be written as 1.00

.998 can be written as .998

Perfectly logical.



