The End of Numeric Error: An interview with John L. Gustafson (acm.org)
148 points by ozdave on April 26, 2016 | 60 comments



Here's a python port of Gustafson's Mathematica unum implementation that I did:

https://github.com/jrmuizel/pyunum


Hi, thanks for sharing your implementation. Am I totally misusing (py)unum?

Using pyunum, repeatedly rotate the point (1,0) by 45 degrees. Print out the X coordinate every 8 rotations. Mathematically, the value printed each time should be exactly 1. Unfortunately, after just 79 iterations, the unum covers the range (-27352064, 27352064)! This includes the correct result but is useless.

A similar program using Python floats (i.e., platform doubles) prints 0.9999999999999937 at step 79 and runs in 1/540 of the time on my oldish core2 notebook. (or 0.9999999999371962 at step 799999 in less than 1/10 of the time of the 79 unum steps)

    from unum import *

    # a = 1/sqrt(2) = cos(45 deg) = sin(45 deg)
    a = divideu(x2u(1), sqrtu(x2u(2)))

    # rotate (x, y) by 45 degrees per iteration; every 8th step is a full turn
    x, y = x2u(1), x2u(0)
    for i in range(80):
        x, y = timesu(minusu(x, y), a), timesu(plusu(x, y), a)
        if i % 8 == 7: print i, view(x)
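
For comparison, the plain-float version of the same loop looks like this (a sketch, not the exact script I timed):

    # Same experiment with ordinary Python floats (platform doubles).
    import math

    a = 1 / math.sqrt(2)              # cos(45 deg) == sin(45 deg)
    x, y = 1.0, 0.0
    for i in range(80):
        x, y = (x - y) * a, (x + y) * a
        if i % 8 == 7:                # every 8 rotations is a full 360 degrees
            print(i, x)               # stays within ~1e-14 of 1.0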


Yes, you're misusing the concept, and pyunum makes it hard to use in this situation (probably because the author had a different purpose in mind). Try re-running (in a fresh Python invocation, if you're in a shell) with this at the top of your script:

  # put these lines above `from unum import *`; unum reads them at import time
  import unum_config
  unum_config.e = 3
  unum_config.f = 9

This gives up to 2^3 = 8 bits for the exponent and up to 2^9 = 512 bits for the fraction. The default values for e and f are 3 (8 bits) and 4 (16 bits) respectively.

The idea with unums is that your program sees the unacceptable bounds, and recomputes with higher precision.
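
Not unums, but the same escalation idea can be sketched with mpmath's interval arithmetic applied to your rotation example (a rough illustration of the workflow, not how pyunum itself works):

    # Redo the 45-degree rotations with interval arithmetic, raising the
    # working precision until the enclosure of x is acceptably tight.
    # (A real driver would stop as soon as the bound meets its tolerance.)
    from mpmath import iv

    for prec in (24, 53, 113):            # roughly single, double, quad
        iv.prec = prec
        a = iv.mpf(1) / iv.sqrt(iv.mpf(2))
        x, y = iv.mpf(1), iv.mpf(0)
        for _ in range(80):               # 80 rotations of 45 degrees
            x, y = (x - y) * a, (x + y) * a
        print(prec, x)                    # rigorous enclosure of x; always contains 1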

Comparing speeds is very unfair: one implementation is hardware that has received maybe $100B worth of optimization over decades; the other is one guy's Python script.

In this implementation, 9 seems to be the maximum value for f: if I set it to 10, unum fails to import because it tries to divide a float by too large an integer. And 4 seems to be the maximum practical value for e. At e=5, initialization of unum takes over two minutes and 250 MB of memory, probably because it's computing the largest possible value in that floating-point scheme.


Yes. My intention with pyunum was to match the Mathematica code as much as was reasonable so that it's possible to follow the code samples in the book without requiring Mathematica. The result is unpythonic and very slow but should be usable for experimenting with the semantics of unums.

It would certainly be nice to have a more Pythonic port.


Ah, I can see that I wouldn't necessarily get a very good result if I'm actually doing calculations with a 16-bit mantissa.


Did anyone find peer-reviewed papers on the topic (e.g. in an IEEE journal given that it challenges the IEEE floating point standard)?

I think the book is nice, but popularizing "unum"s with a book targeting non-experts feels like skipping an important step in due process, given the grand claims that are being made.


Plenty of experts refereed the book, believe me. William Kahan, David Bailey, Gordon Bell, Horst Simon, Ulrich Kulisch, John Gunnels, and anonymous reviewers as well. It was vetted for months before it was released. If you prefer a more formal treatment, Ulrich Kulisch has written up the unum concepts in a paper more palatable to mathematicians.

Incidentally, I submitted a short version to the ARITH23 conference detailing how IEEE 754 could be repaired to give bit-identical results on different systems, and they had it reviewed by three reviewers: one liked it, and the other two, who served on the IEEE 754 committee, trashed it for some pretty silly reasons, like using the word "subnormal" instead of "denormal".

There is no easy way to boil the ocean, but I'm trying.


From what I'm understanding, this format uses a variable number of bits, automatically growing and shrinking as needed.

How can this be implemented efficiently? If there's a cap on the maximum number of bits (e.g., 64), why not just use that maximum number all the time, and not have to worry about shuffling things around (bitwise!) in memory?


If it's supported in the hardware, you could turn off those gates altogether and save energy.


The article answers that question.


Isn't this yet another implementation of interval arithmetic? What am I missing? Calling his book "The End of Error" seems a bit much.


> A unum has three additional fields that make the number self-descriptive: The "ubit" that marks whether a number is exact or in between exact values, the size of the exponent in bits, and the size of the fraction in bits. So not only does the binary point float, the number of significant digits also floats. There is something called "significance arithmetic" that does a crude version of what the unum representation does, but unum math overcomes some serious drawbacks of both significance arithmetic and interval arithmetic.
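
For intuition, here is a toy decoder for a single unum bit string, following my reading of the layout in Gustafson's book (sign | exponent | fraction | ubit | exponent-size | fraction-size; the field-width defaults match the e=3, f=4 mentioned elsewhere in this thread). Illustrative only -- not pyunum's internal representation, and special values like ±inf/NaN are ignored:

    def decode_unum(bits, esizesize=3, fsizesize=4):
        fs = int(bits[-fsizesize:], 2) + 1                          # fraction width
        es = int(bits[-(fsizesize + esizesize):-fsizesize], 2) + 1  # exponent width
        ubit = bits[-(fsizesize + esizesize + 1)]                   # '1' => inexact
        body = bits[:-(fsizesize + esizesize + 1)]
        sign, expo, frac = body[0], body[1:1 + es], body[1 + es:1 + es + fs]
        bias = 2 ** (es - 1) - 1
        e, f = int(expo, 2), int(frac, 2)
        if e == 0:                                  # subnormal: no hidden bit
            value = f * 2.0 ** (1 - bias - fs)
        else:
            value = (1 + f * 2.0 ** -fs) * 2.0 ** (e - bias)
        return (-value if sign == '1' else value), ubit == '1', es, fs

    # sign 0 | exp '01' | frac '1' | exact | es-1 = 001 | fs-1 = 0000
    print(decode_unum('001100010000'))   # (1.5, False, 2, 1)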

These "unums" have been posted before, e.g. https://news.ycombinator.com/item?id=11410349


No, it is making a distinction between exact and inexact numbers, and uses arbitrary precision.

I don't see it gaining wide acceptance. If your problem only needs exact (fixed-point) numbers, use a bignum library. If it doesn't, chances are you will meet a number that isn't representable in this format very, very soon (1/3, 1/5, sqrt(2), pi, etc.).

The moment you do, you will have to decide how many bits of precision you want for that inexact number. The problem is that you cannot make that choice unless you know exactly what you intend to do with the number. I bet most users will choose a format that their hardware handles natively: IEEE.

There may be edge cases in numerical computing that benefit from this, especially when problems are ill-conditioned, but even there, I expect people to convert to conventional IEEE on the outside of their API. Also, I'm not sure there is much room in this space below quad-width floats (but I am not an expert).


Wouldn't most of the issues be solved with the use of BCD/DCD? You get the speed (compared to arbitrary precision), fixed precision, and exact answers up to the precision limit.


Unfortunately, no.

BCD as "binary coded decimal" is, effectively, useless. Even arbitrary precision libraries tend to be faster than BCD.

"decimal floating point", which is what I think you meant, just changes what isn't representable. Instead of 1/5 causing a problem, 1/3 causes a problem.

The "classic" problem about this is matrix operations. Specifically, matrix inversion. You can use interval arithmetic on matrix inversion, and you find that the intervals blow up to almost infinity after just a couple of operations even though the center of the interval is actually pretty close to the solution. Of course, the reverse also holds: while most matricies invert relatively cleanly, there are always pathological cases that really do need humungous precision to invert.

PS: Don't know why you were getting downvoted as you asked a legitimate question.


Thank you for the answer! It was very insightful. Also, no big deal about the downvotes. Can't please everybody.


Not especially. Just floating point without a fixed-size significand and exponent. I do like the idea of having an "exact representation" bit carried along, though.

Edit: Link has been changed to a more in-depth ACM interview! I suppose this is a floating point format that makes the intervals more explicit by changing size to account for significant digits.


I don't see big benefits for the 'exact' bit. There's only a fixed (in theory, countably infinite and infinitely dense) set of numbers that can be represented exactly. If you want to go for such a set of exactly-representable numbers, wouldn't the rationals be the better choice? That allows you, for example, to do dollar accounting in exact arithmetic.
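
For what it's worth, Python's fractions module is exactly that kind of exact-rational arithmetic:

    from fractions import Fraction

    # Exact dollar accounting with rationals: no drift, unlike binary floats.
    price = Fraction(1, 10)                 # $0.10, exactly
    total = sum([price] * 3)
    print(total, total == Fraction(3, 10))  # 3/10 True
    print(0.1 + 0.1 + 0.1 == 0.3)           # False with doubles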

Also, to enable round-tripping with string representations, your printf/scanf pair has to discriminate between exact and inexact versions of, for example, 3/4. If you don't, every string you read in (e.g. from XML (including modern spreadsheet data), JSON, or user input) will have to have the 'exact' bit cleared. I guess so many numbers will be inexact that the bit turns out to have close to zero information content (in other words: adding it to your numbers is a waste of space).


It avoids error whenever possible. For example, if you are working with pure fractions the result will be exactly representable and you'll end up with 1/3 + 1/3 + 1/3 = 1 instead of some approximation.


That would be nice, but 1/3 isn't exactly representable in this representation.

Or are you suggesting that, when computing 1/3, this number format picks the best representation for a given number of bits, keeping the 'exact' bit for 1/3, and then magically decides that, say, 85/256 and 'exact' must mean 1/3, and not 85/256 or whatever other exactly computable expression happens to be close to it?


More importantly, the book fails to mention affine arithmetic.


Call me when there's a hardware implementation.


<6 months for FPGA, ~12-18 months for silicon...


Is that in the article? Or else, how do you know? I would like to write an implementation for fun in another language, but unum is getting mixed reviews here. Hence, it would be nice to see your source for it being incorporated into hardware.


At the bottom of the article, John mentions REX Computing, which I am the founder/CEO of. John has been a friend and advisor, and we have played around with minimal unum implementations, but haven't been able to commit the resources to a full one until after we tape out our first chip in a little over a month (which will be using IEEE float). We also funded "dnautics" (an HN user who also commented here) to do a soft implementation of unum 1.0 in Julia.


Has REX explored other alternative binary floating point formats? Say, dec64 [0]?

[0] http://www.dec64.com


How is dec64 different from IEEE 754 floating point? It seems IEEE 754 already allows for base 10. What else does dec64 do differently, do you know?


Besides using only a base-10 representation: 1) all numbers are exact, even if there are multiple representations; 2) because of 1), .1 + .2 = .3, i.e., arithmetic follows the same rules you learned in school for four-function arithmetic; 3) it supports up to 56-bit integer math without loss of precision. Look around YouTube for Crockford's talks about it; usually they're JavaScript related. Or check dec64.com.
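
A toy pack/unpack, following my reading of dec64.com (56-bit signed coefficient in the high bits, 8-bit signed exponent in the low byte, value = coefficient * 10^exponent). Illustrative only, not Crockford's code:

    def dec64_pack(coefficient, exponent):
        return ((coefficient & (2**56 - 1)) << 8) | (exponent & 0xFF)

    def dec64_unpack(word):
        exponent = word & 0xFF
        if exponent >= 128:                 # sign-extend the 8-bit exponent
            exponent -= 256
        coefficient = word >> 8
        if coefficient >= 2**55:            # sign-extend the 56-bit coefficient
            coefficient -= 2**56
        return coefficient, exponent

    # 0.1 is stored exactly as 1 * 10**-1, which is why .1 + .2 can be exactly .3
    print(dec64_unpack(dec64_pack(1, -1)))  # (1, -1)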


Not that relevant, but

>Because the Julia language is particularly good at adding new data types without sacrificing speed, several people jumped on making a port to Julia, including the MIT founders of that language.

I'm really impressed with the Julia language so far and am considering implementing some quantum chemistry models for many-body systems (Hartree-Fock and probably Coupled Cluster if I ever get that far). Mostly because a) Julia looks fun, b) FORTRAN is still king of speed (and I'll need all the speed I can get) but is a pain to write, and c) the alternative to FORTRAN is C(++), where being OO isn't that much of a bonus. And Julia just looks fun to write.


Julia is really impressive. So, I, uh, was working on a Julia implementation of unums till I stopped being funded for it...

I'm currently writing unit and continuous tests for a computer chip that is being simulated: basically, Julia does the computation in x86, which suggests what the IEEE result should be, and then generates the assembly and passes it on to the assembler.

It probably could have been easily written in something like Python, Ruby, or Perl, but the modularity of Julia, functional programming, and (optional) explicit typing result in easy-to-read code and a sense of security that there are no shenanigans going on under the hood with the type system.

Another cool thing, for more complicated code (I'm working on a demonstration fast Fourier transform), is that I can write the core FFT algorithm as a function with macros, and by swapping in a different include that defines the macros differently, it can either run as a virtual processor using internal Julia arrays that represent the memory and register files, or build up a file that contains the assembler code. Again, this is obviously possible in other languages [0], but Julia makes it look nice and easy to comprehend.

[0] I'm thinking that Ruby (especially with its DSL-iness) would be really good for this, but I abandoned Ruby when it became too difficult (I'm lazy and dense) to install the correct version on my own computer, much less figure out how to install it on a server for other people to use.


Here's a link to the WIP Julia unum implementation... https://github.com/REX-Computing/unumjl


It would be really interesting if you'd like to implement some quantum chemistry! Rick Muller (of PyQuante fame) has implemented a port to Julia, but I think there's a lot of room to use more language features.

https://github.com/rpmuller/pyquante2/tree/master/julia

Let me know if you do want to do some quantum chemistry coding. It would be good for my soul to revisit some of my PhD roots...


> FORTRAN is still king of speed (which I'll need all I can get of) but is a pain to write

I'm surprised that, with the restrict keyword, one would see better code gen from Fortran than C. Isn't the lack of aliasing Fortran's primary advantage?


Leveraging "restrict" is not a trivial thing to do.

If you're a researcher first and a software developer second (which I suspect is true of gp), the last thing you want to do is get into the weeds of inspecting gcc assembly to figure out where you didn't put a "restrict" that you should've.


I started a DFT code in Julia two years ago but didn't get too far. I always felt that I had to think about how to make the code fast in Julia (types/vectorization), whereas I just know it will be fast in Fortran. So I always end up writing modern Fortran with C bindings to Python. Was it just me not spending enough time to get used to Julia, or was/is it a real problem?


Have you actually profiled the Julia code you wrote and ensured that you have followed the performance tips in the manual? If you did it correctly all the time should be spent in linear algebra operations, and so it shouldn't matter what language you implemented the code in since the performance is primarily determined by your underlying BLAS library.


It's not that I couldn't make it fast. I followed just the steps you mentioned and got really fast code. But the process didn't feel intuitive.


Probably the best way to do it would have been to bind the Fortran functions to Julia.


I dunno. I just started doing probabilistic programming in Haskell yesterday, and it's turned out to be faster at it than every other PPL I've tried. It didn't help that WebChurch, Anglican, and Venture are all interpreted languages. Treating a sampler as a lazy stream of samples is a nice abstraction, too.
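
In Python terms, the same lazy-stream idea is just a generator (a rough analogue, not what the Haskell libraries actually do):

    import itertools, random

    def normal_samples(mu=0.0, sigma=1.0):
        # An infinite sampler; consumers pull only as many samples as they need.
        while True:
            yield random.gauss(mu, sigma)

    first_ten = list(itertools.islice(normal_samples(), 10))
    print(sum(first_ten) / len(first_ten))   # crude estimate of mu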


As a counterpoint, my attempts to write fast Haskell PDE solvers have been underwhelming.


Once you wrap your head around the Haskell way, it seems easy to get something working. IMHO that something will be minimal and pretty as well. Once I try to make things fast, it gets weird. There are people who can write very fast Haskell code, but it seems like all of those people worked on GHC. I don't have a good mental model of when, say, a thunk will get created, so I liberally sprinkle strictness annotations and fool around with list fusion operations.

I really like Haskell. Even with all the effort I've put into it, it seems like I've barely scratched the surface if I want performance. Sigh.


Did you try Stan? It compiles your model to SSE2-enabled C++.


For my current problem, I don't want the available inference methods to impose any restrictions on what sort of model I can write, so I'm only using general PPLs.


You can write your model in Stan and sample without inference too.



Why bother? The precision of double is sufficient for the vast majority of engineering problems. Seldom do you have measurements that are more precise than the calculations you do with them.


"vast majority"? Maybe.

But anything that uses matrices (e.g., systems of linear differential equations) always runs into some pathological cases. Quite often, they aren't even uncommon.

Systems with vastly different time constants are ALWAYS a problem. Simulating RF circuits or phase-locked loops is always an issue.

Computational fluid dynamics always has boundary conditions that challenge infinity/zero and drive the solver crazy.

Something as pedestrian as simulating basic cotton cloth accurately is problematic (the warp/weft redistributes energy on a much faster time scale than gravity pulls the cloth down).

A lot of people did a lot of research and put in a lot of programming hours to make solving these problems relatively easy without understanding the gory details.


But will unums help? No matter how an ill conditioned matrix is represented, it will still be ill conditioned, and we'll still be iterating, unless I've missed something in this article.

I'm skeptical that changing how we represent the numbers will help - when the physics are icky, they're icky.


It won't really help. You'll get the same issues with stiff DEs or ill-conditioned matrices whether you use perfect precision or floating point. There will be some cases where it helps, but not a lot.

I've implemented a few DE solvers and sparse matrix implementations before. I think sparse matrices are one case where it would help.


Yes, this won't change the challenging nature of solving the underlying problem.

More problematic is that when a matrix is ill-conditioned it may not be all that helpful to have "an answer" since the ill-conditioning tells you there are many solutions closely satisfying the system of equations.


Naive geometry calculations fail very hard, very fast. There is a reason why Solidworks has you draw on the surface of something that already exists and extrude a feature up from that, instead of computing the union of two 3D primitives. Computing the coincidence of two lines or planes is a quick path to woe. The geometry package CGAL uses infinite-precision rationals instead of floating point, and also bans curves. A circle is a many-sided polygon, and pi is not acknowledged to exist. Floating point and computational geometry are uneasy acquaintances.


A big problem is that chained floating point operations are an O(sqrt(n)) martingale.

Also, some floating point ops that use Padé approximants (most log algorithms, sin, cos) lose one or two ulps of floating-point accuracy (versus half an ulp for the basic arithmetic ops).
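
A crude check of that picture for a plain running sum: the rounding error of naive left-to-right accumulation, measured against math.fsum (which is correctly rounded), tends to grow like sqrt(n) ulps rather than n ulps:

    import math, random

    random.seed(0)
    for n in (10**4, 10**6):
        xs = [random.uniform(0, 1) for _ in range(n)]
        naive = 0.0
        for v in xs:
            naive += v                        # accumulate in plain doubles
        exact = math.fsum(xs)                 # correctly rounded reference
        print(n, abs(naive - exact) / exact)  # relative error grows roughly like sqrt(n)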


> A big problem is that chained floating point operations are an O(sqrt(n)) martingale.

Mostly false; see https://www.cs.berkeley.edu/~wkahan/Mindless.pdf

It is true that an arbitrary, independent sequence of n floating point operations can potentially lose up to 2n bits of precision. However, it is false that most scientific computing code has such structure. On the contrary, the loss of precision in numerical algorithms can be estimated quite well using standard techniques of error analysis, especially when there is underlying matrix structure to exploit. The error in properly written numerical code should NOT grow exponentially with the number of floating point operations. It should be bounded by intrinsic properties of the underlying matrix, such as its condition number or spectral norm.
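
A quick numpy illustration of that last point: the error of a linear solve tracks the condition number, not the operation count. A 10x10 Hilbert matrix (chosen only because it's famously ill-conditioned) has a condition number around 1e13, so roughly 13 of the 16 decimal digits are at risk even though the solve is only a few hundred flops:

    import numpy as np

    n = 10
    H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
    x_true = np.ones(n)
    b = H @ x_true
    x = np.linalg.solve(H, b)
    print(np.linalg.cond(H))           # ~1.6e13
    print(np.max(np.abs(x - x_true)))  # error on the order of cond * machine eps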


Yeah, I know, there are cases where it matters. But I suppose many of those cases are also the ones where performance (and a hardware implementation) matters, and you're better off designing your algorithm to deal with roundoff error than using a magically precise FP implementation.


Sure. You could go the KSP route and write your own FP. There are still problems like the various krakens. KSP is obviously not so critical that krakens are life- or project-jeopardizing... But imagine being NASA, for example, designing real-world satellite guidance systems that must operate over many orders of magnitude.

If you're using IEEE floating point, your systems will fail silently, or only rarely give an indicator like NaN. IEEE intervals will work, but often give a diagnostically useless answer. Unums have characteristic signatures that hint at where to look to find the problematic calculation. You can then backtrack and redo the calculation at a higher precision, dropping back to the standard precision as necessary. This is, of course, automatable.


Calculations with fractional numbers are done for many mathematical and scientific purposes as well, where an arithmetic system that obeys a clear set of mathematical rules, designed with correctness and reliability of results in mind, would be useful and liberating. Even engineering calculations would benefit from the faster and more economical operation that unum hardware could potentially provide.


Because, you know, computing is not restricted to engineering problems. People occasionally use computers for all sorts of other weird things, like counting _money_. Or formatting text.


He gives an example of solving complex nonlinear systems [1] in very few operations by using the inherent parallelism available via subdividing intervals. He claims both accuracy and huge speed gains are possible.

[1] http://www.johngustafson.net/presentations/Multicore2016-JLG... (see example starting p.10)
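
To get a feel for what "parallelism via subdividing intervals" means, here is a toy single-variable version (my sketch, nothing like the method in his slides): keep only the subintervals whose function range can still contain zero.

    # Find where f(x) = x**2 - 2 crosses zero on [0, 4] by splitting boxes.
    # The endpoint range bound is valid here because x**2 - 2 is monotone on [0, 4].
    def f_range(lo, hi):
        return lo * lo - 2, hi * hi - 2

    def find_roots(lo, hi, tol=1e-9):
        boxes, roots = [(lo, hi)], []
        while boxes:
            lo, hi = boxes.pop()
            fmin, fmax = f_range(lo, hi)
            if fmin > 0 or fmax < 0:          # zero is impossible: discard the box
                continue
            if hi - lo < tol:                 # tiny box that may contain a root
                roots.append((lo, hi))
            else:                             # otherwise split; halves are independent
                mid = (lo + hi) / 2
                boxes += [(lo, mid), (mid, hi)]
        return roots

    print(find_roots(0.0, 4.0))               # brackets sqrt(2) = 1.41421356...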


Not just precision: it also allows more extreme values, both very large and very small numbers.

In statistics, numbers often get very small because they represent probabilities. Sometimes people transform them into the log domain, but that has a lot of disadvantages: you lose zero and negative numbers, addition is difficult, and you get vastly more precision near zero than near one.
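
For example, adding two probabilities that are stored as logs needs the log-sum-exp trick (quick sketch):

    import math

    log_p, log_q = -1000.0, -1001.0           # both underflow to 0.0 as plain floats
    print(math.exp(log_p))                    # 0.0 -- the probability is gone

    def log_add(a, b):                        # log(exp(a) + exp(b)), computed stably
        m = max(a, b)
        return m + math.log(math.exp(a - m) + math.exp(b - m))

    print(log_add(log_p, log_q))              # log(p + q), still finite (~ -999.69)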


You'd be surprised how many people don't build rails apps for advertising companies.



