> When I format a float out to 5 decimal places, I'm sort of making a statement that anything beyond that doesn't matter to me.
Yes, it's true that the vast majority of the time it doesn't actually matter. However, decimal formatting does such a good job of giving us the impression that these errors are merely edge cases, and calculators automatically formatting to 10 significant figures etc. further that illusion. If people aren't aware that it's only an illusion (or just how far the rabbit hole goes), it can be dangerous when they go on to create or use things where that fact matters.
haha, yes it's a terrible name, such arrogance to suggest fitting infinite things into a very finite thing. In fact they couldn't even call it rational, because even after the precision limitation it's a subset of the rationals (which is where representation error comes from, due to the base)... Inigo Quilez came up with a really interesting way around this limitation where numbers are encoded with a numerator and denominator, which he called "floating bar". This essentially has no representation error, but it will likely hit precision errors sooner, or at least in a different way (it's kind of difficult to compare directly).
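To make the numerator/denominator idea concrete, here's a much-simplified sketch of an exact rational type. It is not iq's actual "floating bar" bit packing, just an illustration of why representation error disappears, and why overflow of the numerator/denominator becomes the new failure mode:

    #include <cstdio>
    #include <numeric>   // std::gcd (C++17)

    // Exact rational: keeps a numerator/denominator instead of a base-2 fraction.
    struct Rational {
        long long num, den;
        Rational(long long n, long long d) : num(n), den(d) { reduce(); }
        void reduce() {
            long long g = std::gcd(num < 0 ? -num : num, den);
            if (g) { num /= g; den /= g; }
        }
        Rational operator+(const Rational& o) const {
            // Overflow of these products is the new failure mode ("precision error").
            return Rational(num * o.den + o.num * den, den * o.den);
        }
        bool operator==(const Rational& o) const { return num == o.num && den == o.den; }
    };

    int main() {
        Rational a(1, 10), b(2, 10);                // 0.1 and 0.2, held exactly
        printf("%d\n", a + b == Rational(3, 10));   // 1 -- no representation error
    }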
Yeah, that is more what I'm on about. I can accept that a computer's `int` type is not an infinite set, even if it does cause problems at the boundaries. It feels like more of a little white lie to me.
Whereas, even between their minimum and maximum values, and even subject to their numeric precision, there are still so many rational numbers that an IEEE float can't represent. So it's not even a rational. Nor can it represent a single irrational number, thereby failing to capture one iota of what qualitatively distinguishes the real numbers. . . if Hacker News supported gifs, I'd be inserting one that features Mandy Patinkin right here.
Yeah, I think this is the single aspect that everyone finds unintuitive; everyone can understand it not having infinite precision or magnitude. It's probably a very reasonable design choice, if we could just know the reasoning behind it; I assume it's mostly about the practicality of performance and of implementing the operators that have to work with the encoding.
>Inigo Quilez came up with a really interesting way around this limitation where numbers are encoded with a numerator and denominator, he called "floating bar"
Thanks for the read! More fun than the original article to my taste :)
Here's the link, since Googling "floating bar" nets you very different results:
I always wonder if he ever says "My name is Inigo Quilez. Prepare to learn!".
My favorite post of his is the one where he explains that you never need trigonometry calls in your 3D engine. Because every now and then you still see an "educational" article in the spirit of "learn trig to spin a cube in 3D!" :/
Yeah, I've noticed in physics programming that in other people's code you can often come across the non-trig ways of doing things that avoid unit vectors and square roots etc. (which are both elegant and efficient)... However, I've never personally come across an explicit explanation or "tutorial" for these targeted at any level of programmer's vector proficiency. Instead I've always discovered them in code and had to figure out how they work.
I guess the smart people writing a lot of this stuff just assume everyone will derive them as they go. That's why we need more Inigo Quilezes :D to lay it out for us mere mortals and encourage their use more widely.
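For anyone curious, a couple of the standard tricks of this kind (my own sketch, not taken from his posts): comparing squared lengths so no square root is needed, and using the sign of a 2D cross product to answer "which side?" with no trig calls and no normalization:

    #include <cstdio>

    struct Vec2 { float x, y; };

    // Is point p within radius r of c?  Compare squared lengths: no sqrt needed.
    bool WithinRadius(Vec2 p, Vec2 c, float r) {
        float dx = p.x - c.x, dy = p.y - c.y;
        return dx * dx + dy * dy <= r * r;
    }

    // Is the target to the left of the facing direction?  The sign of the 2D
    // cross product answers this with no atan2/asin and no unit vectors.
    bool IsLeftOf(Vec2 facing, Vec2 toTarget) {
        return facing.x * toTarget.y - facing.y * toTarget.x > 0.0f;
    }

    int main() {
        printf("%d %d\n", WithinRadius({3, 4}, {0, 0}, 5.0f),   // 1
                          IsLeftOf({1, 0}, {0, 1}));            // 1
    }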
"integer" as a type is less offensive though, as it only has one intuitive deficiency compared the mathematical definition (finite range). Where as "real" as a type has many deficiencies... it simply does not contain irrational numbers, and it does not contain all rational numbers in 3 respects: range, precision and unrepresentable due to base2, and for integers it can represent a non-contiguous range!
But I can build a computer that would accept any pair of integers you sent it, add them and return the result. Granted, they'd have to be streamed over a network, in little-endian form, but unbounded integer addition is truly doable. And the restriction is implicit: you can only send it finite integers, and given finite time it will return another finite answer.
You can't say the same about most real numbers, all the ones that we postulate must exist but that have infinite complexity. You can't ever construct a single one of them.
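A sketch of the streamed adder described above (the interface is made up for illustration; the vectors stand in for the little-endian byte streams). The whole thing needs only one byte's worth of state, the carry:

    #include <cstdint>
    #include <vector>

    // Add two unbounded non-negative integers given as little-endian byte streams.
    // Each output byte can be emitted as soon as the matching input bytes arrive;
    // only the carry has to be remembered between steps.
    std::vector<uint8_t> AddLittleEndian(const std::vector<uint8_t>& a,
                                         const std::vector<uint8_t>& b) {
        std::vector<uint8_t> out;
        unsigned carry = 0;
        for (size_t i = 0; i < a.size() || i < b.size() || carry; ++i) {
            unsigned sum = carry;
            if (i < a.size()) sum += a[i];
            if (i < b.size()) sum += b[i];
            out.push_back(static_cast<uint8_t>(sum & 0xFF));
            carry = sum >> 8;
        }
        return out;
    }

    int main() {
        // 255 + 1 = 256, i.e. bytes {0xFF} + {0x01} -> {0x00, 0x01}
        std::vector<uint8_t> r = AddLittleEndian({0xFF}, {0x01});
        return r[0] == 0x00 && r[1] == 0x01 ? 0 : 1;
    }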
It is not a matter of modernity, but a matter of use-case. You don't need arbitrary-length integers for a coffee machine or a fridge. You don't even need FP to handle e.g. temperatures; fixed-point is often more than enough. So if you are making some sort of "portable assembler" for IOT devices, you can safely stick with simple integers.
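For example, a fridge or coffee-machine temperature fits comfortably in a fixed-point integer. This is just a sketch with a made-up unit (hundredths of a degree), not any particular device's firmware:

    #include <cstdint>
    #include <cstdio>

    // Temperature in hundredths of a degree Celsius. int32 covers roughly
    // +/- 21 million degrees at 0.01 degree resolution, and addition and
    // subtraction are exact -- no floating point needed.
    using CentiDegC = int32_t;

    int main() {
        CentiDegC setpoint = 9250;                // 92.50 C
        CentiDegC reading  = 9178;                // 91.78 C
        CentiDegC error    = setpoint - reading;  // exactly 72, i.e. 0.72 C
        printf("error = %d.%02d C\n", error / 100, error % 100);
    }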
> I think that illusion can be a bit dangerous when we create things or use things based on that incorrect assumption.
I'd be curious to hear some of the problems programmers have run into from this conceptual discrepancy. We've got probably billions of running instances of web frameworks built atop double-precision IEEE 754 to choose from. Are there any obvious examples you know of?
Operations that you think of as being associative are not. A simple example is adding small and large numbers together. If you add the small numbers together first and then the large one (e.g. sum from smallest to largest), the small parts are better represented in the sum than if you sum from largest to smallest. This could happen if you have a series of small interest payments and are applying them to the starting principal.
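A minimal sketch of that ordering effect, using single precision to make it obvious (the magnitudes are made up):

    #include <cstdio>

    int main() {
        float big   = 1e8f;    // the "principal"
        float small = 0.01f;   // ten thousand tiny "payments"

        // Largest first: each 0.01 is below half an ulp of 1e8 and is discarded.
        float a = big;
        for (int i = 0; i < 10000; ++i) a += small;

        // Smallest first: the small values accumulate, then survive the final add.
        float s = 0.0f;
        for (int i = 0; i < 10000; ++i) s += small;
        float b = big + s;

        printf("%.1f\n", a);   // exactly 100000000.0 -- every payment was lost
        printf("%.1f\n", b);   // within one float ulp of the true 100000100
    }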
I've worked with large datasets, aggregating millions of numbers (summing, dividing, averaging, etc.), and I have tested the order of operations, trying to force some accumulated error, and I've never actually been able to show any difference in the realm of the 6-8 significant digits I looked at.
Here's a common problem that shows up in implementations of games:
    class Time {
        uint32 m_CycleCount;
        float  m_CyclesPerSec;
        float  m_Time;
    public:
        Time() {
            m_CyclesPerSec = CPU_GetCyclesPerSec();
            m_CycleCount   = CPU_GetCurCycleCount();
            m_Time         = 0.0f;
        }
        float GetTime() { return m_Time; }
        void Update() {
            // note that the cycle counter is expected to wrap
            // during the lifetime of the game --
            // unsigned modular math still gives the right delta
            // as long as Update() is called at least once
            // every 2^32-1 cycles.
            uint32 curCycleCount = CPU_GetCurCycleCount();
            float dt = (curCycleCount - m_CycleCount) / m_CyclesPerSec;
            m_CycleCount = curCycleCount;
            m_Time += dt;
        }
    };

    void GAME_MainLoop() {
        Time t;
        while( !GAME_HasQuit() ) {
            t.Update();
            GAME_step( t.GetTime() );
        }
    }
The problem is that m_Time will become large relative to dt, the longer the game is running. Worse, as your CPU/GPU gets faster and the game's framerate rises, dt becomes smaller. So something that looks completely fine during development (where m_Time stays small and dt is large due to debug builds) turns into a literal time bomb as users play and upgrade their hardware.
At 300fps the per-frame increments are already being rounded quite coarsely once the game has been running for around 8 hours, and time will literally stop advancing once m_Time passes about 65,536 seconds (roughly 18 hours), when dt drops below half an ulp of m_Time; in-game things that depend on framerate can become noticeably jittery well before then.
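A standalone sketch of how the increments degrade (not the game code above, just float arithmetic at the same magnitudes):

    #include <cstdio>

    int main() {
        const float dt = 1.0f / 300.0f;       // ~3.33 ms frame time at 300 fps

        float t1 = 8.0f * 3600.0f;            // ~8 hours of accumulated time
        printf("%.9f\n", (t1 + dt) - t1);     // 0.003906250 -- quantized, not ~0.003333333

        float t2 = 20.0f * 3600.0f;           // ~20 hours of accumulated time
        printf("%.9f\n", (t2 + dt) - t2);     // 0.000000000 -- the frame is lost entirely
    }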
If I'm going to use a 64 bit type for time I'd probably just use int64 microseconds, have over 250,000 years of uptime before overflowing, and not have to worry about the precision changing the longer the game is active.
So, use fixed point? You could do that, but you can't make every single time-using variable fixed point without a lot of unnecessary work, and without sufficient care you end up with less precision than floating point. If you don't want to spend a ton of upfront time carefully optimizing every variable just to avoid wasting 10 exponent bits, default to double.
> in the realm of 6-8 significant digits I looked at
That is far inside the range of 64-bit double precision. Whether error propagates up to that range of significance depends on the math, but I doubt the aggregation you are describing would cause it... provided nothing silly happens to subtotals, like intermediate rounding to a fixed precision (you'd be surprised).
Something like the compounding the parent was describing is far more prone to significant error propagation.
I've seen rounding errors on something as simple as adding the taxes calculated per line item vs calculating a tax on a subtotal. This was flagged as incorrect downstream where taxes were calculated the other way.
In a real-life transaction where pennies are not exchanged this could mean a difference of a nickel on a $20 purchase which isn't a meaningful difference but certainly not insignificant.
How much was the difference? Was there any rounding involved at any step? When dealing with money, I see rounding and integer math all the time. As another comment has mentioned, within 53 bits of mantissa the number range is so big, we are talking 16 digits. I'd be curious to see a real-world example where the float math is the source of error, as opposed to some other bug.
It doesn't take much imagination to construct an example. Knowing 0.1 isn't exactly representable, make a formula with it that should come out to exactly half a cent. Depending on the floating-point precision it will land slightly above or below half a cent, and rounding will not work as expected. We found this in prod at a rate of hundreds to thousands of times per day; it only takes volume to surface unlikely errors.
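Not their production formula, but a minimal illustration of the same class of failure: an amount meant to be exactly half a cent above a dollar, where round-half-up silently goes the wrong way because the stored value is a hair below 1.005:

    #include <cmath>
    #include <cstdio>

    int main() {
        double amount = 1.005;                    // meant to be exactly $1.00 + half a cent
        printf("%.20f\n", amount);                // 1.00499999999999989342 -- already short
        double cents = std::round(amount * 100.0) / 100.0;
        printf("%.2f\n", cents);                  // 1.00, not the 1.01 that half-up implies
    }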
The person I replied to claims to have looked at large* data sets and never seen a discrepancy in 6-8 significant digits. I thought I'd show them a small data set with 3 samples that retains no significant digits.
* Never mind that "millions" isn't large by current standards...
But you are observing all the digits, not just 6-8. It's implicit in the semantics of the operation, and that's something everyone who works with floating point should know.
You're making the same mistake, but now it's less obvious because of the change of scale. When you compare floating-point numbers, a simple == is not usually what you want; you need to compare them with a tolerance. Choosing the tolerance can be difficult, but in general when working with small numbers you need a small tolerance and with large numbers a large one. This dataset involves datapoint(s) at 1e20; at that magnitude, whatever you're measuring, the error in your measurements is going to be way more than 1, so a choice of tolerance ≤ 1 is a mistake.
Ugh, you're preaching to the choir. I wasn't trying to make a point about the equality operator, I was trying to make a point about x and y being completely different. I must be really bad at communicating with people.
That construction can turn any residual N digits out into a first-digit difference. It wouldn't matter unless you make comparisons with a tolerance way under the noise floor of the dataset. But yes, you have technically invented a situation that differs from that real-world anecdote in regard to that property, under an extremely literal interpretation.
And here I was, worried I might be tedious or pedantic, trying to argue that floating point is just not that simple. You've really outdone me in that regard.
JavaScript is exact within the ±2^53 integer range. It's unlikely that you're operating with numbers outside of that range if you're dealing with real-life things, so for most practical purposes doubles are enough.
> The right answer is 0.0, and the most it can be wrong is 0.2. It's nearly as wrong as possible.
Just to clarify for others: you're implicitly contriving that to mean you care about the error being positive. The numerical error in 0.1 % 0.2 is actually fairly ordinarily tiny (on the order of 10^-17), but using modulo may create sensitivity to these tiny errors by introducing a discontinuity where it matters.
I mistakenly used 0.1 instead of 1.0, but the _numerical_ error is still on the order of 10^-17; the modulo is further introducing a discontinuity that creates sensitivity to that tiny numerical error. Whether that is a problem depends on what you are doing with the result... 0.19999999999999996 is very close to 0 as far as modulo is concerned.
I'm not arguing against you, just clarifying the difference between propagation of error into significant numerical error through something like compounding, and being sensitive to very tiny errors by depending on discontinuities such as those introduced by modulo.
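For reference, the 1.0 % 0.2 case in double precision, as a small sketch of the discontinuity being discussed:

    #include <cmath>
    #include <cstdio>

    int main() {
        // 0.2 is stored as 0.200000000000000011..., so 1.0 is just *under* five of
        // them, and the remainder lands a hair below 0.2 instead of at 0.
        printf("%.17f\n", std::fmod(1.0, 0.2));   // 0.19999999999999996
        // The underlying error is ~1e-17; the modulo's wrap at 0 is what makes the
        // result look "as wrong as possible", even though 0.1999... is, modulo 0.2,
        // extremely close to 0.
    }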
I'm talking about integer numbers. 2^53 = 9007199254740992. You can do any arithmetic operations with any integer number from -9007199254740992 to 9007199254740992 and results will be correct. E.g. 9007199254740991 + 1 = 9007199254740992. But outside of that range there will be errors, e.g. 9007199254740992 + 1 = 9007199254740992
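The same boundary shown with plain doubles, for anyone who wants to see it (a tiny sketch; nothing JS-specific about it):

    #include <cstdio>

    int main() {
        double limit = 9007199254740992.0;       // 2^53
        printf("%.1f\n", limit - 1.0 + 1.0);     // 9007199254740992.0 -- still exact
        printf("%.1f\n", limit + 1.0);           // 9007199254740992.0 -- the +1 is lost
        printf("%.1f\n", limit + 2.0);           // 9007199254740994.0 -- spacing is now 2
    }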
You are describing only one of the types of numerical error that can occur, and it is not commonly a problem: it is an edge case that occurs at the significand limit, where the exponent alone must be used to approximate larger magnitudes, at which point the representable integers become non-contiguous.
The types of errors being discussed by others are all in the realm of non-integer rationals, where limitations in either precision or representation introduce error that then compounds through operations no matter the order of magnitude... and btw, _real_ life tends to contain _real_ numbers, which commonly means non-integer rationals in uses of IEEE 754.
Actually this is a source of many issues, where a 64-bit int, say a DB autoincrement id, can't be exactly represented in a JS number. Not an "in real life" value, but still a practical concern.
I spent a day debugging a problem this created. Without going into irrelevant domain details, we had a set of objects, each of which has numeric properties A and B. The formal specification says objects are categorized in a certain way iff the sum of A and B is less than or equal to 0.3.
The root problem, of course, was that 0.2 + 0.1 <= 0.3 isn't actually true in floating point arithmetic.
It wasn't immediately obvious where the problem was since there were a lot of moving parts in play, and it did not immediately occur to us to doubt the seemingly simple arithmetic.
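For anyone who wants to see the comparison in question, a minimal sketch (the 1e-9 tolerance is an arbitrary choice for illustration; picking it well is its own problem, as discussed elsewhere in the thread):

    #include <cstdio>

    int main() {
        double a = 0.2, b = 0.1, limit = 0.3;
        printf("%.17f\n", a + b);                  // 0.30000000000000004
        printf("%d\n", a + b <= limit);            // 0 -- the spec's condition "fails"
        printf("%d\n", a + b <= limit + 1e-9);     // 1 -- comparing with a tolerance
    }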
I can't show you, but my job involves writing a lot of numerical grading code (as in code that grades calculated student answers in a number of different ways). I've had the pleasure of seeing many other systems' pretty horrible attempts at this, both from the outside and in, and in both cases numerical errors rooted in floating-point math abound. To give an easy example, a common numerical property required for building formatters and graders of various aspects (scientific notation, significant figures, etc.) is the base-10 floored order of magnitude. The most common way of obtaining it is numerically, using logarithms, but this has a number of unavoidable edge cases where it fails due to floating-point error, resulting in a myriad of grading errors and formatting that is off by one significant figure.
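One way that off-by-one can show up (a standalone sketch, not that grading code): a computed value that should be exactly 1.0 lands just below it, and the floored logarithm jumps down an entire order of magnitude:

    #include <cmath>
    #include <cstdio>

    int main() {
        // Ten increments of 0.1 should total exactly 1.0 (order of magnitude 0)...
        double x = 0.0;
        for (int i = 0; i < 10; ++i) x += 0.1;
        printf("%.17g\n", x);                          // 0.99999999999999989
        printf("%g\n", std::floor(std::log10(x)));     // -1, not the expected 0
    }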
These are an easy target to find issues that _matter_ because users are essentially fuzzing the inputs, so they are bound to find an error if it exists, and they will also care when they do!
When these oversights become a problem is very context sensitive. I suppose mine is quite biased.