I found the following note for -ffinite-math-only and -fno-signed-zeros quite worrying:
The program may behave in strange ways (such as not evaluating either the true or false part of an if-statement) if calculations produce Inf, NaN, or -0.0 when these flags are used.
I always thought that -ffast-math was telling the compiler: "I do not care about floating-point standards compliance, and I do not rely on it, so optimize things that break the standard."
But instead it seems like this also implies a promise to the compiler. A promise that you will not produce Inf, NaN, or -0.0.
And Inf and NaN in particular can be hard to exclude.
This changes the flag from saying "don't care about standards, make it fast" to "I hereby guarantee this code meets a stricter standard", and it becomes quite hard to actually verify that you meet that stricter standard, especially if you want to keep your performance. If you need to start wrapping every division in an if statement to prevent getting Inf or NaN, that is a huge performance penalty.
The whole rationale of the floating-point standard is to specify FP operations with such properties that a naive programmer will be able to write programs which behave as expected.
If you choose any option that is not compliant with the standard, that means, exactly as you have noticed, that you claim that you are an expert in FP computations and you know how to write FP programs that will give the desired results even with FP operations that can behave in strange ways.
So yes, that means that you become responsible to either guarantee that erroneous results do not matter or that you will take care to always check the ranges of input operands, as "okl" has already posted, to ensure that no overflows, underflows or undefined operations will happen.
> So yes, that means that you become responsible to either guarantee that erroneous results do not matter or that you will take care to always check the ranges of input operands, as "okl" has already posted, to ensure that no overflows, underflows or undefined operations will happen.
This is why I dislike the fact that it removes simple methods for checking if values are NaN, for example.
I do not find it ergonomic to disable these checks.
If NaN and Inf values can still happen, but you don't have reliable ways of working with them, that is poor.
I think what people really want is a sane floating point mode that throws exceptions instead of these Inf and NaN things, so if their program isn't debugged, they get a clear signal by way of its abnormal termination.
Trapping on math exceptions or otherwise pulling an emergency brake doesn't really fit with modern hardware design if you want performance (for one thing, what do you do with the rest of your vector? Do you just live with a pipeline flush?).
Adding targeted checks used sparingly can work, but probably requires a deeper understanding of the underlying numerical analysis of your problem. Generally just checking for NaN or Inf at the end is a better solution as these are in most cases absorbing states.
The only obvious way to test if some operation is going to produce some value(s) is to perform that operation. Yes, for, say, a single multiplication I can anticipate the necessary bounds for each input value. But even short sequences of operations would quickly devolve into labyrinthine nested switch statements.
Generally, if you're seeing a NaN/Inf, something has gone wrong. It's very difficult to recover from gracefully, and if you tried I think you would lose both sanity and performance!
Regarding performance, a real division costs about 3-4 orders of magnitude more than an if statement whose branch is very predictable. The usual approach is to have fast and safe versions of functions: where you need performance and can deduce that something is always/never true, create/use the fast version; by default everything uses the slower, safer version.
An example use of Inf is bounds on a value. Say you want to express that x sometimes has upper and/or lower bounds. With Inf you can simply write
l <= x <= u
accepting that l is sometimes -Inf and u is sometimes +Inf. Without Inf, you get four different cases to handle. This is particularly handy when operations get slightly more complex, like transforming
l' <= x + b <= u'
into the canonical form above, for some finite b. With Inf you can simply write
l = l' - b
u = u' - b
and things will work as one expects. Again, without Inf, multiple cases to handle correctly.
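A minimal C sketch of the pattern (the helper name is made up for illustration):

#include <math.h>
#include <stdio.h>
#include <stdbool.h>

/* Does x satisfy l <= x <= u? Works unchanged when l == -INFINITY
   and/or u == +INFINITY, i.e. when a bound is absent. */
static bool in_bounds(double x, double l, double u) {
    return l <= x && x <= u;
}

int main(void) {
    /* "No lower bound, upper bound 7", shifted into canonical form by b = 2.
       Subtracting b from an infinite bound leaves it infinite, so no
       special cases are needed. */
    double lp = -INFINITY, up = 7.0, b = 2.0;
    double l = lp - b;   /* still -inf */
    double u = up - b;   /* 5.0 */

    printf("%d %d\n", in_bounds(3.0, l, u), in_bounds(9.0, l, u));  /* 1 0 */
    return 0;
}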
As a scientist, it depends on what you mean by wrong. It's nice to push an array of numbers through some array operation. If some of the outputs look like NaN or Inf then that tells me something went wrong in the operation and to take a closer look. If some optimizer was told that NaN or Inf couldn't happen, meaning that they wouldn't be generated, then some of this output could be pure nonsense and I wouldn't notice. As NaNs propagate through the calculation they're extremely helpful to indicate something went wrong somewhere in the call chain.
NaN is also very useful itself as a data missing value in an array. If you have some regularly sampled data, with some items missing, it makes a lot of things easier to populate the holes with values which guarantee they won't produce misleading output.
Calculating the average value for some observation in a time bucket, where some buckets may have no observation (resulting in 0/0=nan), finally taking some kind of summary of these ignoring nan values (there is a whole library of functions for this in numpy for example).
NaN and Inf are extremely valuable and useful encodings to represent real things. Particularly signed infinity is wonderful, as functions like exp behave correctly for inf (eg. exp(-inf)=0).
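A small C sketch of that bucket-average pattern, with the NaN-skipping summary hand-rolled (which is what numpy's nanmean-style functions do):

#include <math.h>
#include <stdio.h>

int main(void) {
    double sums[3]   = {6.0, 0.0, 9.0};
    int    counts[3] = {2,   0,   3};     /* bucket 1 has no observations */
    double means[3];

    for (int i = 0; i < 3; i++)
        means[i] = sums[i] / counts[i];   /* 0.0/0 -> NaN marks the hole */

    /* Summary that ignores the NaN buckets. */
    double total = 0.0;
    int n = 0;
    for (int i = 0; i < 3; i++)
        if (!isnan(means[i])) { total += means[i]; n++; }

    printf("mean over non-empty buckets: %g\n", total / n);  /* 3 */
    return 0;
}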
>Calculating the average value for some observation in a time bucket, where some buckets may have no observation (resulting in 0/0=nan)
But there are many ways such a calculation could result in NaN besides a division. For example, intermediate sums could result in something like inf - inf. I find it rare that you can definitely conclude that a NaN means some specific thing.
JavaScript's parseInt and parseFloat return NaN on something as benign as "well, that wasn't a number".
var jsonish = "...";
var number = parseFloat(jsonish);
if (!isNaN(number)) {
    // deserialized a number
} else {
    // ...try deserializing as a boolean or string next...
}
Fast math optimizations can break code like this by breaking isNaN.
I was porting a C++ project to a certain platform - and that platform enabled a -ffast-math equivalent by default in Release (but not Debug) builds! This broke duktape, a JS engine said project embedded, in some nasty and subtle ways. Instead of storing a number/pointer/??? (8 bytes) + type tag (4? bytes) for each dynamically typed JS value, duktape can bit-pack values into a single 8 byte "double" value by storing object/string handles as NaN values - this isn't an uncommon trick for dynamically typed scripting stuff.
Naturally, the -ffast-math equivalent broke isNaN checks, which caused random object/string handles to be mistakenly reinterpreted as "numbers" - but only in Release builds, for this one particular platform, in one rarely taken branch, so neither QA nor CI caught it, leading to hours of manufacturing a repro case, stepping through an absurd amount of code, and then finally looking at the default build rules and facepalming.
Cursing the platform vendor under my breath, I overrode the defaults to align with the defaults of every other config x platform combination we already had: no fast math. If you want those optimizations, use SSE-friendly NaN-avoiding intrinsics - or, if you must use the compiler flags, ensure you do so consistently across build configs and platforms, perhaps limited to a few TUs or modules if possible. This allows you to have a chance at using your Debug builds to debug the resulting "optimizations".
There are algorithms that behave correctly but require intermediate Inf or NaN to work. If I recall there is one related to finding bounding box intersections in ray tracing. I know when I introduce such code into a codebase I put a large "DO NOT CHANGE THIS..." style comment with an explanation so some enterprising programmer doesn't put some if statements in to remove what they think is an error, when it is not.
I'd suspect the same is true for NaN. These things were introduced for reasons. If you don't have a deep knowledge of why, don't assume everyone else lacks it too.
Edit: I found examples using NaN and Inf intermediates to get a large speed increase [1,2].
Sure, of course they have uses otherwise why would they exist. But I would argue that the examples you have aren’t the common case — most people’s floating point calculations do not rely on these special values and seeing them denotes something went wrong.
Of course they're not the common case, but the parent wanted examples where they are used in calculations that do have nice answers.
So they are useful. They are more useful than most people realize; many people simply assume they only mark errors.
And without them in your case, you would not be alerted that a calculation went wrong. So even there they are very useful. Not using them correctly is like ignoring file API errors in your programs - sure they are rare, but you need to understand and handle them if you want to make good software.
This depends how you define “gone wrong”. In evaluating rational functions (a useful tool for approximating all sorts of other functions), one efficient and well behaved algorithm is the “barycentric formula”, based on interpolating function values at various specific input points. When the input is one of those points directly, this formula results in Inf / Inf = NaN. Evaluation code needs to check for this case, and then replace the output by the appropriate value for the input.
±Inf is a very natural output for many kinds of numerical codes.
What is the rationale for liking rational functions?
Years ago I had some interpolation to do, and used the barycentric rational from Boost [1]. The resulting curve was all over the place. A natural cubic spline gave a much more subjectively sensible-looking fit.
You should never test for equality between numbers which are known only approximately, which is the case for most FP numbers.
There are nonetheless cases when certain FP values are known exactly and it is OK to test them for equality.
In general the rule does not depend on number representation, it applies equally to floating-point numbers, fixed-point numbers, rational numbers and even large integers in some cases.
What counts is whether a number is known exactly or only approximately.
But the type of barycentric interpolation formula that leads to Inf/Inf at an interpolation point isn't one of those exceptions - it's an example which shows why the rule should be borne in mind.
When the input is very close to one of the interpolation points, the calculation becomes ill-conditioned, unsurprisingly as two values are diverging towards Inf, so the result can become incorrect as the input approaches the interpolation point, until it becomes Inf/Inf = NaN at some even closer, but non-zero, distance. Before reaching NaN it can also have an interval where the result is +/-Inf.
It depends on the interpolation points. But under some circumstances, it is necessary to recognise when the input is close to an interpolation point, and adjust the formula appropriately.
The appearance of NaNs doesn't result from breaking the "never test for equality" rule. It happens when the point where you're evaluating the function just happens to exactly match one of the interpolation points.
But if you decide to deal with the NaN issue by (1) testing whether your input exactly matches any of the interpolation points, or (2) testing the result for NaN, then you're testing for equality or doing something very similar, and then you may have one of the problems the "never test for equality" is meant to prevent: if bad stuff happens when x exactly equals one of the x_j, then perhaps less-obvious bad stuff happens when x almost exactly equals one of the x_j, and an equality test won't catch that.
So, does it? Actually, I think not. Suppose x = xj+h with h very small. Then your interpolant is (h times something of normal size) times (1/h times something of normal size + smaller terms) and although one of those is very small and the other very large, ordinary floating-point arithmetic doesn't care about that. There's no catastrophic cancellation or anything. And floating point has much more dynamic range than precision, so to speak, so until h literally reaches zero you aren't running into trouble. (I guess there might be ways of organizing the calculation that do give you catastrophic cancellation, but if you do it the obvious way then that doesn't happen.)
So in this particular case you really could just test for equality or test for NaNs. I think.
(For some interesting investigation of other aspects of the numerical stability of this kind of interpolation, see https://people.maths.ox.ac.uk/trefethen/publication/PDF/2011.... But note that the authors of this paper are using the formula for extrapolation or, as they prefer to call it, "numerical analytic continuation".)
> perhaps less-obvious bad stuff happens when x almost exactly equals one of the x_j, and an equality test won't catch that.
So, does it? Actually, I think not.
You are correct, it does not. Quite the opposite: the results are excellent when x doesn’t quite equal one of the interpolation nodes, because you end up with a value of S · y / S, where S = «very big but not close to overflowing float» and y = «expected output value».
The only potentially problematic case is if S · y overflows but S does not. But floating point numbers have a ton of dynamic range, so unless you are looking at points really, really close to zero, with very large outputs (say, x − x_j = 1e-200, y = 1e+200), there is no problem. And the failure case there is that you end up with Inf instead of 1e200 as an output.
It isn’t a problem in practice, and if you are worried about it as a problem in theory, you can take care in choice of interpolation nodes or do some scaling of the output values.
> There's no catastrophic cancellation or anything. And floating point has much more dynamic range than precision, so to speak, so until h literally reaches zero you aren't running into trouble.
I just tried it to be sure, in C with "double", and organized the calculation the way it's described in the paper you cited, formula 2.1.
It failed due to catastrophic cancellation close to an interpolation point, depending on the values involved. An equality test was not enough.
Approaching the first interpolation point, before it transitioned away from the correct result, there were some incorrect Inf results prior to it switching to NaN for an input not equal to any interpolation point.
This was trivial 1d interpolation between two points, where the points were (0.0 => 10.0, 1.0 => 20.0).
Hm, interesting. I also tried some numerical experiments to sanity-check and I got reasonable results right up to 1ulp away from the interpolation points. I tried again using exactly your example -- f(x) = 10+10x, interpolation points at 0 and 1 -- and again everything seemed fine.
I wonder what we're doing differently, or whether we happened to test at different points. Here's my code:
def interpolate1(f,x,xs,ws):
    p,s = 1,0
    for (xj,wj) in zip(xs,ws):
        p *= x-xj
        s += wj*f(xj)/(x-xj)
    return p*s
(it's "interpolate1" because I also implemented the other version of the formula described in the article, which for this purpose behaves very similarly).
For your example we want f(x) = 10+10x, xs=[0,1], ws=[-1,1].
As I mentioned before, if you implement the calculation of the polynomial called l(x) in the paper -- the thing my code calls p -- in an unfortunate way, you can get cancellation problems, but for this case I'm not sure I see any way for that to happen.
I tried calculating l(x) in a different way that could conceivably cause cancellation issues, aimed at this particular interpolant:
def foo(x):
    p = x*x-x
    s = -1*10/x + 1*20/(x-1)
    return p*s
but this also gave reasonable results for every value of x I tried other than exactly 0 and exactly 1.
What exact values of x are giving you what bad results? What language are you implementing this in, with what data types?
If you want to implement the “first barycentric formula” instead you need to take some additional care to avoid overflow/underflow.
As the papers above note, the choice of interpolation nodes makes a big difference. If you have an arbitrary function to approximate on an arbitrary complex domain, take a look at https://arxiv.org/abs/1612.00337
There are advantages to using either version of the formula. If you use the "first" then you do need to take care about overflow/underflow when the number of interpolation points is large, but that isn't an issue here. The "first" behaves better than the "second" for extrapolation, as discussed in the paper by Trefethen et al. And yes, the choice of interpolation points is important.
But none of that should make a difference here. What's actually causing the dubious results jlokier reports isn't exactly proximity to the interpolation points -- I don't think it could happen if one of them weren't exactly zero. What's happening is that the very smallest values of x in jlokier's code are denormal numbers; that is, ones whose floating-point representation has less precision than usual.
Usually a floating-point number looks like 1.<stuff> x 2^n; for IEEE double precision, the <stuff> is 52 bits long. So if you start with 1, 1/2, 1/4, 1/8, and keep dividing by 2, you always have stuff=0 and n decreases by 1 each time. But what should you do once you reach the smallest value of n that will fit in the floating-point format? One answer is just to say that after that you get zero. But you can get a little extra dynamic range if instead you say that when n is as small as it's allowed to be, you can use 0.<stuff> instead of 1.<stuff>. These numbers have reduced precision for the sake of extra dynamic range, and they are called "denormal". Lots of things go subtly (or not so subtly) wrong when dealing with denormal numbers.
One thing that goes wrong is the interpolation formula we're discussing here. The formula looks like f(x) ~= l(x) . sum of wj f(xj) / (x-xj) where xj are the interpolation points (in this case 0 and 1), l(x) is the product of (x-xj), so in this case x(x-1), and wj = product (xj-xi) where xi ranges over all the other interpolation points besides xj, so w0 = -1 and w1 = +1. So we have f(x) ~= x(x-1) . [-10/x + 20/(x-1)]. In fact this is exact provided x is neither 0 or 1, because the thing we're approximating is a polynomial of degree <= (# interpolation points) - 1.
So, what I claimed before was: suppose x = xj + h where h is small, and suppose you compute l(x) as the product of its factors. Then we get l(x) = h times product (xj+h-xi), and the sum is wj f(xj) / h + the sum of the other terms; and if h is tiny then the other terms in the sum are tiny in comparison, and the factors of h and 1/h cancel and everything's fine.
But that doesn't happen when xj=0 and x (and hence h) is denormal -- because you can't pull the denormal trick at the large end of the range, so the reciprocal of a denormal floating-point number is floating-point infinity!
This isn't a cancellation problem. It isn't a matter of the formula going unstable when x is very close to one of the interpolation points. But it is a problem, and it never occurred to me until jlokier demonstrated it, because denormals are kinda obscure and I didn't think to think about what might happen if they get involved. It would be avoided if you made your interpolator check for values very close to one of the xj. But -- I think, but this stuff is treacherous -- it would also be avoided if you checked for infinities as well as NaNs on the output side, which would be more efficient (provided your inputs are usually sensible). Or -- I think, but this stuff is treacherous -- you could just check explicitly for denormal values of x and maybe of the xj, because if none of those is denormal this pathology can't arise.
This still gives incorrect +Inf results in an interval around the first interpolation point.
The interval is small for parameters interp(x, 0, 10, 1, 20). But interp(x, 0, 1e50, 1, 2e50) shows the interval can be much larger, as it depends on the y values.
Both formulas return incorrect results in an input range larger than the denormals (1e50 times larger in the second example above). Plenty of "normal" inputs that can arise from other calculations. And these parameters are not unusual either.
So you can't ensure a correct result from either barycentric formula in floating point by just checking for denormal input.
I think you may be right that checking for +/-Inf result as well as NaN may be an adequate test, but I wouldn't assume it without a proof or some paper demonstrating that for sure.
Nice catch on using enormous values of y, where indeed it isn't just denormal values of x that cause trouble. Still not a catastrophic-cancellation problem, though. Let's look more closely. Suppose x is very very close to 0. (In particular, close enough that x-1 is indistinguishable from -1. This is not a source of trouble in this case.)
The weights w1,w2 are -1 and +1, as before.
The numerator is -10^50/x - 2·10^50. The denominator is -1/x - 1. If x is small enough (which it is in the cases that give spurious infinities) the first term in each of these is enough larger that adding the second doesn't change it; so what you're really computing is (-10^50/x) / (-1/x). And if x is small enough that 10^50/x is infinite, then boom.
Something similar could happen even without numbers as large as 10^50. If the value at 0 were 3 instead, then values of 1/x bigger than 1/3 of the largest representable non-infinity (i.e. x smaller than 3 divided by that largest value) would produce the same problem. Using f(0)=10^50 enlarges the range in which this particular overflow happens by a factor of 10^50, though.
The fact that this isn't an exotic catastrophic-cancellation problem doesn't, of course, detract from your original point (with which, for the avoidance of doubt, I agree): a calculation that can blow up for certain exact values is likely to misbehave for values that are merely close to those, so testing for the exact values is likely to be a very bad idea.
If I were actually writing something that does interpolation in this way, I would certainly not want to trust that checking the results for infs and nans was sufficient (without, as you say, either proving it or finding someone else who had -- but I suspect it's actually only almost true).
If x is very small (<1e-100 or something) you can alternatively evaluate a linear approximation accurate near 0. If y is going to be enormous (>1e+100 or something) you should rescale it.
Because you're about 15 stack frames deep into some calculation when the problem gets noticed. So how do you handle this? You could throw an exception (or similar), which would allow the UI to state something semi-helpful like 'Calculation failed'. But recover? Usually not.
I think NaN and inf can be really useful in specific cases, but they probably shouldn't be allowed to propagate through your program, and shouldn't occur unless you're specifically making use of them.
When you do not want such values to be propagated, it is always possible to choose to have overflow exceptions instead of infinities and undefined operation exceptions instead of NaNs. Also underflow exceptions instead of denormals.
The programmer should choose which is most appropriate for an application. Choosing to both ignore the exceptions and not generate a special value that can be examined later, that is seldom acceptable.
I often use NaNs on purpose to designate “unknown”, and its properties of “infecting” computations mean that I don’t have to check every step - just at the end of the computation.
R goes even farther, and uses one specific NaN (out of the million-billions) to signal “not available”, while other NaNs are just NaNs.
Its properties, used properly, make code much more simple and sane.
NaN usually indicates a bug or algorithmic problem, but not always. I have had cases where I need to detect NaN and present a useful message to the user, even though everything is working as designed.
At least GCC is, I believe, fairly buggy in that regard since isnan is optimised out, thus you cannot check if, say, an incoming value read from the network is a NaN. There's some lengthy discussion about it on the bug tracker IIRC.
I found this out the hard way several years ago and still have my own implementations of these functions laying about in case I want to use -ffast-math.
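For reference, a sketch of the kind of hand-rolled replacement that keeps working under -ffast-math, because it only inspects the IEEE 754 bit pattern and never performs a floating-point comparison (the function name is mine):

#include <stdint.h>
#include <string.h>
#include <stdbool.h>

static bool isnan_bits(double x) {
    uint64_t u;
    memcpy(&u, &x, sizeof u);  /* reinterpret the bits without UB */
    /* NaN: exponent field all ones, mantissa nonzero. */
    return (u & 0x7ff0000000000000ULL) == 0x7ff0000000000000ULL
        && (u & 0x000fffffffffffffULL) != 0;
}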
But couldn't the compiler optimize out the check because you're guaranteeing to never divide by 0, so logically count would never be 0? Akin to undefined behavior, almost.
I probably am confusing them. I don’t program in C or C++, so I’m not aware of all the “gotchas”, and can easily confuse them (as shown here). Not to mention that compilers have been getting more aggressive in optimizing undefined behavior.
or equivalent options, which eliminate all the horror stories with unexpected behavior caused by integer divide-by-zero or overflow, pointer overflow, out-of-bounds access and many others.
Because, unlike integer operations, the FP operations are standardized, there is no need for compiler options: the behavior for exceptions can be controlled from the program with the functions defined in <fenv.h> in C or <cfenv> in C++.
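A minimal sketch of that <fenv.h> usage: clear the sticky exception flags, run the calculation, then test which flags were raised. (Turning the flags into traps via feenableexcept is a glibc extension and is not shown.)

#include <fenv.h>
#include <stdio.h>

#pragma STDC FENV_ACCESS ON   /* standard, though some compilers only partially support it */

int main(void) {
    feclearexcept(FE_ALL_EXCEPT);

    volatile double zero = 0.0;
    volatile double r = 1.0 / zero;   /* raises FE_DIVBYZERO, yields +inf */
    (void)r;

    if (fetestexcept(FE_DIVBYZERO)) puts("division by zero occurred");
    if (fetestexcept(FE_INVALID))   puts("invalid operation occurred");
    if (fetestexcept(FE_OVERFLOW))  puts("overflow occurred");
    return 0;
}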
The checks could be expensive though, more expensive than the gains from this optimization. Often it is better to let the calculation proceed and use isnan to check the final result.
That's where more advanced type systems shine. You get to encode things like "this number is not null" and a safe division function will only take "not nulls" as the input.
The workflow would then be:
1. prove that the input is not null (if input == 0) yielding a "not null number"
2. run your calculation (no check needed for multiplication or division; if you add something you have to re-prove that the number is not null). A rough sketch of the idea follows.
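C has no such type system, so this is only a convention-based sketch (names are mine): a NonZero value can only be obtained through a checking constructor, and the division function accepts only the wrapper. A language with a richer type system enforces this statically instead.

#include <stdbool.h>
#include <stdio.h>

typedef struct { double value; } NonZero;   /* invariant: value != 0.0 */

/* Step 1: prove the input is not zero; fails (returns false) otherwise. */
static bool make_nonzero(double x, NonZero *out) {
    if (x == 0.0) return false;
    out->value = x;
    return true;
}

/* Step 2: division needs no further check. */
static double safe_div(double num, NonZero den) {
    return num / den.value;
}

int main(void) {
    NonZero d;
    if (make_nonzero(3.0, &d))
        printf("%g\n", safe_div(12.0, d));   /* 4 */
    return 0;
}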
"Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs."
The obvious implication is that arguments and results are always finite, and as a programmer you are responsible for guaranteeing the correct preconditions.
Certainly do not use finite-math-only when dealing with external data.
let me rephrase: data which you cannot guarantee meet the preconditions. Either the input is known good data or you sanitize it through some code path not compiled with fast-math.
> I always thought that -ffast-math was telling the compiler. "I do not care about floating point standards compliance, ... So optimize things that break the standard". But instead it seems like this also implies a promise to the compiler [that] you will not produce Inf, NaN, or -0.0.
I see it as both, as with many such flags¹, though as you say it does put the onus on the developer to avoid inputs the optimisation may cause problems for. You are saying "I promise that I'm being careful not to do things that will be affected by these parts of the standard, or won't blame you for undefined behaviour if I do, so you can skip those rules if it helps you speed things up".
[1] like those that enable optimisations which can cause significant breakage if pointer aliasing is happening
That's the way a number of those special flags work. They are a contract to the compiler saying "act like this can't happen, I promise it won't".
Picking and choosing among the specific flags that it enables might be better in general. The flag `-funsafe-math-optimizations` seems useful as a general-purpose choice, although, as this article mentions, some of those optimizations need the more unsafe ones to become active.
I would say that `-ffast-math` is a dangerously safe-sounding flag; `-funsafe-fast-math` would be better IMO.
Not only math optimizations are like this. Strict aliasing is a promise to the compiler to not make two pointers of different types point to the same address.
You technically make that promise by writing in C (except a narrow set of allowed conversions) but most compilers on standard settings do let you get away with it.
I imagine it means that the compiler could turn an if(x<6)-else into two free-standing if statements, like if(x<6) and if(x>=6), which will logically work under these assumptions.
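A plain-C illustration of why that transformation misbehaves for NaN (this models the effect, it is not the code GCC actually emits): NaN compares false to everything, so after the split neither free-standing if fires.

#include <math.h>
#include <stdio.h>

static void check(double x) {
    if (x < 6.0)  printf("%g: 'true' branch\n", x);
    if (x >= 6.0) printf("%g: 'false' branch\n", x);
}

int main(void) {
    check(3.0);   /* 'true' branch */
    check(7.0);   /* 'false' branch */
    check(NAN);   /* prints nothing: both comparisons are false */
    return 0;
}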
In Julia you can do `@fastmath ...`, which is basically just a find-and-replace of math operations and does not propagate into functions called in that code block, even when they are ultimately inlined. So what it does is:
julia> @macroexpand @fastmath x + y
:(Base.FastMath.add_fast(x, y))
And maybe that's a good thing, because the scope of @fastmath is as limited as it gets.
Yeah, I really like that Julia and LLVM allow applying it on a per-operation basis.
Because most of LLVM's backends don't allow for the same level of granularity, they do end up propagating some information more than I would like.
For example, marking an operation as fast lets LLVM assume that it does not result in NaNs, letting nan checks get compiled away even though they themselves are not marked fast:
julia> add_isnan(a,b) = isnan(@fastmath(a+b))
add_isnan (generic function with 1 method)
julia> @code_llvm debuginfo=:none add_isnan(1.2,2.3)
define i8 @julia_add_isnan_597(double %0, double %1) #0 {
top:
ret i8 0
}
meaning it remains more dangerous to use than it IMO should be.
For this reason, LoopVectorization.jl does not apply "nonans".
With GCC you can also use #pragma GCC optimize or __attribute__((optimize(...))) for a similar effect.
It is not 100% bug-free (at least it didn't use to be) and often it prevents inlining a function into another having a different optimization level (so in practice its use has to be coarse-grained).
Try it with and without the pragma, and adding -ffast-math to the compiler command line. It seems that with the pragma sqrt(x) * sqrt(x) becomes sqrt(x*x), but with the command line version it is simplified to just x.
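A hedged reconstruction of the kind of test being described (the actual code isn't shown above; names are mine, and the optimize attribute is GCC-specific):

#include <math.h>

/* Per-function fast-math via the attribute; the pragma form
   (#pragma GCC optimize ("fast-math")) can be used instead. */
__attribute__((optimize("fast-math")))
float prod_sqrt_local(float x) {
    return sqrtf(x) * sqrtf(x);   /* reportedly becomes sqrtf(x*x) */
}

/* Same expression; compile the file with -ffast-math on the command
   line and it reportedly simplifies all the way down to x. */
float prod_sqrt_global(float x) {
    return sqrtf(x) * sqrtf(x);
}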
That's very interesting. The pragma does indeed do a lot of optimizations compared to no-pragma (for example it doesn't call sqrtf at all), but the last simplification is only done with the global flag set. I wonder if it is a missed optimization or if there is a reason for that.
edit: well with pragma fast-math it appears it is simply assuming finite math and x>0 thus skipping the call to the sqrtf as there are not going to be any errors, basically only saving a jmp. Using pragma finite-math-only and an explicit check are enough to trigger the optimization (but passing finite-math as a command line is not).
Generally it seems that the various math flags behave slightly differently in the pragma/attribute.
Fascinating to learn that something labelled 'unsafe' stands a reasonable chance of making a superior mathematical/arithmetic choice (even if it doesn't match exactly the unoptimized FP result).
If you're taking the ratio of sine and cosine, I'll bet that most of the time you're better off with tangent....
Excellent post, I agree, but isn't it more about how they can't optimize? Compilers can seem magic in how they optimize integer math, but this is a great explanation of why not to rely on the compiler if you want fast floating-point code.
• People imagine enabling fast float by "scope", but there's no coherent way to specify that when it can mix with closures (even across crates) and math operators expand to std functions defined in a different scope.
• Type-based float config could work, but any proposal of just "fastfloat32" grows into a HomerMobile of "Float<NaN=false, Inf=Maybe, NegZeroEqualsZero=OnFullMoonOnly, etc.>"
• Rust doesn't want to allow UB in safe code, and LLVM will cry UB if it sees Inf or NaN, but nobody wants the compiler inserting div != 0 checks.
• You could wrap existing fast intrinsics in a newtype, except newtypes don't support literals, and <insert user-defined literals bikeshed here>
It must be clearly understood which of the flags are entirely safe and which need to be under 'unsafe', as a prerequisite.
Blanket flags for the whole program do not fit very well with Rust, while point use of these flags is inconvenient or needs new syntax... but there are discussions about these topics.
Also maybe you would like to have more granular control over which parts of your program have priority on speed and which parts favour accuracy. Maybe this could be done with a separate type (e.g. ff64) or a decorator (which would be useful if you want to enable this for someone else's library).
Reasoning about the consequences along a couple functions in your hot path is one thing. Reasoning about the consequences in your entire codebase and all libraries is quite another.
The whole point of ALGOL derived languages for systems programming, is not being fast & loose with anything, unless the seatbelt and helmet are on as well.
I don't know if most Rust programmers would be happy with any fast and loose features making it into the official Rust compiler.
Besides, algorithms that benefit from -ffast-math can be implemented in C and with Rust bindings automatically generated.
This solution isn't exactly "simple", but it could help projects keep track of the expectations of correctness between different algorithm implementations.
Rust should have native ways to express this stuff. You don't want the answer to be "write this part of your problem domain in C because we can't do it in Rust".
Denormal floats are not a purpose, they are not used intentionally.
When the CPU generates denormal floats on underflow, that ensures that underflow does not matter, because the errors remain the same as at any other floating-point operation.
Without denormal floats, underflow must be an exception condition that must be handled somehow by the program, because otherwise the computation errors can be much higher than expected.
Enabling flush-to-zero instead is an optimization of the same kind as ignoring integer overflow.
It can be harmless in a game or graphic application where a human will not care if the displayed image has some errors, but it can be catastrophic in a simulation program that is expected to provide reliable results.
Providing a flush-to-zero option is a lazy solution for CPU designers, because there have been many examples in the past of CPU designs where denormal numbers were handled without penalties for the user (but of course with a slightly higher cost for the manufacturer) so there was no need for a flush-to-zero option.
Flushing denormals to zero only matters if your calculations are already running into the lower end of floating-point exponents (and even with denormals, if they're doing that, they're going to run into lost precision anyway sooner or later).
The useful thing denormals do is make the loss of precision at that point gradual, instead of sudden. But you're still losing precision, and a few orders of magnitude later you're going to wind up at 0 either way.
If your formulas are producing intermediate results with values that run into the lower end of FP range, and it matters that those values retain precision there, then you're either using the wrong FP type or you're using the wrong formulas. Your code is likely already broken, the breakage is just rarer than in flush-to-zero mode.
So just enable FTZ mode, and if you run into issues, you need to fix that code (e.g. don't divide by values that can be too close to 0), not switch denormals on.
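For what it's worth, a sketch of how FTZ (and its companion DAZ, which zeroes denormal inputs) is typically enabled on x86 without touching compiler flags; other architectures have their own control bits (e.g. in FPCR on AArch64):

#include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE */
#include <stdio.h>

int main(void) {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

    volatile double tiny = 1e-310;   /* a subnormal value */
    volatile double half = tiny * 0.5;
    printf("%g\n", half);            /* 0 with FTZ/DAZ on; a subnormal (~5e-311) otherwise */
    return 0;
}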
Your arguments are correct, but the conclusion does not result from them.
If we assume that underflows happen in your program and this, as you say, is a sign that greater problems will be caused by that, then you must not enable flush-to-zero, but you must enable trap-on-underflow, to see where underflows happen and to investigate the reason and maybe rearrange your formulas to avoid the too small results.
Flush-to-zero may sometimes lead to crashes, when it becomes obvious that something is wrong, but more often you just get some results with large errors that are difficult to distinguish from good results.
Opinions obviously vary, but I have never seen any good use case for flush-to-zero.
Underflows are either so rare that their performance does not matter, or if they are so frequent that flush-to-zero will increase the speed, then your code has a problem and the formulas should be restructured, which will both increase the speed and eliminate the loss of precision.
> Opinions obviously vary, but I have never seen any good use case for flush-to-zero.
The classical use case is real-time audio, where an IIR filter may have quite slow decay, such that the signal stays in the subnormal regime for many samples. If this happens and your hardware has significant stalls for subnormal data, you may miss your real-time deadline resulting in clicking or other audio corruption.
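A small C illustration of that decay (a one-pole IIR tail, y[n] = a·y[n-1]): the tail spends a long run of samples in the subnormal range, and on hardware that stalls on subnormals every one of those samples is slow.

#include <math.h>
#include <stdio.h>

int main(void) {
    const double a = 0.999;              /* slow decay coefficient */
    double y = 1.0;
    long subnormal_samples = 0;

    for (long n = 0; n < 1000000; n++) { /* ~23 s of audio at 44.1 kHz */
        y *= a;
        if (fpclassify(y) == FP_SUBNORMAL)
            subnormal_samples++;
    }
    printf("samples spent in the subnormal range: %ld\n", subnormal_samples);
    return 0;
}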
I agree that this is a good example and there are probably similar cases in video processing and in graphics rendering.
However not all CPUs stall for subnormal data and if the CPU designers would have ensured that none of them stall, no such discussions would have been ever needed.
Outside such special cases like real-time DSP and graphics, using flush-to-zero or options like -Ofast or -ffast-math are seldom justified and they can be dangerous, especially when they are used without a prior analysis and tests to determine whether exception conditions really appear when the program is run, and why.
> However not all CPUs stall for subnormal data and if the CPU designers would have ensured that none of them stall, no such discussions would have been ever needed.
I don't really disagree (I've pushed hard for no subnormal stalls for years), but there are some timing and area tradeoffs that come with this that make it a somewhat hard sell for FPUs that aren't built entirely around FMA (there are tradeoffs even for designs that are purely FMA based, but they're a lot smaller).
Mainstream x86 designs _still_ have some subnormal stalls!
> Outside such special cases like real-time DSP and graphics
Personally I see audio and graphics as the common case, and scientific computing as the special case ;) Do game physics engines expect denormals to be preserved?
It is normal to get values close to zero in a correctly implemented floating point algorithm - what matters is that, at that point, you don't rely on their dynamic range or precision. Trap-on-underflow would produce many false positives for algorithms which would be perfectly happy having those numbers be treated as zero.
This is not uncommon - it is completely expected with, say, anything implementing an exponential decay! A slow exponential decay could run into subnormals for large spans of time, yet be fine having those treated as zero.
Thanks for this explanation. But yeah, I meant: are there applications where the use of denormals vs. flush-to-zero is actually useful? If your variable will be nearing zero, and e.g. you use it as a divider, don't you need to handle a special case anyway?
Just like you should handle your integer overflows.
I'm seeing denormals as an extension of the floating-point range, but with a tradeoff that's not worth it. Maybe I got it wrong?
The use of denormals is always useful except when you want to make your program faster no matter what and your target CPU happens to be a CPU where operations with denormals are slower than operations with other numbers.
You must keep in mind that if you enable flush-to-zero and you really see an increase in speed, that means that you have now injected errors in your FP computations. Whether the errors matter or not, that depends on the application.
If you do not see any serious speed improvement, then you have no reason to enable flush-to-zero. Even when you might want to use flush-to-zero in a production binary, you should not use it during development.
Prior to enabling flush-to-zero it would be good to enable trap-on-underflow instead of denormals, to see if this really happens frequently enough to matter for performance and possibly to investigate the reason, to see if underflow could be avoided in other ways.
In conclusion using denormals is always useful, because it limits the errors accumulated during FP computation. On some CPUs you may get higher speed by not using denormals, but only if large errors do not matter. If you avoid underflows by other means, e.g. range checking of inputs, then it does not matter how slow the operations with denormals are, so there is no reason to use any option that enables flush-to-zero (e.g. -ffast-math).
Making a FP computation with "double" numbers and enabling flush-to-zero on that would usually be quite stupid, because if the errors do not matter, then that is a task for single-precision "float" (unless you need "double" only for the exponent range, not for precision).
I agree with you; if your code "works" with denormals and "doesn't work" with flush-to-zero, then it's already broken and likely doesn't work for some inputs anyway, you just haven't run into it yet.
If you want maximum possible accuracy for FP computation, you need to pay the performance price that denormals incur. In many cases, that will likely be a long running computation where responsiveness from a user-perspective and/or meeting realtime deadlines is not an issue. Any time "I want the best possible answer that we can compute", you should leave denormals enabled.
By contrast, in media software (audio and/or video) denormals getting flushed to zero is frequently necessary to meet performance goals, and ultimately has no impact on the result. It's quite common for several audio FX (notably reverb) to generate denormals as the audio fades out, but those values are so far below the physical noise floor that it makes no difference to flush them to zero.
It's not that you decide specifically to use them, but sometimes they sort of happen very naturally. For example, if you evaluate a Gaussian function f(x)=e^(-x^2), for moderate values of x the value f(x) is denormal.
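A quick check of that claim for doubles (exp(-x^2) falls below DBL_MIN once x^2 exceeds about 708, i.e. x around 27):

#include <math.h>
#include <stdio.h>

int main(void) {
    for (double x = 26.0; x <= 28.0; x += 0.5) {
        double f = exp(-x * x);
        printf("x = %4.1f  f(x) = %-12g %s\n", x, f,
               fpclassify(f) == FP_SUBNORMAL ? "(subnormal)" : "");
    }
    return 0;
}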