> There's no catastrophic cancellation or anything. And floating point has much more dynamic range than precision, so to speak, so until h literally reaches zero you aren't running into trouble.
I just tried it to be sure, in C with "double", and organized the calculation the way it's described in the paper you cited, formula 2.1.
It failed due to catastrophic cancellation close to an interpolation point, depending on the values involved. An equality test was not enough.
Approaching the first interpolation point, the output first departed from the correct value with some incorrect Inf results, and then switched to NaN, even though the input was never equal to any interpolation point.
This was trivial 1d interpolation between two points, where the points were (0.0 => 10.0, 1.0 => 20.0).
Hm, interesting. I also tried some numerical experiments to sanity-check and I got reasonable results right up to 1 ulp away from the interpolation points. I tried again using exactly your example -- f(x) = 10+10x, interpolation points at 0 and 1 -- and again everything seemed fine.
I wonder what we're doing differently, or whether we happened to test at different points. Here's my code:
def interpolate1(f,x,xs,ws):
    p,s = 1,0
    for (xj,wj) in zip(xs,ws):
        p *= x-xj
        s += wj*f(xj)/(x-xj)
    return p*s
(it's "interpolate1" because I also implemented the other version of the formula described in the article, which for this purpose behaves very similarly).
For your example we want f(x) = 10+10x, xs=[0,1], ws=[-1,1].
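Concretely, the sort of check I ran looks roughly like this (it reuses the interpolate1 above; the particular test points aren't special, just values near and between the nodes):

f = lambda x: 10 + 10*x
xs, ws = [0.0, 1.0], [-1.0, 1.0]

for x in (0.5, 2**-52, 1 - 2**-53):
    # last two columns agree to within an ulp or so
    print(x, interpolate1(f, x, xs, ws), f(x))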
As I mentioned before, if you implement the calculation of the polynomial called l(x) in the paper -- the thing my code calls p -- in an unfortunate way, you can get cancellation problems, but for this case I'm not sure I see any way for that to happen.
I tried calculating l(x) in a different way that could conceivably cause cancellation issues, aimed at this particular interpolant:
def foo(x):
    p = x*x-x
    s = -1*10/x + 1*20/(x-1)
    return p*s
but this also gave reasonable results for every value of x I tried other than exactly 0 and exactly 1.
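For concreteness, the spot checks were along these lines (exact hits on 0 or 1 make Python raise ZeroDivisionError rather than return inf/nan, which is why these stay just off the nodes; the particular x values aren't special):

# Spot-check foo at a few points near, but not exactly at, 0 and 1.
for x in (2**-52, 1e-9, 0.5, 1 - 1e-9, 1 - 2**-53):
    print(x, foo(x), 10 + 10*x)   # the last two columns should agree closely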
What exact values of x are giving you what bad results? What language are you implementing this in, with what data types?
If you want to implement the “first barycentric formula” instead you need to take some additional care to avoid overflow/underflow.
As the papers above note, the choice of interpolation nodes makes a big difference. If you have an arbitrary function to approximate on an arbitrary complex domain, take a look at https://arxiv.org/abs/1612.00337
There are advantages to using either version of the formula. If you use the "first" then you do need to take care about overflow/underflow when the number of interpolation points is large, but that isn't an issue here. The "first" behaves better than the "second" for extrapolation, as discussed in the paper by Trefethen et al. And yes, the choice of interpolation points is important.
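In case it helps to see it written down, here is roughly what I mean by the "second" (true barycentric) form, in the same shape as my interpolate1 above -- my own sketch, so the details may differ from the code in the article:

def interpolate2(f, x, xs, ws):
    # "Second" (true barycentric) form: a ratio of two sums, so the l(x)
    # factor cancels and never has to be computed explicitly.
    num, den = 0.0, 0.0
    for (xj, wj) in zip(xs, ws):
        t = wj / (x - xj)
        num += t * f(xj)
        den += t
    return num / den

Like interpolate1, this blows up if x coincides exactly with one of the nodes.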
But none of that should make a difference here. What's actually causing the dubious results jlokier reports isn't exactly proximity to the interpolation points -- I don't think it could happen if one of them weren't exactly zero. What's happening is that the very smallest values of x in jlokier's code are denormal numbers; that is, ones whose floating-point representation has less precision than usual.
Usually a floating-point number looks like 1.<stuff> x 2^n; for IEEE double precision, the <stuff> is 52 bits long. So if you start with 1, 1/2, 1/4, 1/8, and keep dividing by 2, you always have stuff=0 and n decreases by 1 each time. But what should you do once you reach the smallest value of n that will fit in the floating-point format? One answer is just to say that after that you get zero. But you can get a little extra dynamic range if instead you say that when n is as small as it's allowed to be, you can use 0.<stuff> instead of 1.<stuff>. These numbers have reduced precision for the sake of extra dynamic range, and they are called "denormal". Lots of things go subtly (or not so subtly) wrong when dealing with denormal numbers.
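If it helps, a couple of lines of Python (whose floats are IEEE doubles) show where the cutoff sits:

import sys

print(sys.float_info.min)       # 2.2250738585072014e-308, the smallest *normal* double
print(5e-324)                   # smallest positive denormal (only one bit of precision left)
print(sys.float_info.min / 2)   # still nonzero: a denormal, not 0.0
print(5e-324 / 2)               # rounds to 0.0: we've fallen off the bottom of the range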
One thing that goes wrong is the interpolation formula we're discussing here. The formula looks like f(x) ~= l(x) . sum of wj f(xj) / (x-xj) where xj are the interpolation points (in this case 0 and 1), l(x) is the product of (x-xj), so in this case x(x-1), and wj = 1 / product (xj-xi) where xi ranges over all the other interpolation points besides xj, so w0 = -1 and w1 = +1. So we have f(x) ~= x(x-1) . [-10/x + 20/(x-1)]. In fact this is exact provided x is neither 0 nor 1, because the thing we're approximating is a polynomial of degree <= (# interpolation points) - 1.
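(Concretely: x(x-1) . [-10/x + 20/(x-1)] = -10(x-1) + 20x = 10x + 10, which is exactly the polynomial we started with.)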
So, what I claimed before was: suppose x = xj + h where h is small, and suppose you compute l(x) as the product of its factors. Then we get l(x) = h times product (xj+h-xi), and the sum is wj f(xj) / h + the sum of the other terms; and if h is tiny then the other terms in the sum are tiny in comparison, and the factors of h and 1/h cancel and everything's fine.
But that doesn't happen when xj=0 and x (and hence h) is denormal -- because you can't pull the denormal trick at the large end of the range, so the reciprocal of a small enough denormal floating-point number overflows to floating-point infinity!
This isn't a cancellation problem. It isn't a matter of the formula going unstable when x is very close to one of the interpolation points. But it is a problem, and it never occurred to me until jlokier demonstrated it, because denormals are kinda obscure and I didn't think to think about what might happen if they get involved. It would be avoided if you made your interpolator check for values very close to one of the xj. But -- I think, but this stuff is treacherous -- it would also be avoided if you checked for infinities as well as NaNs on the output side, which would be more efficient (provided your inputs are usually sensible). Or -- I think, but this stuff is treacherous -- you could just check explicitly for denormal values of x and maybe of the xj, because if none of those is denormal this pathology can't arise.
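To make that concrete (this reuses the interpolate1 above; 5e-324 is the smallest positive double, a denormal):

import math

f = lambda x: 10 + 10*x
xs, ws = [0.0, 1.0], [-1.0, 1.0]

x = 5e-324                          # smallest positive (denormal) double
print(1.0 / x)                      # inf: the reciprocal overflows
print(interpolate1(f, x, xs, ws))   # inf, where the true value is essentially 10

# Output-side guard: accept the result only if it is finite (catches inf and nan).
y = interpolate1(f, x, xs, ws)
if not math.isfinite(y):
    print("non-finite result; handle this x specially")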
This still gives incorrect +Inf results in an interval around the first interpolation point.
The interval is small for parameters interp(x, 0, 10, 1, 20). But interp(x, 0, 1e50, 1, 2e50) shows the interval can be much larger, as it depends on the y values.
Both formulas return incorrect results in an input range larger than the denormals (1e50 times larger in the second example above). Plenty of "normal" inputs that can arise from other calculations. And these parameters are not unusual either.
So you can't ensure a correct result from either barycentric formula in floating point by just checking for denormal input.
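For example, transcribing the second formula directly (Python here rather than my C, but doubles either way) for just these two points:

# Second barycentric form for nodes 0 -> 1e50, 1 -> 2e50, written out directly.
x = 1e-280                      # a perfectly ordinary normal double, nowhere near denormal
num = -1e50/x + 2e50/(x - 1)    # first term overflows: -inf
den = -1.0/x + 1.0/(x - 1)      # stays finite (about -1e280)
print(num, den, num/den)        # -inf  -1e+280  inf, where the true value is about 1e50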
I think you may be right that checking for +/-Inf result as well as NaN may be an adequate test, but I wouldn't assume it without a proof or some paper demonstrating that for sure.
Nice catch on using enormous values of y, where indeed it isn't just denormal values of x that cause trouble. Still not a catastrophic-cancellation problem, though. Let's look more closely. Suppose x is very very close to 0. (In particular, close enough that x-1 is indistinguishable from -1. This is not a source of trouble in this case.)
The weights w0,w1 are -1 and +1, as before.
The numerator is -10^50/x - 2·10^50. The denominator is -1/x - 1. If x is small enough (which it is in the cases that give spurious infinities) the first term in each of these is so much larger that adding the second doesn't change it; so what you're really computing is (-10^50/x) / (-1/x). And if x is small enough that 10^50/x is infinite, then boom.
Something similar could happen even without numbers as large as 10^50. If the value at 0 were 3 instead, then any x whose reciprocal is bigger than 1/3 of the largest representable non-infinity would produce the same problem. Using f(0)=10^50 enlarges the range in which this particular overflow happens by a factor of 10^50, though.
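(For instance, in Python doubles:)

x = 1e-308          # denormal: below the smallest normal, which is about 2.2e-308
print(1.0 / x)      # finite, roughly 1e+308
print(3.0 / x)      # inf: overflows, giving the same spurious infinity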
The fact that this isn't an exotic catastrophic-cancellation problem doesn't, of course, detract from your original point (with which, for the avoidance of doubt, I agree): a calculation that can blow up for certain exact values is likely to misbehave for values that are merely close to those, so testing for the exact values is likely to be a very bad idea.
If I were actually writing something that does interpolation in this way, I would certainly not want to trust that checking the results for infs and nans was sufficient (without, as you say, either proving it or finding someone else who had -- but I suspect it's actually only almost true).
If x is very small (<1e-100 or something) you can alternatively evaluate a linear approximation accurate near 0. If y is going to be enormous (>1e+100 or something) you should rescale it.
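A rough sketch of what that could look like for the two-point case in this thread (the 1e-100/1e+100 cut-offs are the arbitrary ones above, and an exact hit on x = 1 would still need its own check):

def guarded_interp(x, y0, y1, small=1e-100, big=1e+100):
    # Linear interpolation between (0, y0) and (1, y1) via the second
    # barycentric form, with the two guards suggested above bolted on.
    scale = max(abs(y0), abs(y1))
    if scale > big:
        # Rescale enormous y values, interpolate, then scale back up.
        return scale * guarded_interp(x, y0 / scale, y1 / scale, small, big)
    if abs(x) < small:
        # Very small x: evaluate the (here linear) interpolant directly;
        # this form can't overflow or turn into inf/nan for tiny x.
        return y0 + (y1 - y0) * x
    num = -y0 / x + y1 / (x - 1.0)    # weights are -1 and +1 for nodes 0 and 1
    den = -1.0 / x + 1.0 / (x - 1.0)
    return num / den                  # still assumes x != 1.0 exactly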