A brief introduction to interval arithmetic (buttondown.email/hillelwayne)
221 points by zdw 4 months ago | 95 comments



Interval arithmetic powers this graphing calculator I made.

The user can enter a formula without solving for y, like “y^y = x^x”. I rearrange into “0 = x^x - y^y”. Then I use interval arithmetic to calculate the result interval of “x^x - y^y” for x = the x-axis range of the graph’s view, y = the y-axis range of the graph’s view. If the result interval contains 0 then I have something to draw. I recursively divide the ranges in half and do a binary search until I find very tiny intervals that contain solutions. I draw those as points on the graph.

Example formulas this can handle:

https://memalign.github.io/m/formulagraph/index.html?f1(x,t)...
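A rough Python sketch of that subdivision idea (illustrative only, not the linked calculator's actual code), tracing the unit circle x^2 + y^2 = 1 for brevity:

    def iadd(a, b):  # interval addition
        return (a[0] + b[0], a[1] + b[1])

    def isub(a, b):  # interval subtraction
        return (a[0] - b[1], a[1] - b[0])

    def imul(a, b):  # interval multiplication: min/max over all endpoint products
        ps = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
        return (min(ps), max(ps))

    def f_interval(x, y):  # interval extension of f(x, y) = x^2 + y^2 - 1
        return isub(iadd(imul(x, x), imul(y, y)), (1.0, 1.0))

    def find_zeros(x, y, depth=10, out=None):
        """Recursively bisect the box x-by-y, keeping boxes whose image contains 0."""
        if out is None:
            out = []
        lo, hi = f_interval(x, y)
        if lo > 0 or hi < 0:       # 0 not in the result interval: nothing to draw here
            return out
        if depth == 0:             # tiny box: record its centre as a plot point
            out.append(((x[0] + x[1]) / 2, (y[0] + y[1]) / 2))
            return out
        xm, ym = (x[0] + x[1]) / 2, (y[0] + y[1]) / 2
        for xs in ((x[0], xm), (xm, x[1])):
            for ys in ((y[0], ym), (ym, y[1])):
                find_zeros(xs, ys, depth - 1, out)
        return out

    points = find_zeros((-2.0, 2.0), (-2.0, 2.0))
    print(len(points), "points near the unit circle")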


when i did this i got better efficiency from ternary search

http://canonical.org/~kragen/sw/aspmisc/intervalgraph.html


you may find this interesting if you haven't seen it already: https://fredrikj.net/blog/2017/11/new-rigorous-numerical-int...


If on top of rigorous, you want them to be formally verified in Coq at the same time as they are computed: https://www.lri.fr/~melquion/doc/18-jar.pdf


is this the one mentioned in fredrik's post? he links https://www.lri.fr/~melquion/doc/16-itp-article.pdf which is presumably a different paper by the same author


I think they are the conference and journal versions of the same paper. Hadn't seen it was mentioned in the article, I should have read it more thoroughly!


i hadn't, this is fantastic!


Then you possibly will find this useful: https://en.wikipedia.org/wiki/Golden-section_search


that is an algorithm for a different problem; here we are looking for a zero, not an extremum, and we want to find all the zeroes (in a two-dimensional plane, so there are usually infinitely many zeroes), not just one of them

perhaps there is a way to apply it to this problem that is obvious to you but not to me

with autodiff we can use a zero-finding algorithm (even one that isn't derivative-free) to find extrema, but i don't know how you'd go about using an extremum-finding algorithm to find zeroes. the first step would seem to be quadrature? but that sounds impractical


The algorithm itself is about using the golden ratio for interval search.

One can use plain subdivision, another ternary division, and another the golden ratio.


as i understand it, the advantage of golden-section search is that, to search for a minimum rather than a zero, you need to in some sense interpolate a parabola rather than a line, so you need three points rather than two, and you'd like them to be somewhat evenly spaced. i don't fully understand why the golden section is better for this than just dividing the interval between the two lowest points in half, but it's definitely an algorithm to solve a different problem


This article does a great job at explaining interval arithmetic. However, the introduction says

>Instead of treating each as exactly 7 feet, we can instead say that each is somewhere between a minimum of 6.9 feet and a maximum of 7.1. We can write this as an interval (6.9, 7.1).

Yes we can use an interval to express an uncertainty. However, uncertainties in physical measurements are a little bit more complicated.

When I measure something to be 7 plus or minus 0.1 feet, what I am saying is that the value of the measured variable is not known for sure. It can be represented by a bell curve centred on 7, with 95% of the area under the curve (95% probability) lying between 6.9 and 7.1. The value of the measured variable is much more likely to be 7 than 6.9. There is also a small chance that the value lies outside of the 6.9 to 7.1 range.

In an interval, there is no probability distribution. It is more like an infinite list of numbers.

In practice, interval arithmetic is seldom used for uncertainty analysis for scientific experiments.


To close the loop: The connection is called an alpha-cut.

In the Gaussian case it would cut the normal distribution horizontally at a defined height. The height is defined by the sigma or confidence you want to reflect.

The length of the cut, i.e. the interval on the support, is how you connect probability and intervals.
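A minimal sketch of that connection, assuming a Gaussian and using only the Python standard library (the alpha_cut helper is hypothetical):

    from statistics import NormalDist

    def alpha_cut(mu, sigma, confidence=0.95):
        """Interval containing `confidence` of the probability mass of N(mu, sigma)."""
        d = NormalDist(mu, sigma)
        lo = d.inv_cdf((1 - confidence) / 2)
        hi = d.inv_cdf(1 - (1 - confidence) / 2)
        return (lo, hi)

    # 7 feet measured with sigma = 0.05 ft gives roughly the (6.9, 7.1) interval
    print(alpha_cut(7.0, 0.05))   # ~ (6.902, 7.098)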


It's possible to use gaussian variables and use gaussian error propagation, for an implementation see https://gvar.readthedocs.io/en/latest/ which is critical for the lsqfit library https://lsqfit.readthedocs.io/en/latest/

In gvar everything by default is normally distributed, but you can add_distribution, log-normal is provided, for example. You can also specify the covariance matrix between a set of values, which will be correctly propagated.
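A small, hedged example of what that looks like with gvar (per its documented gvar(mean, sdev) constructor; pip install gvar):

    import gvar as gv

    wall  = gv.gvar(7.0, 0.1)   # 7.0 +- 0.1 ft
    couch = gv.gvar(7.0, 0.1)   # independent measurement

    gap = wall - couch
    print(gap.mean, gap.sdev)   # 0.0 +- ~0.14 (independent errors add in quadrature)

    # Correlations are tracked: subtracting a variable from itself gives exactly zero
    print((wall - wall).sdev)   # 0.0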


It's hard for me to understand the goal of this comment. Nothing in it is incorrect. It's also not really a meaningful critique or response to the article. The article did not attempt to describe "uncertainty analysis for scientific experiments". It blatantly began by describing interval arithmetic and ended by justifying it as being meaningful in two contexts: IEEE floating point numbers and machining tolerances. Neither are experimental domains and both do have a meaningful inbuilt notion of interval that would not be served by treating intervals as gaussians.


Gaussian distributions are a horrible choice for representing measurement uncertainty. If the tool is properly calibrated, 100% of the probability mass will be within (6.9, 7.1). A normal distribution would have probability mass in negative numbers!

There's also no motivation for choosing a normal distribution here - why would we expect the error to be normal?


If the error is the sum of many little errors, as it often is in mechanical assemblies, it's approximately normal due to the central limit theorem.
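A quick way to see this: sum many small uniform errors and the total looks bell-shaped (illustrative sketch only):

    import random
    import statistics

    def assembly_error(parts=20, tol=0.01):
        # each part contributes an independent uniform error in [-tol, +tol]
        return sum(random.uniform(-tol, tol) for _ in range(parts))

    samples = [assembly_error() for _ in range(100_000)]
    print(statistics.mean(samples), statistics.stdev(samples))
    # Histogram the samples and they form a bell curve, even though each
    # individual error was uniform, not normal.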


True, but that’s not how most sensors actually work. For example consider a weighing scale. If it says 10.1kg, why would we use a normal distribution?


What I hear is that similar techniques should/could be used by explicitly modeling it not as an interval (6.9, 7.1) but as a gaussian distribution of 7±0.1, and a computer can do the arithmetic to see what the final distribution is after a set of calculations.


You could use intervals to prove the codomain of a function, given its domain is an interval, using the same arithmetic.

Would actually be useful in programming as proving what outputs a fn can produce for known inputs - rather than use unit tests with fixed numerical values (or random values).
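A sketch of that idea in Python, with a hypothetical scaled_offset function and hand-rolled interval helpers: evaluating the interval extension once bounds every output the function can produce for inputs in [-5, 5], instead of spot-checking fixed values:

    def imul(a, b):
        ps = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
        return (min(ps), max(ps))

    def iadd(a, b):
        return (a[0] + b[0], a[1] + b[1])

    def scaled_offset(x):           # the ordinary function: 3*x + 10
        return 3 * x + 10

    def scaled_offset_interval(x):  # its interval extension
        return iadd(imul((3, 3), x), (10, 10))

    # "Prove" the codomain for any input in [-5, 5]:
    print(scaled_offset_interval((-5, 5)))   # (-5, 25): every output lies in this range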


There is no reason to assume a normal distribution. If you have a tool that measures to a precision of 2 decimal places, you have no information about what the distribution of the third decimal place might be.


This is correct, which is why intervals don't choose an interpretation of the region of uncertainty.

If you do have reason to interpret the uncertainty as normally distributed, you can use that interpretation to narrow operations on two intervals based on your acceptable probability of being wrong.

But if the interval might represent, for example, an unknown but systematic bias, then this would be a mistake. You'd want to use other methods to determine that bias if you can, and correct for it.


> There is no reason to assume a normal distribution.

There absolutely is with sane assumptions about how any useful measurement tool works. Gaussian distributions are going to approximate the actual distribution for any tool that's actually useful, with very few exceptions.


Tools, yes. Processes, no.

When fabricating, we'll often aim for the high end of a spec so you have material remaining to make adjustments. Most of our measurements actually follow double-tail or exponential distributions.


I'm sorry but if I give you a measuring tape that goes to 2 decimal places and you measure a piece of wood at 7.23 cm, when you get a more precise tape you have no information at all about what the third decimal place will turn out to be. It could be anywhere between 7.225 and 7.235, there is no expectation that it should be nearer to the centre. All true lengths between those two points will return you the same 7.23 measurement and none are more likely than any other given what you know.


I'm not sure why you are being downvoted - this is absolutely true.


This is a good article!

In ClickHouse, interval arithmetic is applied to index analysis. A sparse index consists of granules, and each granule is an interval of tuples in lexicographic order. This interval is decomposed into a union of hyperrectangles. Conditions such as comparisons, logic operators, and many other functions are evaluated on these hyperrectangles, yielding boolean intervals. Boolean intervals represent ternary logic (always true, always false, can be true or false). Interesting tricks include: applying functions that are monotonic on ranges (for example, the function "day of month" is monotonic as long as the month does not change), calculating function preimages on intervals, and even calculating preimages of n-ary functions, which is useful for space-filling curves, such as Morton or Hilbert curves.

Check for more details: https://github.com/ClickHouse/ClickHouse/blob/master/src/Sto...

Or see examples, such as https://adsb.exposed/
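An illustrative sketch of the ternary-logic part of that (not ClickHouse's actual code): evaluate a comparison over a granule's value range and get back always true / always false / maybe, which decides whether the granule can be skipped:

    ALWAYS_TRUE, ALWAYS_FALSE, MAYBE = "always true", "always false", "maybe"

    def less_than(value_range, constant):
        lo, hi = value_range
        if hi < constant:
            return ALWAYS_TRUE      # every row in the granule satisfies the condition
        if lo >= constant:
            return ALWAYS_FALSE     # whole granule can be skipped
        return MAYBE                # granule must be read and filtered row by row

    def and_(a, b):                 # ternary AND over the three-valued results
        if ALWAYS_FALSE in (a, b):
            return ALWAYS_FALSE
        if a == b == ALWAYS_TRUE:
            return ALWAYS_TRUE
        return MAYBE

    # granule with timestamps in [100, 199] and ids in [5, 9]
    print(and_(less_than((100, 199), 150), less_than((5, 9), 3)))  # always false -> skip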


ClickHouse is real smart, I wish it spoke real SQL, and not some parody of SQL like it currently does.


I have a presentation on this topic.

https://presentations.clickhouse.com/meetup107/app/

Starting from slide 19, "How is it possible," it explains why we need such a parody.


What’s wrong with its dialect? I use it heaps and haven’t had any issues.


The article links to a "Gustafson vs Kahan" debate transcript. The video is more entertaining: https://youtube.com/watch?v=KEAKYDyUua4


This transcript is very interesting, and in particular I didn't know about Unums (Universal Numbers), which look amazing.

It would be interesting to see more benchmarks (speed / memory) comparing floats with unums.


I've been aware of the unum stuff for a while, but I've never delved deeply into it. Some of my hot takes on it:

* Gustafson's unum project seems to be invariably pitched in a cult-like this-is-the-one-true-way [1] manner which makes it hard to evaluate dispassionately.

* There seem to be several versions of unums, the latest of which is not-even-a-unum anymore but instead a regular floating-point number with different distribution called 'posits.' That Gustafson seems to have changed the vision so many times suggests to me that early criticisms were in fact correct.

* What I've heard in conversations with numerical experts is that interval arithmetic generally doesn't work [as a replacement for where we use floating-point today]--intervals tend to blow up to infinity, especially as it's difficult to account for correlated error (which is the point of this article here, actually).

A lot of Gustafson's pitch seems to be "you shouldn't need a numerical analyst to write math," which is naturally going to rile up a numerical analyst (like Kahan). But the numerical analyst's retort that you're still liable to get into trouble if you're not savvy enough to know the pitfalls is equally true of Gustafson's proposal; there's no magic bullet that makes problems go away.

From my perspective, the real problem is that we lack the tooling to let programmers discover issues with their numerics code, no matter how they're implemented. Gustafson and Kahan are talking past each other on this problem, with Kahan rightly pointing to all the functionality that IEEE 754 added to enable this, and Gustafson rightly pointing out that those features are unused and largely unusable, and Kahan (probably rightly) pointing out that unums' promise of a magic bullet to numerics comes with issues of its own.

[1] This is possibly meant just as a joke, but it's the kind of unfunny joke that instead makes me wonder about the character of the person who presents it, like how asshole-as-a-character internet personalities turn out to frequently also be assholes in person.


In 1980 I started working with a straightforward algorithm that was computerized then by agreeing on its implementation in a few pages of 32-bit double-precision floating-point Fortran code.

Up until then, aggregate data had been manually compiled and published in kilos of handbooks over the decades.

This was the first acceptable computer approach since it was the exact same pre-1980 algorithm, and it was expected to play a part in producing correct 20-digit decimal billable amounts from computer data which had been meticulously rounded to 4 decimal places, which is what took up most of the Fortran code.

Well, I needed to do the same calculations on an 8-bit Radio Shack Pocket Computer. And there was only 512 bytes of user space for my TRS-80 Basic code.

The exact algorithm would fit, but not any of the standard multi-step rounding procedure. The floating point output was not often good to 4 decimal places.

Massaged it iteratively until the algorithm was no longer fully recognizable. Still no good.

Changed from floats to integers. This also saved more memory for workspace.

I was no mathematician, and getting integers to do the whole thing, leaving only the need for a final move of the decimal point, was not easy.

Ended up with a very dissimilar representation of the algorithm, using numbers specifically geared to the problem at hand, nothing universal like Gustafson.

When I read his material I was intrigued that one of his own objectives was to obtain more numerical accuracy from lesser-bit computers.


>Instead of treating each as exactly 7 feet, we can instead say that each is somewhere between a minimum of 6.9 feet and a maximum of 7.1. We can write this as an interval (6.9, 7.1).

This is not how measurements work in physics. You have to measure at least twice (if you measure only once, the sample variance will be infinite due to the Bessel correction).

Only then can you compute the mean value of your measurements and the standard error. Assuming you made two measurements, 6.9 and 7.1, you would write length = 7.0 +/- 0.1. (see https://www.omnicalculator.com/statistics/standard-error?v=t... )


There are several ways to handle uncertainties and propagation of error. Interval arithmetic is one of the simpler methods. Some Python packages that focus on this problem:

https://pythonhosted.org/uncertainties/

https://pypi.python.org/pypi/soerp

https://pypi.python.org/pypi/mcerp

This last one is one of my favorite Python libraries. My hunch is that it is highly underrated.

I am still looking for, or considering implementing, error propagation as probability distributions where these are symbolic.
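A hedged example with the first of those packages (uncertainties; first-order propagation that tracks correlations):

    from uncertainties import ufloat

    wall  = ufloat(7.0, 0.1)
    couch = ufloat(7.0, 0.1)

    gap = wall - couch
    print(gap)            # ~0.00+/-0.14  (independent errors add in quadrature)
    print(wall - wall)    # 0.0+/-0       (correlated with itself, so it cancels)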


"Real-time latin-hypercube-sampling-based Monte Carlo Error Propagation" isn't the most "Hey, beginners! Come on in and try this!" description / synopsis. Ditto "Second Order Error Propagation."

The "free, cross-platform program that transparently handles calculations with numbers with uncertainties" module may not be as strong, but definitely sells and describes itself better.

mcerp might be better rated if it hoisted its "easily and transparently track the effects of uncertainty through mathematical calculations" promise to its headline.


You seem knowledgeable so maybe you could give me some advice.

I developed a new method using SVD/PCA with some basis rotation method. I can potentially characterize or estimate errors on my input data, but I have no idea how to propagate these errors to my basis vectors. I'm at a bit of a loss as to where to look from here. My only idea is to bootstrap... But that's a bit lame :) and not very rigorous


I am not sure exactly what you mean, but I will try with the following advice.

See if you are able to define each element of your input data as a stochastic variable that you can sample from. Then make a function to generate a set of fixed value input data elements by sampling from the stochastic variables. Then run this many times to generate resulting basis vectors. Calculate relevant measures of variability from these vectors.
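A minimal sketch of that sampling approach, assuming the input errors can be modelled as independent Gaussians per element (the matrices here are stand-ins for the real data and error estimates):

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(size=(100, 5))           # stand-in for the real input matrix
    sigma = 0.05 * np.ones_like(data)          # stand-in per-element error estimates

    def leading_vector(X):
        # first right singular vector (sign-fixed so runs are comparable)
        v = np.linalg.svd(X, full_matrices=False)[2][0]
        return v if v[0] >= 0 else -v

    samples = np.array([
        leading_vector(data + rng.normal(scale=sigma))
        for _ in range(500)
    ])
    print(samples.mean(axis=0))   # average leading basis vector
    print(samples.std(axis=0))    # its element-wise spread, i.e. the propagated error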


I've experimented on a few systems to apply interval arithmetic to type inference, i.e.

   // starRating is an int<0,5>, meaning
   // integer between 0 and 5 inclusive.
   let starRating = console.readInt<0,5>("How many stars do you rate this movie?");
   db.addStarRating(starRating);

   // starRatingList is a list<int<0,5>>
   let starRatingList = db.readStarRatings(movie);

   // totalStarRatings is an int<0,5*starRatingList.length>
   // this type can't be sized at compile time, but we can still check it!
   let totalStarRatings = sum(starRatingList);

   // avgStarRating is a very difficult type, because we need to either have
   // a fraction type that's slow, or some handling of rounding/truncating
   // when we do division. So now we're dealing with 3 intervals: min/max,
   // min/max precision, and min/max error.
   let avgStarRating = totalStarRatings / starRatingList.length;


> Why x^2 isn't always x * x

It turns out he's claiming they're different if x^2 is interpreted as squaring each element in the interval x, while x * x is interpreted as a cross product: the interval obtained by multiplying all pairs of elements in the interval. But I haven't ever seen anyone use x^2 to mean pointwise squaring on an interval x. Is that some kind of standard notation?


"Pointwise squaring on an interval x" is just a weird way of describing the usual function f(x) = x^2 with domain restricted to an interval. It's pointwise because that's how functions f : R -> R are defined: given a point, or value, of the domain, give me a new point in the codomain.

If you think of `x` as a whole interval unto itself, and not just a single point, then I think the options become more interesting. The most natural product on two sets is indeed the cross product; but for intervals, I can imagine defining a common parameterization over both intervals and then multiplying pointwise up to that parameterization.


It makes sense if instead of thinking about intervals, you think about the supports of random variables[1]. Given two independent random variables X and Y, X is not independent of itself, so supp(X) = supp(Y) does not imply supp(X * X) = supp(X * Y).

[1]: https://en.wikipedia.org/wiki/Support_(mathematics)#In_proba...


Yes it’s standard for interval arithmetic. Have a look at the interval operations section: https://en.m.wikipedia.org/wiki/Interval_arithmetic


Yes, I see. There's a desire to map intervals pointwise through functions, but also a desire to produce intervals by all-pairs calculations, and the impossibility of representing both interpretations in one notation leads to some inconsistencies.


no, it's just a common limitation of implementations of interval arithmetic. things like affine arithmetic solve it


There's some abuse of poor notation going on in the article. I don't think the author is intending to be confusing through this imprecision, but instead is just faithfully representing the common way people discuss this kind of stuff.

But it is confusing. And it is imprecise.

(I'll use x below to mean multiplication due to HN's weird formatting rules)

Nominally, if we have two intervals A and B we might anticipate there's a difference between AxA and AxB. In normal math we expect this because we use the different letters to indicate the potential for A and B to be different. Another way of saying it is to say that AxA = AxB exactly when A = B.

The trick of language with interval math is that people often want to write things like A = (l, h). This is meaningful, the lower and upper bounds of the interval are important descriptors of the interval itself. But let's say that it's also true that B = (l, h). If A = B, then it's definitely true that their lower and upper bounds will coincide, but is the converse true? Is it possible for two intervals to have coincident bounds but still be unequal? What does equality mean now?

In probability math, the same issue arises around the concept of a random variable (rv). Two rvs might, when examined individually, appear to be the same. They might have the same distribution, but we are more cautious than that. We reserve the right to also ask things like "are the rvs A and B independent?" or, more generally, "what is the joint distribution of (A, B)?".

These questions reinforce the idea that random variables are not equivalent to their (marginal) distributions. That information is a very useful measurement of a rv, but it is still a partial measurement that throws away some information. In particular, when multiple rvs are being considered, marginal distributions fail to capture how the rvs interrelate.

We can steal the formal techniques of probability theory and apply them to give a better definition of an interval. Like an rv, we'll define an interval to be a function from some underlying source of uncertainty, i.e. A(w) and B(w). Maybe more intuitively, we'll think of A and B as "partial measurements" of that underlying uncertainty. The "underlying uncertainty" can be a stand in for all the myriad ways that our measurements (or machining work, or particular details of IEEE rounding) go awry, like being just a fraction of a degree off perpendicular to the walls we're measuring to see if that couch will fit.

We'll define the lower and upper bounds of these intervals as the minimum and maximum values they take, l(A) = min_w A(w) and u(A) = max_w A(w).

Now, when multiplying functions on the same domain, the standard meaning of multiplication is pointwise multiplication:

    (A x B)(w) = A(w) x B(w)
and so the lower and upper bounds of AxB suddenly have a very complex relationship with the lower and upper bounds of A and B on their own.

    l(A x B) = min_w A(w) x B(w)
    u(A x B) = max_w A(w) x B(w)
So with all this additional formal mechanism, we can recover how pointwise multiplication makes sense. We can also distinguish AxA and AxB as being potentially very different intervals even when l(A) = l(B) and u(A) = u(B).

(As a final, very optional note, the thing that makes interval math different from probability theory is that the underlying space of uncertainty is not endowed with a probability measure, so we can only talk about things like min and max. It also seems like we can make the underlying event space much less abstract and just use a sufficiently high-dimensional hypercube.)
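Here is that construction rendered numerically: intervals as functions of a shared underlying uncertainty w, with bounds taken as min/max over a grid of w values (a sketch of the formalism above, not standard notation):

    import itertools

    # w ranges over a grid of a 2-dimensional "uncertainty cube" [0,1]^2
    W = list(itertools.product([i / 50 for i in range(51)], repeat=2))

    A = lambda w: -3 + 6 * w[0]   # A(w) sweeps the interval (-3, 3) via coordinate 0
    B = lambda w: -3 + 6 * w[1]   # B(w) sweeps the same interval, but independently

    def bounds(f):
        vals = [f(w) for w in W]
        return (min(vals), max(vals))

    print(bounds(A))                          # (-3.0, 3.0)
    print(bounds(lambda w: A(w) * A(w)))      # (0.0, 9.0)   -- AxA
    print(bounds(lambda w: A(w) * B(w)))      # (-9.0, 9.0)  -- AxB, same marginal bounds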


About the last remark, my intuition is that even though there are operational differences, any formalism to represent uncertainty should be roughly as useful as each other

I mean. Can you express Bayes rule using interval arithmetic? Or something similar to it


I think a more complete way to say it would be that probability theory is a refinement of interval theory. Per that last remark, I suspect that if you add any probability measure to intervals such that it has positive weight along the length of the interval then the upper and lower bounds will be preserved.

So in that sense, they're consistent, but interval theory intentionally conveys less information.

Bayes' Law arises from P(X, Y) = P(X | Y)P(Y). It seems to me in interval math, probability downgrades to just a binary measurement of whether or not the interval contains a particular point. So, we can translate it like (x, y) \in (X, Y) iff (y \in Y implies x \in X) and (y \in Y) which still seems meaningful.


Do you know any material that compares probability theory with interval arithmetic like this? I can't find it


I don't. I've never actually seen interval theory developed like I did above. It's just me porting parts of probability theory over to solve the same problems as they appear in talking about intervals.


> Is that some kind of standard notation?

Yes, it's just algebra.


Yeah it sounds like something he's made up. For matrices x^2 is just x*x, not element-wise power (which if you want to be deliberately confusing is also known as Hadamard power). The latter is apparently written like this: https://math.stackexchange.com/a/2749724/60289


> You measure the wall with a ruler and get 7 feet, then you measure the couch and get 7 feet. Can you fit the couch against that wall?

Normally, yes: the couches generally have somewhat squishy... stuffing? filling?.. which hang off the wooden frame, so you generally can squeeze a couch from the sides a bit to fit it into slightly narrower space.

> say 1/10th of a foot.

This is unholy. Either say "an inch" for this hypothetical scenario, or use some decimal-based measurement system.


It is hideous but it has precedent. American railway engineering is done with metric feet. That is, feet with decimals. Miles, yards and inches don't get a look in.


a foot divides well into 1/2, 1/3, 1/4, 1/6, 1/12 too. and he picked 1/10.


I'd rather see an interval written as (6.9, 7.1) than (6.916(6), 7.083(3))


You just take the inch as the basic unit and then the interval is (83, 85). By the way, SI has so many multiplying/dividing prefixes that you can pick units that make most of the numbers in your field integers.


Couldn't the overdetermination problem be solved by tagging intervals with their ids? E.g., "x*x" should be resolved as "(x, i[-3, 3])*(x, i[-3, 3])" and "i[-3, 3]*i[-3, 3]" as "(nil, i[-3, 3])*(nil, i[-3, 3])". Since the tag matches in the first expression and not in the second (nil != nil) different rules could be used.

Btw this comes up a lot in compiler design for dynamic languages since constraining the value ranges of variables means that you can implement them more efficiently. In static languages it is not as important since the domains of most variables are either given to you or inferred by the type system.

Also an interesting head scratcher is how to implement interval arithmetic for modulus: i[3,6] % i[3, 6] = ???*


this is basically how affine arithmetic works. but in interval arithmetic or modal interval arithmetic the representation of a float is two floats, and in affine arithmetic the representation of a float is a potentially unboundedly large set of float coefficients, none of which can be safely ignored, so there are some drawbacks to this approach

i'm not sure why wayne doesn't mention affine arithmetic and modal interval arithmetic in this post; i'd think they were relevant


> none of which can be safely ignored

Or more accurately speaking, they can be ignored but at some expense of accuracy. (You can always approximate every variable with a (-inf, +inf) interval!) There are several working ways to "compress" affine variables when they get too many.

Probably a worse problem with AA is that every multiplication introduces a new variable, so it is easy to run into the maximum number of variables set for reasonable performance...


agreed. i find reduced affine arithmetic the most appealing, but haven't tried it


this requires all terms to be represented symbolically, which 'works', but now you need a full-blown symbolic rewrite engine to try to simplify things. very reasonable for an optimiser but not so much for numerics. for numerics a more common approach is interval mincing. we have a function f(x) = x*x, and try to apply f([-3 3]). we can 'mince' the inner interval and say that (e.g.) [-3 3] = [-3 -1] U [-1 1] U [1 3]. so we have f([-3 -1] U [-1 1] U [1 3]); distribute function application over U (which is sound) to get f([-3 -1]) U f([-1 1]) U f([1 3]), which is obviously tighter than f([-3 3]). we picked a subinterval size of 2, but can shrink it arbitrarily until we're satisfied with the result. this technique is essentially a form of case analysis, and can also be applied to optimisers/static analysers (complementarily to simplifying rewrites)
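a sketch of the mincing step in python, with a hand-rolled interval multiply:

    def imul(a, b):
        ps = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
        return (min(ps), max(ps))

    def f(x):                      # interval extension of f(x) = x*x
        return imul(x, x)

    def minced(f, interval, pieces):
        lo, hi = interval
        step = (hi - lo) / pieces
        subs = [(lo + i * step, lo + (i + 1) * step) for i in range(pieces)]
        images = [f(s) for s in subs]
        return (min(a for a, _ in images), max(b for _, b in images))

    print(f((-3, 3)))              # (-9, 9): naive result
    print(minced(f, (-3, 3), 3))   # (-1, 9): tighter
    print(minced(f, (-3, 3), 300)) # roughly (0, 9): approaches the true range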


affine arithmetic does not need a full-blown symbolic rewrite engine, though it too sometimes produces overbroad intervals


so a middle ground (as many pseudo-symbolic approaches). glanced at wikipedia—this seems not dissimilar conceptually to the abstract domain of polyhedra, in that it's symbolic but has a flat structure and expresses only linear relationships. of course, polyhedra are exponential where affine arithmetic is quadratic (overall space in number of terms), and polyhedra are global where affine is local (therefore easier to implement as a library, but no sharing). more missed connections between numerical analysis and program analysis? (meh but no sharing probably means affine generally loses for a given resource budget. would still be interesting to try though)


this sounds interesting!

i think aa is linear in the number of terms, and reduced aa is linear in the number of independent variables, but possibly i don't understand your meaning


feel free to ping on irc if interested in chatting more on the topic, though i'm not sure i have a ton more original thoughts atm

if you have an expression with n terms, then you will end up with O(n) terms each taking up O(n) space, so the overall space usage is quadratic. (that's the fair way to compare to polyhedra since they're always global)


what's the scenario where you'd materialize the values of all those terms at once? maybe cache-invalidation-based incremental evaluation, like acar's self-adjusting computation? affine arithmetic (or raa) seems like it could be a good fit for that kind of thing, because it can handle small input changes within the affine approximation (with just a multiply-accumulate) and sometimes would benefit from being able to recursively subdivide the input space into smaller intervals without recomputing unaffected parts of the expression

on the other hand, if i am merely evaluating a polynomial with n terms, such as 3x⁴ - 2x³ + 17x² + x - 4 for n = 5, i only have three aa objects at any one time during evaluation of the expression, regardless of n. in this case the program might look like this

    a ← x; a ← a⁴; b ← 3; a ← a · b
    b ← x; b ← b³; c ← -2; b ← b · c; a ← a + b
    b ← x; b ← b²; c ← 17; b ← b · c; a ← a + b
    b ← x; a ← a + b
    b ← -4; a ← a + b
this assumes we have three registers for aa objects, creatively called a, b, and c; that we can only do arithmetic on these registers; and that the power operations are unary (otherwise they require an additional register)

this is true of any polynomial

with horner's-rule evaluation we can cut this down to two registers

    a ← 3
    b ← x; a ← a · b; b ← -2; a ← a + b
    b ← x; a ← a · b; b ← 17; a ← a + b
    b ← x; a ← a · b; b ← 1;  a ← a + b
    b ← x; a ← a · b; b ← -4; a ← a + b
obviously there do exist expressions with deeper minimum evaluation stacks, but i think they're only worst-case logarithmic in expression size, aren't they?


(dynamic) computation forms a dag, not a tree. i think a sum scan (n inputs -> n outputs) will trigger the worst case. it might be that computations tend to be tree-shaped, so you rarely hit the worst case, but that helps everybody out, not just affine arithmetic


that's a good point; i was thinking of single-output expressions, where i think always evaluating the deepest subtree first limits your space usage to worst-case logarithmic?


even single-output is a dag; you can explode the dag into a tree, but then you pay in time. suppose some expensive term x is used to compute n other terms. after computing x, assuming we want to share it, we have to compute all n terms before killing x; hence x sticks around longer than it might. (this doesn't necessarily lead to explosion, but i'm pretty sure you can use this to construct a scenario that needs a large number of live terms to avoid compromising time complexity; at least n live terms if the dag has nlogn nodes)


I took the idea of interval types and decomposed them to an even lower primitive: Inequality types. An interval type is just an intersection of two inequality types. For example `(>0) & (<1)` is the interval `(0, 1)`. You can read this as "a number being larger than 0 and smaller than 1". `(<1)` is also valid, which is "a number smaller than 1" as a type.

The nice thing about this decomposition is that applying arithmetic "just works" because you just define them for the inequality primitive.

I prototyped this for TypeScript and created a proposal. It does not contain type-level arithmetic because TS doesn't do that kind of type-level stuff in general. I'm not entirely convinced myself of that proposal, but maybe someone finds this interesting:

https://github.com/microsoft/TypeScript/issues/43505


Is this equivalent to open intervals e.g. (0, null)?


It would just be the type `(>0)`


Great article!

you may be interested in the relationship between programmatic Junctions and Interval Arithmetic ...

a question in our Discord channel asked "why are arithmetic operations on Ranges restricted" with some examples like this:

  (1..2) + 3 => (4..5)   #ok
  (1..2) + (3..4) => 4   #not (4..6)!
to cut a long story short, I made the Math::Interval module to extend the Range class to provide a working Interval Arithmetic model that uses Junctions of Ranges to represent disjoint results

  # a divisor that spans 0 produces a disjoint multi-interval
  my $j1 = (2..4)/(-2..4);
  ddt $j1;            #any(-Inf..-1.0, 0.5..Inf).Junction
  say 3 ~~ $j1;       #True
(The long story is at https://rakujourney.wordpress.com/2023/11/11/junctions-for-i...)


A good mental tool for thinking about that is adding two dice rolls


If you are curious about this then you might also want to have a look at "propagation of uncertainty".

https://en.wikipedia.org/wiki/Propagation_of_uncertainty (warning: contains math symbols)


One detail of interval arithmetic not mentioned in TFA, but of much consequence in the context we have to contend with in Ardour, is ...

When you ask if a single non-interval lies within a given interval, the answer is yes or no (with a given resolution).

When you ask what the relationship between 2 intervals is, there are multiple answers (*). In a given problem domain, each one may have different semantic implications.

(*)

  interval 1 is larger than interval 2 and fully includes it
  interval 2 is larger than interval 1 and fully includes it
  interval 1 begins before interval 2
  interval 1 ends after interval 2
  interval 2 begins before interval 1
  interval 2 ends after interval 1
  interval 1 and interval 2 are disjoint
  interval 1 and interval 2 are equivalent at a given resolution


Allen's interval algebra describes the 2*6+1 = 13 relations:

https://en.wikipedia.org/wiki/Allen's_interval_algebra

It's possible to build a graph where the relations are nodes, and the edges are possible smooth operations on the intervals (e.g. translation). Then you have a state machine for smooth system evolution.
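A compact sketch of classifying two closed intervals into Allen's 13 relations (hypothetical helper, not from a library):

    def allen_relation(a, b):
        (a1, a2), (b1, b2) = a, b
        if a2 < b1:  return "precedes"
        if b2 < a1:  return "preceded by"
        if a2 == b1: return "meets"
        if b2 == a1: return "met by"
        if a1 == b1 and a2 == b2: return "equals"
        if a1 == b1: return "starts" if a2 < b2 else "started by"
        if a2 == b2: return "finishes" if a1 > b1 else "finished by"
        if b1 < a1 and a2 < b2: return "during"
        if a1 < b1 and b2 < a2: return "contains"
        return "overlaps" if a1 < b1 else "overlapped by"

    print(allen_relation((1, 3), (2, 5)))   # overlaps
    print(allen_relation((2, 4), (1, 6)))   # during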


Right, I should have linked to that (or something like it) rather than trying to write out my own list.


This problem came up for me when writing a tool to help an author index his work. He wanted to be able to enter reference ranges for a term and then combine, including disjoint ones, into a single entry. (There was also a roman numeral problem irrelevant here).

This has also come up for me in two dimensions when dealing with overlapping rectangles. For some reason the complexity of it surprised me both times. Sadly, computing these cases is a straightforward slog in 1-D; you can however reuse the solution for higher dimensions in a nice way.


Most spatial databases use the R-Tree or one of its variants:

https://en.wikipedia.org/wiki/R-tree

e.g. PostGIS has GiST-RTree:

https://postgis.net/docs/manual-3.2/using_postgis_dbmanageme...


The 68040 CPU had two rounding modes to support interval arithmetic.

From the M68040 User’s Manual (https://www.nxp.com/docs/en/reference-manual/MC68040UM.pdf):

"The processor supports four rounding modes specified by the IEEE 754 standard. These modes are: round to nearest (RN), round toward zero (RZ), round toward plus infinity (RP), and round toward minus infinity (RM). The RP and RM modes are directed rounding modes that are useful in interval arithmetic."


these originated i think on the 8087 and are widely supported today, including all amd64 and arm designs


The real takeaway was the Frink language. https://frinklang.org.

The manual was a great read and the examples he uses are very entertaining.


See clpBNR with SWI Prolog for a way to use interval arithmetic in the broader scope of logic programming.

https://github.com/ridgeworks/clpBNR


Great job at explaining the complexities in this topic.


The x*x vs. x^2 issue that the author presents seems contrived. Interval arithmetic is just a specialized case of arithmetic on random variables. If two values are uncorrelated, why would you use the same variable for them? The ambiguity could be trivially resolved by representing the problem as x*x vs. x*y, where x = (3, 3) and y = (3, 3).


It took me a while to understand that example.

Basically if you multiply two intervals (-3,3) * (-3,3), it's relevant whether it's actually the same quantity multiplied by itself (in which case, you want the answer 0-9, because you'll always be multiplying something like -1 * -1, because you're sampling once from the interval and then squaring it), or it's two different quantities, both of which are in the same interval, in which case the answer you want is (-9,9), because you could be multiplying something like -3 * 3, because you're sampling twice from the same interval and multiplying the two results together.
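Spelled out with a hand-rolled interval multiply, the two readings look like this (sampling once and squaring versus sampling twice and multiplying):

    def imul(a, b):       # all-pairs ("cross product") interval multiplication
        ps = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
        return (min(ps), max(ps))

    def isqr(a):          # squaring one sampled quantity: result is never negative
        lo, hi = abs(a[0]), abs(a[1])
        if lo > hi:
            lo, hi = hi, lo
        if a[0] <= 0 <= a[1]:
            lo = 0.0
        return (lo * lo, hi * hi)

    x = (-3, 3)
    print(imul(x, x))     # (-9, 9): treats the two factors as independent
    print(isqr(x))        # (0, 9):  knows both factors are the same quantity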


That makes sense, at that point you need some concept of identity though, like a sequential ID, so you’d be working with something like (int, float, float) instead. Or allocate on the heap and use the address for identity.


Aren't intervals a rather limited substitute for distributions (of error)? Using the intervals as if they and their joint distributions are uniform doesn’t make much sense to me. Perhaps if you really need to know the limits, but then as was mentioned elsewhere, the interval widths grow very quickly.


Yes they are very limited. The sum of two uniform distributions is not uniform. If you care about the probability density within the interval you need to reach for more advanced methods.


Why isn't it called "intervallic arithmetic", sounds kinda clunky


Using intervals for measurements has some limitations. But for many use cases we do not need more than intervals, so it's nice to have convenient tools for them. Intervals are a convenient model.

That's because measurements are complicated.

You use a ruler (or some other instrument) to measure something and get a value x.

You are happy.

Then for some reason you decide to repeat the measurement and you get a slightly different value. And problems start.

You decide to write down all the values you get. You are happy again.

Shortly after, you realise you have to use those values in calculations and you just want "one representative value", so you take the average or "the most common value" or some other aggregation, use your intuition!

Things start to go wrong when you have to take a decision by setting some threshold like "I do this if my value is above a threshold". Because the actual value may be different from your averaged number.

So you take the standard deviation and call it the uncertainty x±s.

But one day you realise that your measurements are not symmetric. You start by saying "instead of x±s, I will use different upper and lower bounds to define an interval".

For instance some things are measured on a log scale and you have a measure like 100±"one order of magnitude" which is "100, but may be between 10 and 1000".

Then you add a confidence, because you are not 100% certain you actually are in that range. Your measurement becomes "with 95% confidence I can say the measure is in [10,1000], with an expected value of 100".

Then you want to combine and aggregate those intervals, and you realise that within the intervals the values are not uniform: you actually have a probability distribution.

In the simple case it is a Gaussian distribution, described with mean and variance. It can also be a binomial (a "p out of n cases" scenario). Or a lognormal, like in our 10-1000 example.

And now for each measure you take you need to understand what probability distribution it follows and estimate its parameters.

And that parameter estimation is a measure, so it has confidence intervals as well.

At this point adding two measurements becomes not so easy anymore... But don't panic!

The nice part about all of this is that usually you don't care about precise error estimates, because you can live with bounding errors covering a worst case scenario.

And you can use the Central Limit Theorem (sometimes it is abused rather than used) to simplify calculations.

It is a rabbit hole and you need to know how deep you want to dig. Intervals are usually convenient enough.


How can this be generalised to the complex plane?



