Here's my attempt to translate. I think it's probably wrong and welcome corrections.
You have a function `p` which accepts an "element of the Cantor space" (infinite sequence of bits) and returns true or false. You want to find some value where it returns true, or prove that it returns false for every value.
Crucially `p` is not a math function, it is a Haskell function. So you can't say "p is true iff its argument is pi" because you can't write that in Haskell. In fact, all `p` can do is look at finitely many bits - if it looked at infinitely many bits, there would be an input on which it never returns. And because it looks at finitely many bits, there is necessarily a largest bit-index which it looks at.
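For concreteness, here's a hypothetical example of such a predicate, written against the article's types (the name q is mine; I'm assuming a Bit type with constructors Zero and One, as in the article). It inspects only bits 2 and 5, so index 5 is the largest it ever looks at:
> -- Hypothetical predicate: total, and it never looks past bit index 5.
> q :: Cantor -> Bool
> q a = case (a 2, a 5) of
>         (One, Zero) -> True
>         _           -> False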
Now a generic-imperative programmer might attempt:
for (bignum x = 0; x < infinity; x++)
    if (p(x)) return x;
return NotFound;
but the problem is knowing when to stop, so as to return NotFound.
However, if we could detect when `p` demands more bits than we have produced, then the search will terminate:
int bits = 0;
bignum x = 0;
while (x < (1 << bits)) {
    try {
        if (p(x)) return x;
        x++;
    } catch (BitAccessOutOfBounds) {
        bits++;
    }
}
return NotFound;
Haskell's laziness enables it to "drive the loop" but the idea is not so different.
Really nice explanation! I'd also add that it's not enough that p is continuous (only looks at a finite number of bits for each input) -- it must also be uniformly continuous (the number of bits it looks at can't become arbitrarily high for arbitrary inputs). For example, if you exclude the sequence of all ones 111111... from the Cantor set, you can write this function p which always returns false:
function p(bits) {
    n = 0
    while bits(n) == 1 {
        n++
    }
    return false
}
This function would loop forever on the all-ones sequence. But since that sequence was excluded, p is a total function which looks at more and more bits depending on how many leading ones there are. While each particular input only demands a finite number of bits, there's no finite number of bits which lets you search the entire space. The compactness of the Cantor set ensures that all continuous functions are also uniformly continuous, but if you remove points from the set, it can stop being compact, and the magic spell can be broken!
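For what it's worth, a minimal Haskell rendering of that predicate in the article's types might look like this (the name leadingOnes is mine, and I'm assuming Bit has constructors Zero and One):
> -- Hypothetical rendering: total on every sequence except the all-ones one,
> -- and it inspects as many bits as there are leading ones.
> leadingOnes :: Cantor -> Bool
> leadingOnes a = go 0
>   where
>     go n = case a n of
>              One  -> go (n + 1)
>              Zero -> False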
Yes, you have it exactly right, including the usage of non-local control flow to interrupt and restart the search. [0] is another implementation in an imperative language.
Laziness isn't the essential thing here -- the article's construction specifically doesn't rely on laziness, and would work in a strict language (almost verbatim -- you might need to write "lambda i: f(i)" instead of "f" in a few places).
I'd say that the thing that makes the article's version work is that you can pass a closure to the predicate, which is a channel for communicating extra information that makes it not quite a black box. In an imperative language, you could imagine passing in a function that mutates some state to get information out (or throws an exception, the way you set it up). The article shows that even that isn't necessary: You can pass in a sequence (represented as a function from index to value) that closes over the predicate itself, and recursively uses it to build a bit sequence that satisfies it, if one exists.
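Concretely, the core of that construction looks roughly like this (a sketch from memory of the article's find_i/forsome pair, assuming its Cantor, Bit and # definitions -- don't treat it as a verbatim quote):
> find :: (Cantor -> Bool) -> Cantor
> find p = if forsome (\a -> p (Zero # a))
>          then Zero # find (\a -> p (Zero # a))
>          else One  # find (\a -> p (One  # a))
>
> forsome :: (Cantor -> Bool) -> Bool
> forsome p = p (find p)
The sequence returned by find p uses p itself (via forsome) to decide each of its own bits, which is exactly the "closes over the predicate" point.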
By the way, there's a clearer (I think) version of the core idea here that uses conatural numbers (the naturals + infinity) that people don't usually present, but might be worth writing up.
In the article `p` really does take a list, not a function! Haskell lists may be "backed by" functions, but these functions must be pure - no back-channels. I'm not sure if this fact illuminates or obscures the point.
Anyways you can't pass a closure to the predicate, and the Haskell version has no "back channels" other than termination. The idea is to arrange the recursive search so that the predicate (not the search) demands the bits. This is what terminates the recursion, and it's hard to arrange without laziness (except by introducing more functions as you say).
> type Cantor = Natural -> Bit
> (#) :: Bit -> Cantor -> Cantor
> x # a = \i -> if i == 0 then x else a(i-1)
(You could represent bit streams with something like data Stream = Cons Bit Stream in Haskell, of course, but that's not what the article does. The code in the article works in e.g. Python almost as-is.)
And you can certainly pass closures to the predicate. Not closing over a mutable variable, of course -- I just mean that the bit stream function itself has to have access to p, which is the trick that makes this possible. If you couldn't do that -- if you could only query it with bit sequences that were defined independently of p -- this wouldn't be possible.
A lot of this theory is gamified "you pick YOUR function, THEN I'll pick MINE..." and the order is very important. Here representing sequences as functions only obscures. The important idea is that the sequence is lazy, BUT predetermined. It can't know what the predicate will be. That's where the black magic comes in.
> You could represent bit streams with something like data Stream = Cons Bit Stream in Haskell
I wish it did, it would be much clearer. The article makes that point itself:
"For the type of infinite sequences, we could use the built-in type of lazy lists..."
which is the point you made (which the article dismisses as insufficiently mathematical).
> If you couldn't do that -- if you could only query it with bit sequences that were defined independently of p -- this wouldn't be possible.
No, there's the rub! The bit sequence and `p` are independent. You choose your bit sequence, I choose `p`, and somehow execution finishes. That's exactly what makes this "seemingly impossible."
There is a back-door dependence where `p` forces a lazy sequence to materialize bits, but nothing more. The sequence cannot introspect `p`, and vice-versa.
I think this is my third time encountering this post, and it took me a few tries to get it!
On laziness, I only mean that the language feature isn't important. You can certainly represent infinite data using laziness (and you must represent it as some form of codata, such as a function), but it's not really the crucial thing here.
Here's a simpler version of seemingly impossible programs, which is better for explaining what I mean: Instead of bit streams, use conatural numbers, i.e. the natural numbers + infinity. You can define these in various ways (e.g. monotonic bit streams), but we'll use a regular Haskell type:
> data Conat = Z | S Conat
Ignoring bottoms (which we won't permit), this has all the natural numbers as inhabitants, but it also has infinity:
> inf :: Conat
> inf = S inf
Note that, if I give you a conatural, all you can do with it is peel off successors; you can't distinguish infinity from "a really big number", extensionally.
A predicate on conaturals is a total function that takes a conatural and returns a boolean. As in the bit stream case, totality is a very strong requirement. For example, "is n even" isn't a valid predicate, because the way to check for evenness is to peel off pairs of successors until you reach either 0 or 1, but that doesn't halt on infinity. In particular, to be total, every predicate must give up eventually: After seeing k successors (for some unknown k), it has to give up and return true or false. If there was no such k, the predicate wouldn't terminate on infinity.
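As a hypothetical illustration (these examples are mine):
> -- A valid (total) predicate: it forces at most three successors,
> -- so it terminates on every conatural, including inf.
> atLeastThree :: Conat -> Bool
> atLeastThree (S (S (S _))) = True
> atLeastThree _             = False
>
> -- Not a valid predicate: peeling off pairs of successors never
> -- terminates on inf, so this isn't total on Conat.
> isEven :: Conat -> Bool
> isEven Z         = True
> isEven (S Z)     = False
> isEven (S (S n)) = isEven n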
Now you can ask: Given a predicate, how can I decide whether there's any input that it holds for? And this is "seemingly impossible": You can test the predicate on infinity, and, say, on every finite number up to some N, but that doesn't give you any guarantee about the numbers you haven't tested. If all you can do is query p with any finite set of Conats -- even interactively, deciding what to ask it based on its previous responses! -- you don't get enough information to decide if the predicate is satisfiable.
The trick is to construct a special conatural which is defined in terms of p itself. If you think of this more imperatively, you can imagine a program that queries the predicate about increasingly larger natural numbers -- 0, 1, 2, 3, ... -- and prints out an "S" each time the predicate returns false, and a "Z" when the predicate returns true. If the predicate is false for all inputs, it prints out "S"s forever -- but that's still a valid (productive) conatural number.
This conatural is -- by construction -- the smallest input that satisfies p, if there is one, and infinity if there isn't. Now you can apply p to this number, and, if there's any input that satisfies p, p will return true.
The point I was making above is that it's crucial for this conatural to be able to call p. It doesn't introspect p, in the sense of being able to access its source code, but it's not independent either. And, although laziness expresses this elegantly, I wouldn't really say it's the core of the trick here. You could express the same thing using Turing machines that write out "S"s to their output tape, for example (as long as you don't require them to terminate overall, only to be productive). You do need to guarantee that the "conatural" Turing machine that you give to p has access to p (either by embedding its source code or by being able to call it as an oracle).
Here's the implementation in Haskell (which is simpler than the bit stream case given in the article, and doesn't need mutual recursion):
> -- find p returns a conatural that satisfies p, if one exists.
> -- (If one doesn't exist, it returns infinity.)
> find :: (Conat -> Bool) -> Conat
> find p = if p Z then Z else S (find (p . S))
> -- Does any conatural satisfy the predicate?
> exists :: (Conat -> Bool) -> Bool
> exists p = p (find p)
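And a quick usage sketch, using the hypothetical atLeastThree predicate from above (the results follow from the definitions; they're not taken from the article):
> -- find atLeastThree is S (S (S Z)), the smallest satisfying input, so the
> -- first check is True. find (const False) is inf, but const False never
> -- inspects its argument, so the second check is False and still terminates.
> main :: IO ()
> main = do
>   print (exists atLeastThree)   -- True
>   print (exists (const False))  -- False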
I took Martin Escardo’s course on functional programming at university. Nothing but positive things to say about the man. A fantastic teacher with a real care for his students.
I think I have a half-baked understanding of what's going on here. Let me talk through it.
Take a function f : unit_interval -> bool. I believe that if there is no bound on the execution time of f, then f must not be a total function. (So, contrapositively, if f is total, then it has bounded execution time, and is therefore only considering prefixes of the digit streams up to some bounded length. From there, searching for an example input x s.t. f(x) = true is easy.)
Naively, it would seem plausible that there exists a total function which, nevertheless, takes longer and longer to run on a particular sequence of inputs. So let such a sequence be given, where the runtime of f(x_i) tends to infinity, even though it is finite for each x_i.
But from topology, we know that a sequence of points in a compact space, like the unit interval, has a convergent subsequence. The interesting thing happens when you look at f(y) where y is the limit point of such a subsequence. It is the case that as i increases, x_i and y increasingly look the same for longer and longer prefixes of their decimal (binary) expansions. Therefore, f has to do more and more work to tell apart x_i and y, which gives us an argument that there is no bound on the runtime of f(y); i.e., f doesn't terminate for that particular input. (Except, you have to get nitpicky about redundant representations of reals like 0.0099999... vs 0.010000..., but we can always choose the representation of y that suits us: the one that looks more like the x_i's.)
Yeah I think there are some small problems with your argument, unless it was only meant to illustrate the idea.
For the sake of argument let's say we simply input both decimal representations of each number. A program that simply tries to find the first decimal digit that isn't 9 for a number in [0,1] would take an arbitrary amount of time, but is still total because 1 can be identified in a single step.
The problem here is that while the function is total it isn't continuous and so the compactness of the topology is irrelevant. As far as I can tell the article uses some trickery to ensure that computable functions _are_ continuous, in which case it does work.
You're right, it's not a precise line of reasoning. I was considering only computable functions (else "runtime" isn't a meaningful concept) and treating the input domain as actually being the Cantor space, where 0.9999... and 1.0000... are actually distinct values. I just found it helpful to think about the Cantor space in terms of the unit interval.
> the Cantor space, where 0.9999... and 1.0000... are actually distinct values
What do you mean here? The Cantor set is a subspace, not a cover, of [0, 1]; values in the Cantor set that are the same in [0, 1] are also the same in the Cantor set.
Sure, the Cantor set is the set of real numbers in [0,1] with trinary expansions that do not contain 1 at any point.
A Cantor space is any set that can be mapped to (and from) the Cantor set.
The Cantor space is the set of infinitely long bitstrings; instead of a trinary number 0.200220222200..., you'd conceive of the same entity as the string "100110111100...". The mapping is pretty obvious.
0.99999... and 1.0000... are neither of them bitstrings, so it's not really clear what cvoss was thinking of. But it's certainly true that "0999999..." and "1000000..." are distinct strings.
Going back to the Cantor set, you'd usually represent 1 as 0.22222222... for obvious reasons of consistency (that is the only format matching the representation of the rest of the set). It would be weird to represent it as 1.000000000...; that would look like you were considering the left end of the Cantor set in [1,2] rather than the right end of the Cantor set in [0,1].
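If it helps to make "the mapping is pretty obvious" concrete, here's a rough sketch (the name and the truncation parameter are mine): bit b_i becomes ternary digit 2*b_i, so the string b0 b1 b2 ... corresponds to the real number sum of 2*b_i / 3^(i+1), approximated below by its first n digits.
> -- Hypothetical sketch: map the first n bits of a sequence (given as a
> -- function from index to 0/1) to an approximation of the corresponding
> -- point of the Cantor set in [0,1].
> toCantorSet :: Int -> (Int -> Int) -> Double
> toCantorSet n a =
>   sum [ fromIntegral (2 * a i) / 3 ^^ (i + 1) | i <- [0 .. n - 1] ]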
The example you suggest has a subtle issue, since the number 0.999… is the number 1; so a program fed numbers (rather than, say, decimal strings, in which case the domain is no longer [0, 1]) cannot meaningfully distinguish the former from the latter, and hence cannot terminate on 1.
Right, but then the domain isn't [0, 1] any more (it's got lots of double points, like 0.999… and 1, and 0.8999… and 0.9 etc.), and your function doesn't terminate on input 0.999….
I guess you meant to feed only the terminating decimal if there was one? Then, I agree, this is a good illustration of how a discontinuous function can violate all the nice properties on which the mathematics of functional programming relies.
One way to imagine this space is as an indexed bank of light switches. Each switch starts out unset, and our goal is to find a configuration on a finite number of switches which will satisfy our search. The search can only check one switch at a time and must commit after each check. This allows a basic nemesis to compute a configuration by running the search repeatedly and rescuing it after each time it fails by checking the last switch checked.
A gorgeous immediate consequence is that each search is looking for a solution to a SAT problem! Each SAT solution describes the edge of a region of Cantor space whose elements all have the searched-for property. And thus, while the search can proceed in finite time, it must sadly be NP-complete.
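To make the SAT connection concrete, here's a hypothetical example (mine, not from the comment above), assuming the article's Cantor type and a Bit type with an Eq instance: a small CNF formula read as a predicate, with variable x_i stored in bit i. Running the article's search on it amounts to looking for a satisfying assignment.
> -- (x0 OR NOT x2) AND (NOT x1 OR x3), as a predicate that inspects only
> -- the first four bits of the sequence.
> sat :: Cantor -> Bool
> sat a = (a 0 == One || a 2 == Zero) && (a 1 == Zero || a 3 == One)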