How the Slowest Computer Programs Illuminate Math’s Fundamental Limits

tromp · on Dec 11, 2020

> The busy beaver game is all about the behavior of Turing machines

A busy beaver for lambda calculus is even easier to define:

the maximum normal form size of any closed lambda term of size n (or 0 if none exists).

The series [1] starts as 0, 0, 0, 4, 0, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 30, 42, 52, 44, 58, 223, 160, 267, 298, 1812, 327686, 38127987424941,

578960446186580977117854925043439539266349923328202820197287920039565648199686

Results are more fine grained than for TMs because these sizes are in units of bits (lambda and application measuring as 2 bits, and a variable bound by n'th enclosing lambda as n+1 bits).

Graham's number is surpassed within 114 bits, while the smallest known TM for surpassing it takes 225 bits to describe.

[1] https://oeis.org/A333479

bryan0 · on Dec 11, 2020

Cool. What are the first few lambda terms in the sequence?

tromp · on Dec 11, 2020

Here's the initial output of a Haskell program used to find them (courtesy of Bertram Felgenhauer):

    4:  \1
    6:  \\1
    7:  \\2
    8:  \\\1
    9:  \\\2
    10: \\\\1
    11: \\\\2
    12: \\\\\1
    13: \\\\\2
    14: \\\\\\1
    15: \\\\\\2
    16: \\\\\\\1
    17: \\\\\\\2
    18: \\\\\\\\1
    19: \\\\\\\\2
    20: \\\\\\\\\1
    21: \(\1 1) (1 (\2))

A few more are given in [1].

[1] https://mathoverflow.net/questions/353514/whats-the-smallest...

FartyMcFarter · on Dec 11, 2020

One fascinating thing about the function BB(n) is that it grows faster than any computable function.

The proof of this is simple: if you had a computable function f(n), and f(n) >= BB(n) for all values of n, you could use that to solve the halting problem, which in itself is impossible.

I used to implicitly assume that we can make concrete, closed-form, functions that grow as fast as we want them to. The above shows that this is clearly false.

Aardwolf · on Dec 11, 2020

But of course BB can get dwarfed by other uncomputable functions, like Rayo's:

https://googology.wikia.org/wiki/Rayo%27s_number

hnuser123456 · on Dec 11, 2020

That's a lot of words and symbols for some not-very-impressive lower bounds. I present hnuser123456's function:

f(n)=n (n Knuth up arrows) n

Coming back to Rayo, my number is instead f(10^100) rather than Rayo(10^100), and given the series of lower bounds presented in that article, this should easily dwarf that, unless Rayo's will end up growing Knuth arrows faster than linear. I'm not smart enough to evaluate that though. If it would, wonder if this would beat it:

f(n)=n (n Knuth up arrows) n (n Knuth up arrows) n

Or n copies of n Knuth up arrows...

FartyMcFarter · on Dec 11, 2020

Knuth up arrows are computable functions, right? So they can't possibly grow faster than BB(n), let alone even faster-growing functions.

hnuser123456 · on Dec 11, 2020

Must be. Since Rayo(10000)>2↑↑↑65536 it seemed easy to spam a lot more up arrows there (10k of them), but the tide probably changes after n grows by another 95 orders of magnitude. Wonder where Rayo's function intersects mine.

Aardwolf · on Dec 11, 2020

Rayo's function can describe the idea of "f(n)=n (n Knuth up arrows) n" with relatively few symbols - just like your post did using relatively few characters

Arcorann · on Dec 12, 2020

n (n Knuth up arrows) n (n Knuth up arrows) n = n (n+1 Knuth up arrows) 3, so that's not really impressive either. Your function is comparable in size to f_omega(n) on the fast-growing hierarchy. [1]

At any rate, it's known that BB(18) is larger than Graham's number [2]. In fact, the same person has a bunch of long-lasting Turing machines for values around 18 [3], giving the lower bounds:

BB(15) > f_omega(2046) > 2046 (2046 up arrows) 2046

BB(16) > f_omega(f_omega(5))

BB(17) > f_(omega+1)(10) BB(18) > f_(omega+1)(7e57) > Graham's number

BB(19) > f_(omega+1)^3(9) > f_(omega+2)(3)

BB(20) > f_(omega+1)^4(7e57) > f_(omega+2)(4)

BB(21) > f_(omega+2)(f_(omega+1)^2(4))

BB(22) > f_(omega+2)^3(f_(omega+1)^2(4))

And for larger values again [2]:

BB(38) > f_(omega2)(167)

BB(64) > f_(omega^2)(4098)

BB(85) > f_(epsilon_0)(1907)

Of course, we expect the Busy Beaver function to grow a lot faster than these bounds.

[1] https://en.wikipedia.org/wiki/Fast-growing_hierarchy [2] https://googology.wikia.org/wiki/Busy_beaver_function#Graham... [3] https://googology.wikia.org/wiki/User_blog:Wythagoras/The_ni...!

hnuser123456 · on Dec 12, 2020

Thank you. I've read all your links and will continue to improve my googology.

kzrdude · on Dec 11, 2020

How would you use it to solve the halting problem?

FartyMcFarter · on Dec 11, 2020

  # Assuming f(n) >= BB(n)
  def does_it_halt (m: turing_machine):
    n_steps = f(count_states(m))
    for i in range (n_steps):
       run(m)
    return is_in_halting_state(m)

The key point is that if you know that all machines with n states halt in BB(n) steps at most (or will otherwise never halt), you just need to run any of those machines for BB(n) steps and then check if they already halted or not.

Retric · on Dec 11, 2020

The tricky bit is proving BB(n) doesn’t solve the halting problem. Otherwise feed program of n size and run it for BB(n) + 1 steps and it’s stuck in a loop.

FartyMcFarter · on Dec 11, 2020

I'm not sure what you mean. Having a computable version of BB(n) for all n does solve the halting problem, which is why it's impossible to compute with any existing computer.

If you run a program with n states for BB(n) + 1 steps it will either already have halted, or it will never halt. This is by definition.

Retric · on Dec 11, 2020

Not quite, a BB(n) is just a number so a computer could have a precomputed look up table of them. Just as you can store 7 in a programs source code you could store BB(7).

The issue is the halting problem is for a given program and an arbitrary input. If you want to store all BB(n)’s for an arbitrary input in a lookup table you would need an infinite program size. Even if that’s acceptable, reading an infinitely large program + some input into itself takes infinite time.

lisper · on Dec 11, 2020

You are missing the point. Simply knowing BB(n) is enough to solve the halting problem for a TM with n states. It doesn't matter whether it is precomputed and stored in a lookup table, or computed on the fly, or revealed to you by God. If you know BB(n) by any means, you can solve the halting problem for a TM with n states. Therefore you cannot know BB(n) beyond a certain n.

Retric · on Dec 11, 2020

Very much no, any pattern of code is acceptable in computer science. You don’t need to have a specific class of computer compute the programs running on it. CS even has the idea of an Oracle machine: The problem does not have to be computable; the oracle is not assumed to be a Turing machine or computer program. https://en.wikipedia.org/wiki/Oracle_machine

It’s a very critical idea in CS. There is no specific BB(1, 2, ... n) that’s suddenly impossible to discover, however writing a program that runs on touring machines that can find arbitrary BB(n)’s isn’t solvable. Another way of saying it is it takes increasing complexity to solve BB(n) as n grows, a given program can only be of finite complexity.

kd0amg · on Dec 11, 2020

This entire thread is about how to use a BB oracle to solve halting (and why that means BB is uncomputable).

Retric · on Dec 11, 2020

Sure, but the poster clearly doesn’t understand if they are saying Therefore you cannot know BB(n) beyond a certain n.

I am not really conveying this clearly but people seem to misunderstand that Touring machines are just one class of computers. Just as quantum computers are another class of machines, so to are human minds or x86 processors. You can very much program a touring machine with the output of a different class of computer hardware, but that doesn’t solve the problem.

>>> You need an Oracle at runtime not compile time. <<< There that’s what I was missing.

lisper · on Dec 11, 2020

> poster clearly doesn’t understand if they are saying

I have a Ph.D. in CS, so you might want to consider the possibility that you are the one who is misunderstanding.

> Touring machines are just one class of computers

So first of all, it's Turing machines, not Touring machines. Named after Alan Turing. And yes, it's true that they are just one class of computer, but the reason Turing machines are a thing is that they are universal. A TM can compute any computable function, where "computable" means computable by any physically realizable model of computation that any human has been able to come up with, including quantum computers. (Quantum computers are just faster for certain problems.) So the details of the TM don't actually matter. There are lots of different computational models that are equivalent to TMs. It's just that the TM model was the first to be proven universal, so that's what everyone talks about. But all physically realizable models of computation that humans have been able to come up with to date are either equivalent to TMs, or strictly less powerful.

So: TMs can't solve the halting problem, and so no other physically realizable model of computation can solve the halting problem either because all known physically realizable models of computation are equivalent to TMs and therefore subject to the same fundamental limitations as TMs. If you knew BB(n) you could solve the halting problem. Therefore you cannot know BB(n). (It's a little but more complicated than that because we can't rule out the possibility that a more powerful model of computation exists and is physically realizable. But that would be the greatest revolution in the history of human intellectual endeavor, so the odds of discovering such a model are pretty low.)

Retric · on Dec 11, 2020

I don’t disagree with what you just said however odds of discovering such a model are pretty low is irrelevant when dealing with abstract computers. Further, suggesting real world physics is unlikely to work isn’t a proof even if I happen to agree.

In the end even if we had a physical machine to produce BB(n) that would not mean it was computable using a Touring machine. Their unrelated concepts.

PS: Looks like auto corrupt strikes again, I really stopped caring at this point.

boloust · on Dec 11, 2020

For an n-sized program, run it for BB(n) steps. If it hasn't halted then by definition it will never halt.

sgfyd · on Dec 11, 2020

> Nevertheless, that incomprehensibly huge number is still an exact figure whose magnitude, according to Aaronson, represents “a statement about our current knowledge” of number theory.

This is, of course, contradicted later in the article in which they point out that `BB(800)` (or so) has been proved independent of ZFC; there's no really good reason to believe `BB(27)` is not likewise independent. There may still be a truth of the matter about the value of `BB(800)` or `BB(27)` (as the case may be), since ZFC hardly encompasses all truth, but calling such a number an "exact figure" is a bit like claiming to know the birth date of Julius Caesar down to the millisecond: Implausible to start with, and the closer you look the more you wonder, "what does the question even mean? Relative to what standard?" (In the one case, "of time"; in the other, "of mathematical truth".)

tromp · on Dec 11, 2020

> There may still be a truth of the matter about the value of `BB(800)`

The reason for independence of ZFC is that ZFC fails to prove "TM M doesn't halt", for some non-halting 800-state Turing Machine M. It has no problem proving the halting of the busy beaver machine in BB(800) steps.

Unlike the birth-time of Caesar, BB(800) is still well-defined and an "exact figure", because we consider the notion of whether any particular Turing Machine halts or not to be well-defined.

ifdefdebug · on Dec 11, 2020

So does this mean that "the machine that halts if ZF is inconsistent" is guaranteed to not halt (and not loop) in less than BB(748) steps?

Or would it be possible for some crazy billionaire to throw immensurable amounts of computing power at the problem and run the machine just to, after a few years, wake up one morning, find that his machine has halted, and tell the world he had proved ZFC to be inconsistent?

Likewise, could he wake up one morning, find his machine caught in an infinite loop, and tell the world "hey, I just proved this machine will never halt, so I proved ZF to be consistent, quod est adsurdum so better forget about ZFC...

tromp · on Dec 11, 2020

A machine looking for inconsistencies cannot enter an infinite loop. The belief that ZFC is consistent is as good as a guarantee that this machine will never halt. It will endlessly search for a non-existing proof of inconsistency.

Btw, even an attempted computation of BB(10) will outlast the heat death of the universe.

ifdefdebug · on Dec 11, 2020

Thanks for your answer. But I think this does not address my question (I am genuinely curious from a layman's point of view). As far as I understood from the article:

A well known TM exists with 748 rules, that halts if ZF is inconsistent. Anybody can build and run that machine. Is that correct?

If that is correct, I can run that machine. Imagine I have loads of computing power to run it. Do I have a chance, in principle, even if it is an infinitesimal chance, to either step into a loop or see my machine halting, before BB(748) steps have been done, maybe even within my own lifetime (in principle)? Or is it guaranteed that this machine can not loop or halt before BB(748) steps?

I think it is an interesting question, because in the first case, a proof of ZFC inconsistency would be possible in principle if I get lucky and a halt or loop exists early on (though a proof of consistency would not), in the latter case it would not be possible even in principle because of the limitations of our universe.

tromp · on Dec 11, 2020

Yes, you can run the machine that looks for ZFC inconsistencies. And if it halts, it is necessarily within BB(748) steps, as that forms an upper bound on the halting time of any 748 rule TM. Just know that BB(748) is incomprehensibly larger than BB(10) and the latter can't even be computed within the lifetime of the universe.

But yes, in principle you could stumble upon a ZFC inconsistency that can be proved very easily and takes much fewer steps than even BB(10).

ifdefdebug · on Dec 11, 2020

Thanks again, I can see now how my question was nonsensical...

rocqua · on Dec 11, 2020

The fact that a number is independent from ZFC does not meant that the number is inexact. The theoretical functioning of a 800 state Turing machine does not obviously depend on your axiom system.

In fact, I find the example a pretty nice argument for 'undecidable' problems having an actual truth value. Specifically because of the thought experiment of 'how would BB(800) not have an actual exact value'.

pkrumins · on Dec 11, 2020

I created the visualization for this story by running BB(5). You can find the source code (a full Turing Machine implementation that runs BB) and visualizations of other BBs in my blog post https://catonmat.net/busy-beaver.

dandanua · on Dec 11, 2020

> If you knew all the busy beaver numbers, then you could settle all of those questions.

A lot of important questions in number theory could be answered, but not all. For example, the mentioned Collatz conjecture (3n+1) is of different type. To prove this conjecture you need to prove the halting for EVERY argument. Checking halting of a single program will not be enough, so busy beaver won't help. This is a next level task in the Arithmetical Hierarchy [1]. Though, in fact, you can define the next level Busy Beaver that could solve Collatz conjecture. To solve more complicated problems you need high-order Busy Beavers.

[1] https://en.wikipedia.org/wiki/Arithmetical_hierarchy

fjfaase · on Dec 11, 2020

Very interesting. In the past, I spend some time on finding the largest number that could be calculated by a Brainfuck program when the memory cells can hold arbitrary large numbers (and not be restricted to bytes as with the official language). It takes a bit longer to take-off. For my findings, see: https://www.iwriteiam.nl/Ha_bf_numb.html

cft · on Dec 11, 2020

I wonder how one goes about proving that such large computations as BB(27) are physically impossible in our universe?

cjfd · on Dec 11, 2020

If a computation goes through N states before completing the universe in which it takes place has to have at least N possible states. One theory (https://en.wikipedia.org/wiki/Bekenstein_bound) would state that the possible number of bits of information in a universe with radius R is the surface of the sphere 4 * pi * R^2 expressed in Planck units. This gives a limit to how many states this universe can have.

cft · on Dec 11, 2020

But the number of states N required for BB(27) could be (much?) smaller than the number of steps of BB(27)?

tialaramex · on Dec 11, 2020

Nope. Suppose that say, state #16 and state #416 are in fact identical. Well then this machine does not halt, as of course being in this state leads back to the same state again in just four hundred moves, state #17 and #417 and so on will be the same too.

The notional paper tape might have identical states, but if this happens in a machine that does eventually halt the rest of the machine state must be different, and these are both state that a universe must keep in order to instantiate the machine.

JoshTriplett · on Dec 11, 2020

Different senses of the word "state". Such a machine could have fewer state-machine states than steps, but the overall state of the machine (including all its auxiliary storage) must never repeat, or it's in a loop and will never terminate. And BB machines must terminate, by definition.

cft · on Dec 11, 2020

Yes, I should have thought more about it. So these BB(N) numbers are direct complexity bounds on various mathematical statements.

chalst · on Dec 11, 2020

Smaller, yes: if a TM uses n bits of space, its execution time can exceed any polynomial of n in steps.

But any such space-bounded deterministic computation that takes 2^n steps will not terminate, by exhaustion of possible state transitions, and nontermination would mean it does not compute a function over all inputs. This constraint is small in terms of the kind of growth functions this kind of maths deals with.

chalst · on Dec 11, 2020

Andrej Bauer: Is it just me, or is the busy-beaver function a figment of classical mathematics?

https://twitter.com/andrejbauer/status/1327670607824252928

qsort · on Dec 11, 2020

What does he mean by that? (not criticism, I genuinely don't understand).

chriswarbo · on Dec 11, 2020

There is a distinction between "constructive" mathematics (AKA "intuitionistic") and "non-constructive" mathematics (AKA "classical").

In constructive mathematics, the only way to prove that something (e.g. a number) exists is to give an algorithm that constructs it.

We can do this in classical mathematics too (roughly speaking, it's a super-set of constructive mathematics); but we can also prove that something exists in another way: by disproving that it doesn't exist (i.e. a double negative). In other words, classical mathematics lets us use a proof of 'not(not(X))' as a proof of 'X'. There are some other ways to express this, which are essentially equivalent; e.g. the "law of the excluded middle" 'for all X, either X is true or not(X) is true'. These proof steps are not constructive, since knowing that not(X) is false doesn't tell us how to actually find/construct X. Likewise, the law of the excluded middle is like writing an if/then/else statement where both branches will arrive at the same result, but it doesn't tell us which of those branches to take.

We usually talk about proofs being constructive or non-constructive, rather than values (e.g. numbers). A constructivist shouldn't agree with a non-constructive proof, but they should agree with an alternative, constructive proof of the same result.

The interesting philosophical question being made here is: if a value's existence can only be proved non-constructively, should constructivists disbelieve in that value? In this case that value is the busy-beaver function: it can't be defined constructively, since no algorithm can decide the halting problem.

chalst · on Dec 11, 2020

This is a good explanation. I just have one quibble: the school of constructivism that Andrej follows is not the only one; there is a Russian school that accepts Markov's principle, which is essentially the logical counterpart of the idea that there is a fact of the matter whether a deterministic Turing computation terminates. I don't know if constructivists from the Russian school allow BB to be definable, but the argument for it to be undefined is trickier on this account.

https://en.wikipedia.org/wiki/Markov%27s_principle

chalst · on Dec 11, 2020

On his interpretation of constructivism, there are no mathematical constructions that are in principle non-computable. The proof that a function satisfying the property of "busy-beaverness" couldn't be computed thus is proof that the property does not determine a total function, so the classical definition is rejected as a non-constructive gesture at infinity.

zodiac · on Dec 11, 2020

My guess: the definition of the BB function relies on the fact that a TM either halts or does not halt

lisper · on Dec 11, 2020

> 6,561 possible machines with two rules

Huh? How do they get to that number? A TM state can be described with 4+2log2(N) bits, where N is the number of states, so a 2-state TM can be specified with 12 bits, so there are at most 4096 of them. Where do the extra 2465 come from?

xjparker · on Dec 11, 2020

Hi, I'm the journalist who wrote the Quanta article. Scott Aaronson's Busy Beaver survey paper provides a function for calculating the total number of possible (2-symbol) Turing machines with n states: T(n)=(4n + 1)^2n. When n=2, T(n)=6,561. According to an email I got from Aaronson, this "allow[s] the possibility of halt states."

lisper · on Dec 11, 2020

Hm, interesting. I think the discrepancy comes about because Scott takes into account equivalence under state permutation, but I'd have to crunch some more algebra to be sure. In any case, I'd certainly accept Scott A. as an authoritative source on this.

[UPDATE] No, that can't be it. A two-state machine has no equivalent permutations because one state has to be a privileged start state. And even if it did, there are only two permutations of two states, so this can't possible explain the discrepancy.

[UPDATE2] Turns out there is a typo in Scott's formula. The correct answer is (4(n+1))^2n, not (4n+1)^2n. This is because each state transition has 4(n+1) possible values, 4 possibilities for the movement direction and symbol to write (with two possibilities each) and, independently, n+1 possible next states (including the halt state). The result for 2-state machines is, as /u/tromp correctly noted, 20736.

[UPDATE3] (I think I'm about to set some kind of record here.) As a sanity check, it's pretty easy to see that the result has to be an even number because every possible TM has a corresponding distinct TM where the direction of movement or written symbol in any state is changed to the opposite value.

tzs · on Dec 11, 2020

OK, I looked at Aaronson's paper. Here is his description of a Turing machine:

> Following Radó’s conventions, we consider Turing machines over the alphabet {0, 1}, with a 2-way infinite tape, states labeled by 1, ... , n (state 1 is the initial state), and transition arrows labeled either by elements of {0, 1} × {L, R} (in which case the arrows point to other states), or else by “Halt” (in which case the arrows point nowhere).

Note that under this definition when the machine halts it does NOT write the tape or move the head.

That gives the machine 4n+1 ways to respond to a given input. If it is not halting, there are 4 actions it can take (0L, 0R, 1L, 1R) and n states it can go to. If it is going to halt, there is just one action. Hence, 4n + 1 possible responses.

With two possible inputs, that's (4n+1)^2 possible rules, and with n rules that's (4n+1)^(2n) possible machines as given by Aaronson and used by the article's author, which is indeed 6516 when n = 2.

For machines that do write and move even when halting, we can treat halt as an extra state, which gives the formula in your UPDATE2.

For completeness we should consider machines that write but do not move on halt. Their formula is (4n+2)^(2n), or 10000 for n = 2.

(That would also be the formula for rather pointless machines that move but do not write on halt).

lisper · on Dec 11, 2020

> Note that under this definition when the machine halts it does NOT write the tape or move the head.

That's a good point. This is a more parsimonious definition because if it's going to halt we don't really care what it does immediately before.

I guess the real lesson here is that it's pointless to quibble about this because the number turns on arbitrary features of your definitions. (Also, if you think Scott Aaronson has made a mistake, even a trivial one, you are almost certainly wrong :-)

tromp · on Dec 11, 2020

> if it's going to halt we don't really care what it does immediately before.

The final move does not matter, but the symbol written matters to the extent that BB(n) is defined as the number of 1s on the tape upon halting.

So, while redundant, the 4(n+1) count is chosen for parsimony. If one is willing to specialize the transition to halt to have no move and a mandatory write of 1, then a count of 4n+1 still agrees upon the same value of BB(n).

lisper · on Dec 11, 2020

Is there really any point in quibbling over a difference of +/- 1 when contemplating (because that's really all we can do) the value of BB(n)?

tromp · on Dec 11, 2020

They can be described with 2n * log(4(n + 1)) bits, since each of the n states has two rules, and each rule specifies a new binary symbol, a left or right direction, and a new state which can also be the halt state. For n=2, this gives 12^4 = 20736 machines. I have no idea where the number 6561 comes from.

cedilla · on Dec 11, 2020

A052200[1] lists the number of possible Turing machines and it agrees with you.

I found no sequence containing 6561 and either "Turing" or "beaver" that illuminated me to where that number came from.

1: https://oeis.org/A052200

lisper · on Dec 11, 2020

Right. I forgot about the halt state. But yeah, 6561 (=3^8) still looks mighty hinky.

tzs · on Dec 11, 2020

Probably its a 3 symbol Turing machine (0, 1, blank).

For each rule, there are 3 possible inputs (0, 1, blank), 3 possible outputs (0, 1, blank), 3 possible movements (left, right, stay), and 3 possible next rules (rule 1, rule 2, halt). That gives 3^4 possible rules, or 3^4 3^4 = 3^8 = 6561 possible pairs of rules.

lisper · on Dec 11, 2020

Nope, that's not it. See the sibling comment from xjparker.

monster_group · on Dec 11, 2020

After reading this kind of fascinating stuff, I feel ashamed that I am wasting the opportunity of having a human mind by writing CRUD applications (though I am nowhere near as smart as this stuff requires one to be - I can barely clear whiteboard coding interviews).

earthboundkid · on Dec 11, 2020

I too have read Scott Aaronson's blog.