Hacker News

Noob question time: what's the halting problem?



The halting problem:

Given a program and an input, determine whether that program, given that input, will ever halt, or if it will loop infinitely.

One of the fundamentals of academic computer science is learning that it is mathematically impossible to solve the halting problem - which is why the grandparent comment simply notes "the halting problem" with no further explanation; it's a famous problem. (Don't feel bad, everyone has to learn something the first time!)

The poem linked elsewhere in this comment tree is a simple and pithy description of the mathematical proof.

For (many) more details, Wikipedia has plenty of information.
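The proof is short enough to sketch in code. Below is a minimal Python rendering of the classic diagonal argument; the names `halts` and `contrarian` are invented for this sketch, and the stub oracle is deliberately wrong so the code actually runs:

```python
def halts(f):
    # Hypothetical halting oracle: claims to decide whether calling
    # f() terminates. Here it's a stub that always answers "no".
    return False

def contrarian():
    # Do the opposite of whatever the oracle predicts about us.
    if halts(contrarian):
        while True:  # oracle said we halt, so loop forever
            pass
    # oracle said we loop forever, so halt immediately

contrarian()  # returns, proving this particular oracle wrong
```

Whatever answer a candidate `halts` gives about `contrarian`, the program does the opposite, so no single oracle can be right about every program.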


Thanks for the explanation! No shame here (I didn't study CS in undergrad) but I find it intuitive that a program can't detect an input that would loop indefinitely.


If that's intuitive for you already (great!), consider one step further: Rice's Theorem. In layman's terms, Rice's Theorem says "all nontrivial properties of a program are undecidable in the general case". "Undecidable" is the difficulty class of the halting problem (there are actually harder problems!).

A couple accessible examples:

1. Constant Propagation - we know that we'll never detect all compile-time constants because of Rice.

2. Exception frequency - I can write a function that has "raise new FooException()" in the code, but the function never raises, and you won't be able to prove it never raises.


Well, you won't always be able to prove that it never raises. In many cases, you can, which is why compilers can eliminate dead code. Sometimes the compiler can prove the code is dead, sometimes it can prove the code is live, and sometimes it just doesn't know. For any prover, there will be code that defeats it.
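To illustrate "sometimes it can prove it, sometimes it just doesn't know", here is a toy dead-code prover in Python (the name `classify_raises` is invented for this sketch): it proves a `raise` dead only when it sits directly under `if False:`, and reports every other guarded `raise` as unknown rather than guessing:

```python
import ast

def classify_raises(source):
    """Toy dead-code prover: a `raise` directly under `if False:` is
    provably dead; any other guarded `raise` is 'unknown'. Real
    compilers prove far more, but they too must sometimes give up."""
    dead, unknown = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.If):
            test_is_false = (isinstance(node.test, ast.Constant)
                             and node.test.value is False)
            for stmt in node.body:  # only direct children of this `if`
                if isinstance(stmt, ast.Raise):
                    (dead if test_is_false else unknown).append(stmt.lineno)
    return dead, unknown
```

Any prover built this way is sound but incomplete: widening the set of patterns it recognizes shrinks the "unknown" bucket but can never empty it.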


The halting problem is a profound bit of theoretical CS, well explained in this thread.

However, the actual halting problem is moot in most references I see to it: it is usually misused by the half-educated to claim that 'nothing useful can ever be inferred about the behavior of any program, ever'.

For example, we built a tool to either estimate a reasonable ceiling on the stack usage (whole-program) or give up; it works very well and can detect when the structure of the program prevents this tool from operating correctly (nasty stack mischief or stack adjustments appearing in 'unusual' places). Quite useful for us, especially because we control the source code and know that we don't do naughty things to our own stack.

Entertainingly, on the way to building this tool, we encountered a couple of screeds about how this problem just plain can't be solved because it's equivalent to 'solving the halting problem'. Well, yes - in theory, but no - in practice.


My undergraduate thesis tackled static analysis in Ruby. One of the problems I attacked was "does a given method yield to its block?" I broke methods into a few classes, two of which are "Block-Required" and "Block-Optional."

I had to write software which proved a method optionally yielded, which isn't just undecidable, but not even recognizable. For non-CS folks, that means it's actually harder than the halting problem. Yet my software works on nearly all Ruby code I throw at it - it only really has difficulty with complex delegation patterns. Unsurprisingly, the code people write to create Block-Optional methods is highly structured.
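A rough Python analogue of the "does this method yield to its block?" question (the helper name `may_invoke` is invented for this sketch) is asking whether a function ever directly calls a callable parameter; delegation is precisely where a syntactic check like this breaks down:

```python
import ast

def may_invoke(source, param):
    """Rough Python analogue of 'does this method yield to its block?':
    does the function ever *directly* call the callable parameter
    `param`? Passing `param` on to another function (delegation) is
    exactly the case this syntactic check cannot see through."""
    func = ast.parse(source).body[0]  # assume source is one function def
    for node in ast.walk(func):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == param):
            return True
    return False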


That problem gets hard very quickly when you deal with recursion. How did you deal with that? (if you did)


We control the source code in question, so we deal with it by detecting unexpected recursion and screaming bloody murder.

We have a 'little bit' of recursion where we know we won't go around the recursion more than once, and we've hand-hacked that. We have similar hacks to deal with indirect calls.

I'm not saying we've solved the problem for arbitrary code; I'm saying that solutions that fall very far short of handling arbitrary code are still enormously useful. Many compiler optimizations don't work on 'arbitrary code' either, and know just enough to bail out when they see an irreducible flowgraph or a bizarre indirect jump, for example.


It's mathematically impossible to write a program that always correctly determines whether an arbitrary piece of code will halt (rather than get stuck in an infinite loop). Making that determination is what's referred to as the halting problem.


Basically, the halting problem says that it is impossible to determine, in general, whether a given program halts, assuming it is executed on a Turing-complete machine.

Since a program is a composition of smaller programs - at the lowest level, loops and instructions - this also means that it is impossible, in general, to determine whether a loop halts.


It's only impossible to write a program that can be fed any completely arbitrary program and input and determine, in every case, whether it will halt.

What few people realize is that most completely arbitrary programs are very uninteresting. More interesting programs have characteristics that very often make it possible to statically prove that all of the loops that should terminate indeed do terminate. And even interesting loops for which that can't be proven at first often gain in clarity from being slightly changed so that the tool can do its work.
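A sketch of what such a tool can look like, assuming Python source and an invented name `provably_terminates`: it claims termination only for functions whose loops are all `for` loops over a literal `range(...)`, and gives up on everything else. Rewriting a stubborn loop into this shape is exactly the kind of 'slight change' mentioned above:

```python
import ast

def provably_terminates(source):
    """Extremely conservative termination check: answer True only when
    every loop is a `for` over a literal `range(...)` call and there
    are no `while` loops at all. False means 'could not prove it',
    not 'does not terminate'."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.While):
            return False  # unknown: can't bound a while loop syntactically
        if isinstance(node, ast.For):
            it = node.iter
            if not (isinstance(it, ast.Call)
                    and isinstance(it.func, ast.Name)
                    and it.func.id == "range"):
                return False  # unknown: iterating something we can't bound
    return True
```

Even a check this crude accepts a surprising amount of real code, which is the point: useful static analyses don't need to decide the general case.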

IIRC Microsoft has a tool that does sort of that for Windows drivers.


Since a program is a composition of smaller programs - at the lowest level, loops and instructions - this also means that it is impossible, in general, to determine whether a loop halts.

It is easily provable that this is theoretically possible, but the required time is far too long for it to be practical. Suppose your memory has N possible states. Run your program for N+1 computation steps; if it hasn't finished by then, some state must have repeated (by the pigeonhole principle), and since the machine is deterministic it will cycle forever and never halt.

Thing is, N is very large for real memory, so this straightforward approach is not applicable. However, there's much room to improve this algorithm and I assume that's what the MIT folks did.

The true halting problem is about computational devices with infinite memory.
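The pigeonhole argument above is directly executable for toy machines. This sketch (the name `halts_on_finite_states` is invented) decides halting for any deterministic step function over a known finite state space:

```python
def halts_on_finite_states(step, start, states):
    """Decide halting for a deterministic machine over a finite state
    space: `step` maps a state to the next state, or to None on halt.
    If no halt occurs within len(states) + 1 steps, some state must
    have repeated, so the machine runs forever."""
    state = start
    for _ in range(len(states) + 1):
        state = step(state)
        if state is None:
            return True  # halted
    return False  # a state repeated: deterministic, so it cycles forever
```

With len(states) + 1 steps, either the machine halts or some state has repeated, and a deterministic machine that revisits a state is stuck in a cycle.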


It should be made clearer that 'memory' here isn't the only thing in play. Register states, as well as the state of any I/O subsystems, also matter. If you're not looking at I/O, then code that polls for an event based on network traffic will be shown to never terminate, if that event doesn't happen during the testing.


It also may not be applicable if you don't actually have access to all memory states involved. E.g. when communicating with a 3rd party computer over a network.


It is still applicable, as long as you have a bound on the number of unknown states; however, the number of computation steps in this case is so high that it is even less practical than the totally impractical algorithm I proposed in my first comment.

Let's consider both computers (yours and the 3rd party's) as a single computational device, with a state represented by a pair of symbols, the first for your computer and the second for the 3rd party's. Suppose your computer has N states and the other M. It's enough to run N * M + 1 computation steps -- if your program hasn't ended by then, it never will, for some state (a, b) must have occurred twice, by the pigeonhole principle.

Please keep in mind that this is all purely theoretical, and not at all practical. It's just a counterargument to another purely theoretical argument from the unsolvability of the halting problem.


  def is_collatz(n):
      s = set()
      while n not in (1, 2, 4):
          if n % 2:
              n = 3 * n + 1
          else:
              n //= 2  # integer division; plain / would yield a float in Python 3
          if n in s:  # revisited a value: stuck in a non-Collatz loop
              return False
          s.add(n)
      return True
This is theoretically bounded far below modern memory limits. Feel free to tell me the first natural number for which it returns False. :3


Why would that be bounded far below modern memory limits? Why would that be bounded at all?

Secondly, you're making the logic error A => B implies not A => not B. If state spaces as large as modern memory limits allow are too large, that doesn't imply that state spaces far below modern memory limits are small enough to handle.

Thirdly, your program trivially terminates doing nothing: it only defines a function and never calls it.


Troll-feeding time!

Pretend that an OOM condition is the same as a False answer. :3

Here's a program, building on the previous one, which either terminates or doesn't.

  from itertools import count

  for i in count(1):
      if not is_collatz(i):
          break


So how is the state space then much smaller than modern memory limits? And that still ignores my second point: the reasoning -- that xyzzyz somehow implied that programs with state spaces smaller than modern memory limits are in practice analyzable for halting/not halting -- is not sound.

I can tell you now that your code terminates. The set s is strictly increasing in size, inevitably going towards an OOM condition if the loop doesn't terminate by other means. Note that we are talking about determining halting or not halting, not with which value the program halts.

This is not a troll. You tried to respond with something witty. I just pointed out that in this case things like the collatz conjecture don't apply, because it is a conjecture over all natural numbers. When you are working with finite memory, there is no way to encode this conjecture into a program (e.g. by making it halt if the conjecture is true and making it loop when the conjecture is false).


I don't understand your point. I could run it with memory constrained to, say, 100 KB, and it will quickly produce an OOM condition. I clearly said that the approach I sketched in my first comment is not practically applicable.

Real-life computers have bounded memory, so they have no more computational power than, say, regular expression matchers -- every computational task performed by a real computer can be represented by an appropriately complicated regular expression. That's why one needs to be especially careful when arguing from computability about real-life problems.

The more serious issue is a treatment of I/O.



