7.2: The Halting Problem

Last updated
Save as PDF

Page ID: 48327

Eric Lehman, F. Thomson Leighton, & Alberty R. Meyer
Google and Massachusetts Institute of Technology via MIT OpenCourseWare

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\) \(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \( \newcommand{\Span}{\mathrm{span}}\)\(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

Although towers of larger and larger infinite sets are at best a romantic concern for a computer scientist, the reasoning that leads to these conclusions plays a critical role in the theory of computation. Diagonal arguments are used to show that lots of problems can’t be solved by computation, and there is no getting around it.

This story begins with a reminder that having procedures operate on programs is a basic part of computer science technology. For example, compilation refers to taking any given program text written in some “high level” programming language like Java, C++, Python, . . . , and then generating a program of low-level instructions that does the same thing but is targeted to run well on available hardware. Similarly, interpreters or virtual machines are procedures that take a program text designed to be run on one kind of computer and simulate it on another kind of computer. Routine features of compilers involve “type-checking” programs to ensure that certain kinds of run-time errors won’t happen, and “optimizing” the generated programs so they run faster or use less memory.

The fundamental thing that just can’t be done by computation is a perfect job of type-checking, optimizing, or any kind of analysis of the overall run time behavior of programs. In this section, we’ll illustrate this with a basic example known as the Halting Problem. The general Halting Problem for some programming language is, given an arbitrary program, to determine whether the program will run forever if it is not interrupted. If the program does not run forever, it is said to halt. Real programs may halt in many ways, for example, by returning some final value, aborting with some kind of error, or by awaiting user input. But it’s easy to detect when any given program will halt: just run it on a virtual machine and wait till it stops. The problem comes when the given program does not halt—you may wind up waiting indefinitely without realizing that the wait is fruitless. So how could you detect that the program does not halt? We will use a diagonal argument to prove that if an analysis program tries to recognize the non-halting programs, it is bound to give wrong answers, or no answers, for an infinite number of the programs it is supposed to be able to analyze!

To be precise about this, let’s call a programming procedure—written in your favorite programming language—a string procedure when it is applicable to strings over a standard alphabet—say, the 256 character \(\text{ASCII}\) alphabet. As a simple example, you might think about how to write a string procedure that halts precisely when it is applied to a double letter ASCII string, namely, a string in which every character occurs twice in a row. For example, \(\mathrm{aaCC33}\), and \(\mathrm{zz++ccBB}\) are double letter strings, but \(\mathrm{aa;bb}\), \(\mathrm{b33}\), and \(\mathrm{AAAAA}\) are not.

We’ll call a set of strings recognizable if there is a string procedure that halts when it is applied to any string in that set and does not halt when applied to any string not in the set. For example, we’ve just agreed that the set of double letter strings is recognizable.

Let \(\text{ASCII}^*\) be the set of (finite) strings of \(\text{ASCII}\) characters. There is no harm in assuming that every program can be written using only the \(\text{ASCII}\) characters; they usually are. When a string \(s \in \text{ASCII}^*\) is actually the \(\text{ASCII}\) description of some string procedure, we’ll refer to that string procedure as \(P_s\). You can think of \(P_s\) as the result of compiling \(s\). ² It’s technically helpful to treat every \(\text{ASCII}\) string as a program for a string procedure. So when a string \(s \in \text{ASCII}^*\) doesn’t parse as a proper string procedure, we’ll define \(P_s\) to be some default string procedure—say one that never halts on any input.

Focusing just on string procedures, the general Halting Problem is to decide, given strings \(s\) and \(t\), whether or not the procedure \(P_s\) halts when applied to \(t\). We’ll show that the general problem can’t be solved by showing that a special case can’t be solved, namely, whether or not \(P_s\) applied to \(s\) halts. So, let’s define

Definition \(\PageIndex{1}\)

\[\label{7.2.1} \text{No-halt } ::= \{s \in \text{ASCII}^* \mid P_s \text{ applied to } s \text{ does not halt}\}. \]

We’re going to prove

Theorem \(\PageIndex{2}\)

No-halt is not recognizable.

We’ll use an argument just like Cantor’s in the proof of Theorem 7.1.11.

Proof

For any string \(s \in \text{ASCII}^*\), let \(f(s)\) be the set of strings recognized by \(P_s\):

\[\nonumber f(s) ::= \{t \in \text{ASCII}^* \mid P_s \text{ halts when applied to } t\}.\]

By convention, we associated a string procedure, \(P_s\), with every string, \(s \in \text{ASCII}^*\), which makes \(f\) a total function, and by definition,

\[\label{7.2.2} s \in \text{ No-halt IFF } s \notin f(s),\]

for all strings, \(s \in \text{ASCII}^*.\)

Now suppose to the contrary that No-halt was recognizable. This means there is some procedure \(P_{s0}\) that recognizes No-halt, which is the same as saying that

\[\nonumber \text{No-halt } = f(s_0).\]

Combined with (\ref{7.2.2}), we get

\[\label{7.2.3} s \in f(s_0) \text{ iff } s \notin f(s) \]

for all \(s \in \text{ASCII}^*\). Now letting \(s = s_0\) in (\ref{7.2.3}) yields the immediate contradiction

\[\nonumber s_0 \in f(s_0) \text{ iff } s_0 \notin f(s_0)\]

This contradiction implies that \(\text{No-halt}\) cannot be recognized by any string procedure.\(\quad \blacksquare\)

So that does it: it’s logically impossible for programs in any particular language to solve just this special case of the general Halting Problem for programs in that language. And having proved that it’s impossible to have a procedure that figures out whether an arbitrary program halts, it’s easy to show that it’s impossible to have a procedure that is a perfect recognizer for any overall run time property.³

For example, most compilers do “static” type-checking at compile time to ensure that programs won’t make run-time type errors. A program that type-checks is guaranteed not to cause a run-time type-error. But since it’s impossible to recognize perfectly when programs won’t cause type-errors, it follows that the type-checker must be rejecting programs that really wouldn’t cause a type-error. The conclusion is that no type-checker is perfect—you can always do better!

It’s a different story if we think about the practical possibility of writing programming analyzers. The fact that it’s logically impossible to analyze perfectly arbitrary programs does not mean that you can’t do a very good job analyzing interesting programs that come up in practice. In fact, these “interesting” programs are commonly intended to be analyzable in order to confirm that they do what they’re supposed to do.

In the end, it’s not clear how much of a hurdle this theoretical limitation implies in practice. But the theory does provide some perspective on claims about general analysis methods for programs. The theory tells us that people who make such claims either

are exaggerating the power (if any) of their methods, perhaps to make a sale or get a grant, or
are trying to keep things simple by not going into technical limitations they’re aware of, or
perhaps most commonly, are so excited about some useful practical successes of their methods that they haven’t bothered to think about the limitations which must be there.

So from now on, if you hear people making claims about having general program analysis/verification/optimization methods, you’ll know they can’t be telling the whole story.

One more important point: there’s no hope of getting around this by switching programming languages. Our proof covered programs written in some given programming language like Java, for example, and concluded that no Java program can perfectly analyze all Java programs. Could there be a C++ analysis procedure that successfully takes on all Java programs? After all, C++ does allow more intimate manipulation of computer memory than Java does. But there is no loophole here: it’s possible to write a virtual machine for C++ in Java, so if there were a C++ procedure that analyzed Java programs, the Java virtual machine would be able to do it too, and that’s impossible. These logical limitations on the power of computation apply no matter what kinds of programs or computers you use.

²The string, \(s \in \text{ASCII}^*\), and the procedure, \(P_s\), have to be distinguished to avoid a type error: you can’t apply a string to string. For example, let \(s\) be the string that you wrote as your program to recognize the double letter strings. Applying \(s\) to a string argument, say \(\mathrm{aabbccdd}\), should throw a type exception; what you need to do is compile \(s\) to the procedure \(P_s\) and then apply \(P_s\) to \(\mathrm{aabbccdd}\).

³The weasel word “overall” creeps in here to rule out some run time properties that are easy to recognize because they depend only on part of the run time behavior. For example, the set of programs that halt after executing at most 100 instructions is recognizable.