Some questions can't be answered with a simple yes or no (one bit in Turing machine terms). Don't have a wife, or don't beat her - sorry, just answer yes or no. Turing's genius was to make the proof sufficiently prolix that the trick was not obvious. Allow the machine its two bits worth, the proof fails. And he was a genius, just not this time.
In the early 1900's a famous German mathematician posed the problem - Is it possible to prove everything automatically - that is, with a machine? This has the subtext that mathematicians are dispensable.
Turing and many others responded to this challenge - needless to say, they wanted to prove it was not possible. In fairness, proving it was possible would be impossible.
Turing devised a machine which used numbers as its instructions - it had a notional tape, on which it could read and write numbers. He showed that such a machine would have the capabilities of a human computer - by that he meant that it could compute anything that a person who spent his life making computations - say for stresses in aircraft - could. Each machine had a characteristic number, made up of the string of numbers that represented its instructions.
He then showed that a particular problem arose if a machine could determine whether other machines would halt (not loop forever) in carrying out their calculations. The problem was that a machine which could determine the halting behaviour of others would run into an inconsistency when operating on itself. Hey presto, it was not possible to prove everything.
He subsequently showed that such a machine, of sufficient complexity or large enough characteristic number - the Universal Turing Machine - could read in another characteristic number, and operate as though it was the machine represented by that number.
What is wrong with the halting proof? Let's start at the beginning.
Generality
To prove a machine cannot prove everything, one must conceive of the most general and powerful machine that is possible, and prove it must fail. Turing did not attempt to conceive of this - instead he used the human as the most general machine that is possible. In doing so, he implicitly asserted that a single human is at least as capable as any number of humans working cooperatively together to prove something. Group Theory is one example where no single human synthesised a proof - instead a proof came from many hands. His concept of the most general machine is inadequate - even if his proof were successful within itself, it failed the test of generality of the machine.
Accuracy of Model
A good model will often display characteristics of the thing it is modelling that were not conceived of when the model was proposed. Infinite looping is a problem for Turing machines. Is it a problem for the human computer? No, he wears a watch and carries a stomach, either of which tells him when it is time to go home for dinner - there is a parallel process which is capable of interrupting the trapped process. If the watch doesn't work, then the stomach or the aging process will. While seemingly extraneous detail, the interruption allows the human to observe how much time has already been spent on the problem - new information that is not available to a machine stuck in an endless loop. Hardly surprisingly, computers use timers and interrupts to allow them to escape from infinite loops. We are getting closer to the real problem. Just as several humans cooperating cannot be modelled simply, so a human with a watch cannot be modelled with the mathematics that Turing had available to him, because it involves two independent processes that cannot be described by a single number, and the single stable number is the essence of the proof.
Sequence Machine versus State Machine
Another problem that Turing introduced is the notion of sequence. His machine carries out instructions in a sequence, whereas the human computer has a state machine. If a state changes in the state machine, everything that depends on that state is advised of the change, because everything that depends on that state is connected to it. This is not true of the stream of sequential instructions - the machine makes a decision and moves on. Mathematical proofs may look like a sequence of steps, but that is only because they are an instantaneous trace through a state structure in someone's head. A sequential machine does not have the validity of the state machine - it has no theoretical justification for its operation in anything but a static environment, where its operations do not change the environment in any way - but its operations obviously change its environment. So it is easy to show that the Turing machine is an inappropriate metaphor for a general purpose theorem prover.
Constancy of Number
The proof relies on the premise that the characteristic number for the machine is constant, and an inconsistency will occur if it attempts a proof on itself. If we take the human computer, we are saying that this person, over their thirty year working life, will never become more facile in their operation, will never learn a shortcut, will never remember a previous calculation, but will carry out their calculations in exactly the same way on the last day as the first - their operation will not affect their method of working. This seems hard to swallow. If the person changes their behaviour in any way, then Turing's analogy breaks down - he has not accurately represented the ability of the human with his machine.
More crushingly, Turing himself showed why the proof was nonsense. If a machine can read in another characteristic number and "execute" it, then the Universal Halting Prover (UHP) can read in the number to be proven while retaining its own number (the operation is nothing like add, so imagine the two numbers are separated by a decimal point, and both are recoverable). It would then carry out calculations like
UHP.1
UHP.2
UHP.2134532
UHP.UHP
and so on. Its number is different each time it has proved something, so no inconsistency ever occurs. How else is the machine to "prove" something about another machine, except by, at least, executing some of its code and observing the result. The instruction sequences are generally not in its own instruction sequences, so its number must have changed if it can execute even the smallest piece of the code of the target machine. Let's take a simple example:
The UHP has number 999999
The machine to be operated on has 333
How does the UHP know what instruction 3 does? Does it look it up? If it could look it up, it would solve it symbolically, and never get into a loop which may not halt in the first place. So it must execute the instruction, so it must be in its characteristic number, so its number cannot be stable.
One could imagine the UHP is sufficiently complex to create another virtual machine which, say, has the number 9999993, but then this virtual machine has to be the target of the proof, and its number is not stable from one proof to the next.
The claimed inconsistency is silly anyway, as there are any number of ways around it, even if you can't accept that the machine changes its number when operating on a program it is trying to prove. Try putting the nub of the proof - "stop if you don't stop" - in everyday terms. It would be like commanding someone in Sydney to - "Run on the spot, and tell me when you reach Melbourne". The reply would quickly indicate that such a silly command would not be accepted. If the machine is allowed to have more than two indicator states - running and stopped (an error state will do) - the halting problem disappears.
Confusing Numbers with Objects
Turing had sought to identify machines with numbers, and then use the properties of numbers to prove what he wanted. But his machines are objects, not numbers. There is only one number - two - but there can be an infinity of Turing machines labelled Two, each in different states.
One could imagine a series of cranes, each identified by the weight they can lift. Then a Number 10 crane can lift a Number 2 crane (because the Number 2 crane weighs less than the load limit of the Number 10, but we don't know that so we have to try each one). But can a Number 32 crane lift a Number 32 crane - of course it cannot lift itself, it cannot be both the lifter and the liftee. But it may well be able to lift another copy of itself, another object, or perhaps several Number 10s can lift a Number 32. Again we come back to the necessity for a single process represented by a number and a processor of the instructions that make up that number - several of these in parallel are beyond the bounds of the proof.
Let us have two Turing machines - we now have four potential states. We nominate the stopping of machine B as the signal that a result has been obtained, and the state of machine A signalling the result - machine A is stopped means the machine will halt, machine A still running means that the machine does not halt. Machine A may often halt during the test, but only when machine B halts is it relevant. We would never have made this simple logical mistake if asked whether a crane can lift a crane of that type - why have so many sophisticated people made such an elementary logical error in thinking about a Turing machine?
Better Indication
The nub of the proof is that the device must not halt to indicate that the machine would halt on its own number. We could give the machine a green light and a red light, so when it stopped it could indicate two states. Let us change the requirement so that the machine must run through the infinity of numbers, writing out a zero or a one to indicate halting or non-halting. The problem has gone away. Because there was never a problem there, only a shabby trick. The trick is even shabbier, as it is a means for showing the human's superiority over a machine, pulling a cheap trick that no self-respecting logical machine would be seen dead with. How did three generations of people accept it, or did they just accept the conclusion and ignore the proof. The Turing proof is one pillar of Popper's thought. If he couldn't see through it in thirty seconds, how much credence should we give to his reasoning?
Infinitely Many Universal Halting Provers
Still feel the characteristic number is stable and immutable for a particular machine and crave a "mathematical" proof?
We have found the fabled UHP and can write down its number. It can prove halting about a machine with every other number but itself. Its code is quite simple - it executes the machine instructions for infinity -1, whereupon it is halted. If it needed to be halted, the particular characteristic number does not halt. The code for a No Instruction (NOP) is 1. We add 1 to the UHP and gain another UHP, we will call it UHP+1. We are confident the NOP did not change its ability to solve halting problems. But it can now solve for UHP. In fact, by changing the order of its setup instructions, or by sprinkling NOPs throughout its workings, we can create an infinity of UHPs, each of which can solve any of the others. What's the problem? Well, we have no way of seeing it has changed its number (because we deliberately chose not to see), so we assume it hasn't, because that suits our purpose. If it had a display and showed it had changed its number, the limit of two states, stopped and running, would be broken.
Proofs to order are always suspect, but why did anyone ever believe this one. Turing never had the luxury of watching his computer spin in a word processing program and change its behaviour, so a little naivety as to what he had invented is forgivable, but what is our excuse. What do we really think is happening, or don't we think about it?
Several books (Penrose's for example) give Turing's proof in one chapter and describe the Universal Turing Machine in the next. It would require an attention span of no more than several minutes to realise that the proof of a few pages back just got invalidated. Why doesn't this happen? Penrose goes on to say that the proof is not true in any particular case, because we can always work out whether a particular machine will halt or not, but is happy to accept it in general. Proofs would be unusual (and useless) if this were generally the case.
Unfortunately, a story that bears out one's prejudices is all too readily accepted, and the more illogical, the more easily accepted. Here, the prejudice is "We are indispensable". The proof seems to say that machines are stupid, so the proof must be true. Turing meant it to apply to all machines, including human mental apparatus, but that is easily overlooked.
When it was suggested to one mathematician, a reviewer of a potential paper on the subject, that doubt might be cast on Turing's proof, his response was
"All mathematicians believe it to be true, therefore it is true"
neatly illustrating the use of a logical fallacy to support an illogical belief.
There are some mathematicians, following Pythagoras, who believe that mathematics should have its virgin births and other mysteries, to keep out unbelievers. Less sophisticated people, who use mathematics but do not regard it as a religion, would prefer that mathematics remained a realm of the rational (or at least that the irrational was restricted to numbers, and didn't refer to concepts).
But there are other ways of proving the same thing - Godel numbers for example. Godel embedded a premise in his work - that arithmetic is functional - that allowed the conclusion. Remove the unwarranted premise, and the proof fails. And Godel was intent on proving something else - that Unknowable is a necessary state, even in axiomatic systems relying only on the two truth states. We certainly wouldn't disagree - Unknowable is a valid state in our system too. If Godel's notion that three states are necessary is accepted, then it becomes obvious that "running" and "stopped" aren't enough to cover all the necesary states of a Turing machine theorem prover - so Godel actually proved that Turing's proof is invalid. Put another way, when Turing showed that a machine which could only communicate with two states could not prove everything, he should have stopped and thought whether the restriction to two states was the problem, or the machine's inability was the problem. As this was a proof to order, he didn't bother with the alternative.
Self Reference
The UTP has problems operating on itself - let's turn that into something everybody will understand. A surgeon can operate succesfully on everyone else, but with restictions upon himself or herself. Obviously a general anaesthetic is out of the question, but a strong enough local anaesthetic may do the trick if no other surgeon is available (note how obvious the answer is in a non-mathematical example). Similarly, if the UTP has to operate on itself, it applies a local anaesthetic, a NOP instruction, either to its own number or the patient, so it can't be tricked into confusing itself with the machine being analysed, This is all obvious stuff.
The final argument - Species Arrogance.
When Mrs Monkey gave birth to the first human, she may have thought "Poor little thing - no decent fur, and those feet - no good for climbing trees, it will be eaten by the first cat that finds it". Still, the birth had hurt like hell because of its big head so she was already a little invested, and the little face was pitiable, so she would look after it for now. She couldn't know that, because it could communicate with its own kind better than she could with hers, that one day we would stand on the shoulders of Plato, Newton, Darwin. Asked to judge its future based on it starting from scratch, and with a nasty, brutish and short life, its prospects would have looked dim, but because it can function in a community, it can achieve all sorts of things over time. Today, are we any better equipped than Mrs Monkey was to prognosticate on what machines might achieve - we already know they can communicate with each other far faster than we can.
There is a saying - "Two heads are better than one", and if this is true, six billion heads are probably better than two. Plato was a great guy, but he couldn't fix your car, while someone else can, allowing you to function on a higher conceptual level. If we want to prove what machines can't do, should we base the proof on one rather limited one (if we allow two, the proof doesn't work).
Why bother with Turing's proof - it is clear that most people in the computer area view the halting proof as of no relevance and think it provides no insuperable boundary to their work. We are in the business of developing software which changes its structure as part of its operation - we are chasing the human capabilities that Turing chose to ignore, so that machines can be a little less stupid. We should have a clear understanding as to exactly why the proof should not be relevant to our work, and of the sad mindset of the people who desire it to be true.
Amazing!