5.3   Universality

This section under major construction.

One of the crowning scientific achievements of the 20th century was formalizing the notion of computation. In this section we address the fundamental question of what is computable in this Universe. Surprising revelation of 20th century is that a general purpose computer is capable of performing any computation that any other computer can. Perhaps the most important idea in all of computer science.

Disruptive technology which ignited the Computer Revolution. Universality helps explain the rapid adoption of the World Wide Web. People already used computers for number crunching and word processing prior to existence of Web. Universal computer was easily adaptable to dealing with Web protocols, MP3 files, and digital photos. Once technology become available, people could immediately harness its power. (Much slower adoption during Industrial Revolution, etc.) "One could hardly imagine an analogue of this process for the television - it would be as though millions of Americans had been induced to buy large inert boxes for their living rooms, and a decade later someone dreamed up the technology to begin broadcasting pictures to them. But this is more or less what happened with the Web." (Kleinberg-Papadimitriou) We can only imagine what new technologies will emerge in the next few decades, but we can be certain that our universal computer will be capable of taking full advantage of it.

No need for separate machines for processing images and text. One universal machine for all tasks. This is in stark contrast to most other areas. For example, there is no all-purpose universal cooking device. Instead we have separate pieces of equipment for slicing, blending, mixing, baking, boiling, grilling, roasting, toasting, brewing, and radiating. One possible exception in the natural sciences: genome (can change a few symbols in the genome and create a new organism, much like writing a new computer program).

Many different types of computational devices: Cray supercomputer, Dell PC, iMac, Palm Pilot, XBox, Tivo, Turing machine, TOY machine, Java programming language, Microsoft Excel, Java cell phone, quantum Turing machine, Perl programming language. Is there any fundamental difference between what these things can do and what a Gaggia espresso maker can do?

Turing machines are equivalent in power to TOY and Java. Can simulate any Turing machine with a Java program, can simulate TOY with a Turing machine, can simulate, Java with a TOY machine. Same idea works for C, C++, C#, Python, Excel, Outlook. Also Mac, PC, Cray, ENIAC, Konrad Zuse's Z3 (but not proved until 1998), Palm pilot. And TiVo, Xbox, Java cell phone/ But not DFA, Gaggia espresso maker, or this Tinker Toy computer that MIT students built to play Tic-Tac-Toe.

There is an implicit assumption that the TOY machine and the Java programming language have an unbounded amount of memory (extensible storage). Otherwise the TM is strictly more powerful. Is this assumption reasonable? If infinity scares you, you could think of having a service contract. If you need more memory, you just go out and order some more over the Internet. We implicitly think of stack/queue as infinite, even though memory will eventually run out. The Turing machine is a model of computing, not of computers. Thus, we do not restrict the size of the machine's tape.

Java programs are written to handle inputs of any length. You may not be able to execute the program with a large input on a machine with limited memory, but you can execute the program on a machine with a larger memory. Turing machines model programs, rather than machines: the existence of a fast Java program for a given problem is equivalent to the existence of a fast Turing machine for the same problem.

Informally, an algorithm is a step-by-step procedure for solving a problem. Formally, we define an algorithm as a Turing machine. Each Turing machine is capable of performing a single algorithm, e.g., testing whether an integer is prime. In this sense, Turing machines are analogous to computer programs (software) rather than computers (hardware). Knuth: "An algorithm is an abstract concept unrelated to physical laws of the universe." This means that we might need to build a separate Turing machine for each algorithm that we would like to execute. This is highly impractical!

Universal Turing machine.  A Universal Turing machine is a specific Turing machine that can simulate the behavior of any Turing machine (including itself!). This enables a Turing machine to answer questions about other Turing machines (or itself). Alan Turing described such a machine: its existence means that there is a single Turing machine, the UTM, that is capable of running any algorithm. The key idea is to think of the description of a Turing machine itself as data. For example, we can encode the Turing machine illustrated below

Turing machine for equal number of 0's and 1's

with a table as follows. We label each of the five states 0 to 5. We include a row for the index of each state along with its label (left, right, yes, no, or halt). We include a row of four symbols for each transition arrow to denote the current state, input symbol, next state, and write symbol, respectively.

0 L
1 R
2 R
3 Y
4 R
5 N
0 1 # #
2 0 1 x
4 0 0 x
1 2 0 x
1 3 # #
1 4 1 x
2 5 # #
4 5 # #

Now, we can concatenate the table into a single string over the alphabet {0, 1, 2, 3, 4, 5, x, #}.

0 L 1 R 2 R 3 Y 4 R 5 N 0 1 # # 2 0 1 x ... 4 5 # #

and initialize the tape with this input. Since the original Turing machine itself also has an input, e.g., the integer whose primality we want to check, we append this input to the end of the Turing machine description, separating it by a new symbol.

0 L 1 R 2 R 3 Y 4 R 5 N 0 1 # # 2 0 1 x ... 4 5 # # # # 0 0 1 1 1 0 # #

We might also append the start state and the initial position of the tape head.

A universal Turing machine takes as input the description of a Turing machine (along with the initial tape contents of the Turing machine), and simulates the input on that Turing machine. Thus, the UTM can simulate the behavior of any individual TM. In modern terms, the UTM is an interpreter - it performs a step-by-step simulation of any Turing machine. Program and data are the same thing - a program (Turing machine) is nothing more than a sequence of symbols that looks like any other data. When input to a UTM, the program "becomes alive" and begins to compute. (similar for downloading programs from the Web and email viruses - just data until you double-click them and feed them to your operating system for execution).

Building such a UTM appears to be a daunting task. Alan Turing described the first such Turing machine in his 1937 paper. In 1962, Minsky discovered a remarkable 7-state UTM that executes using a four symbol alphabet, but it is rather complicated to describe.

General purpose computers.   First articulated by Ada Lovelace. She described Babbage's Analytic Engine as suited for "developing and tabulating any function whatever... the engine [is] the material expression of any indefinite function of any degree of generality and complexity." She described its use for to scientific computing, including trigonometric functions and Bernoulli numbers. She also championed the idea that it could be used to produce music and graphics. General purpose computers are analogous to universal Turing machines: they are capable of running different algorithms without requiring any hardware modifications. This is possible because modern day microprocessors are based upon the von Neumann architecture. In this model, computer programs and data are stored in the same main memory. This means that the contents of memory can be treated as a machine instruction or data, depending on the context. This is exactly analogous to the tape contents of the UTM which consists of both a program (the original TM) and its data (the tape contents of the original TM). Alan Turing's work on UTM anticipated the development of general purpose computers, and can be viewed as the invention of software!

Church-Turing thesis.

In 1936, Alonzo Church and Alan Turing independently proposed models of computation that they believed embodied the notion of computation by a mechanic procedure. Church invented the lambda calculus to study notions of computability while Turing used his Turing machines. Although both models appear very different from one another, Turing later showed that were equivalent in that they each pick out the same set of mathematical functions. They arrived at the same conclusion, which we express in terms of Turing machines:
Turing machines can do anything that can be described by a purely mechanical process.

In other words, any process that can be performed by an idealized mathematician can be simulated on a Turing machine. This thesis was subsequently extended to assert that all (physically harnessable) processes of the universe can be simulated by a Turing machine, and this is the version we will consider as the Church-Turing thesis. This reduces the study of computation to the study of Turing machines, rather than an infinite number of potential computing devices. It means that we don't have to seek more powerful machines; our only recourse is to exploit the power of existing machines or create new machines that do things faster. The converse of the thesis also has profound consequences. It says that if something can't be done on a Turing machine, then we can't do it using any purely mechanical process. We will explore the consequences in Section 7.6 when we discover that there are some problems that Turing machines can't solve. Note that the Church-Turing thesis is not a mathematical statement, and is not subject to rigorous proof. It is a statement about the physically realizable universe. However, like any good scientific theory, the Church-Turing thesis is subject to refutation. If someone exhibited a strictly more powerful model of computation that was physically harnessable, we would dismiss the Church-Turing thesis.

The thesis is equivalent to saying that any general-purpose programming language is sufficient to express any algorithm.

Universality.  The Church-Turing thesis implies a universality among different models of computation. There is ample support suggesting this universality. Adding new capabilities to a Turing machine does not make it more powerful in terms of the languages it can accept or the functions it can compute.

Modified Turing Machine Description
Multiple heads Two or more independent tape heads
Multiple tapes Two or more tapes
Multidimensional tape Two dimensional tape
Nondeterministic NFA controls the tape instead of DFA
Probabilistic Can flip fair coins. Accept input if most of
coin flips lead to accept state
Editable Can insert and delete symbols on the tape

The definition of a Turing machine is very robust. The following restrictions do not affect a Turing machine's power (when applied individually to a standard Turing machine).

Modified Turing Machine Description
One-way infinite Tape is unbounded in one direction only
Binary Tape alphabet consists of only two symbols
Two state DFA that controls the tape has two states
Non-erasing Turing machine which can never re-write a symbol
once it's written to the tape
Time-reversible Previous state can always be uniquely determined
from the current state and tape contents.

Mathematicians, computer scientists, biologists, and physicists have considered numerous other models of computation. The following is a partial list of such models that have all been shown to be equivalent in power to the Turing machine, thereby providing additional support for the Church-Turing thesis.

Universal Model of Computation Description
Post Formal Systems
Emil Post, 1920s
String replacement rules designed to prove mathematical statements from a set of axioms.
Untyped Lambda Calculus
Alonzo Church, 1936
A method to define and manipulate functions. Basis of functional programming languages, including Lisp and ML
Universal Turing Machines
Alan Turing, 1936
A Turing machine that can simulate the behavior of any other Turing machine.
General Recursive Functions
Herbrand, 1932
Kurt Godel, 1934
Functions dealing with computation on the natural numbers.
Partial Recursive Functions
Alonzo Church, 1932
Stephen Kleene, 1935
Functions dealing with computation on the natural numbers.
Markov Algorithms
Andrei Markov, 1960
Apply string replacement rules sequentially in a prespecified order.
Unrestricted Grammars
Noam Chomsky, 1950s
Apply string replacement rules sequentially in arbitrary order until a certain stopping condition is satisfied. Used by linguists to describe natural languages.
Tag Systems
Emil Post, 1921, 1935, 1965
As long as the string consists of at least k letters: read the first letter, delete the first k letters, and append a string to the end depending on the first letter.
Extended L-Systems
Aristid Lindenmayer, 1976
Apply string replacement rules in parallel. Biologists use to model the growth of plants.
Semi-Thue Systems
Axel Thue, 1910
Apply string replacement rules in arbitrary order.
Horn Clause Logic
Alfred Horn, 1951
Logic based system for theorem proving. Forms basis of Prolog language.
1D Cellular Automata
Matthew Cook, 1983
Stephen Wolfram, 2002
A one dimensional boolean array of cells whose values change according to the state of adjacent cells. Corresponds to finite impulse response digital filters.
2D Cellular Automata
John von Neumann, 1952
John Conway, 1960s
A two dimensional boolean array of cells whose values change according to the state of adjacent cells. The most famous example is Conway's Game of Life.
Post Machines
Emil Post, 1936
A DFA plus a queue.
Two Stack Machines A DFA plus two stacks.
Two Register Machines
Shepardson-Sturgis, 1963
Marvin Minsky, 1961
A DFA plus two integer counters which it can increment, decrement, and test against zero.
Programming Languages Java, C, C++, Perl, Python, PHP, Lisp, PostScript, Excel, ...
Random Access Machines Finitely many registers plus memory that can be accessed with an integer address. Includes TOY and virtually all modern day microprocessors.
Pointer Machines Finitely many registers plus memory that can be accessed as a linked list.
Quantum Turing Machines
Richard Feynman, 1965
David Deutsch, 1985
Computes using the superposition of quantum states.
Billiard Ball Computer
Fredkin-Toffoli, 1982
Indistinguishable billiard balls move in the plane, making elastic collisions with each other and internal barriers.
Particle Machines Information is carried through space by particles, and computation occurs when particles collide.
Filter Automata
Park-Steiglitz-Thurston, 1985
Cellular automata that use newly computed values as soon as they are available. Corresponds to infinite impulse response digital filters.
Generalized Shift Maps
Christopher Moore, 1990
A single classical particle moving in a 3D potential well made of parabolic mirrors.
DNA Computers
Len Adleman, 1994
Compute using biological operations on strands of DNA.
Shift-like Dynamical Systems
Christopher Moore, 1981
Dynamics based computing using chaos theory.
Dynamical Systems
Sinha-Ditto, 1998
Dynamics based computing using chaos theory.
Dynamical Systems Hybrid systems, piecewise affine systems, saturated linear systems.
Soliton Collision Systems
Ken Steiglitz, 2000
Time-gated Manakov spatial solitons in a homogeneous medium. A gateless computer without spatially fixed gates.
High-Level Petri Nets
Carl Petri, 1962
Generalization of automata for concurrently occurring events. Applications to management, manufacturing, communication protocols, fault tolerant systems.

Interactive computation. Perhaps the most natural form of computation is as a calculator - transforming an input into an output. A Turing machines does not accept external input when it computes; this prevents it from directly modeling many natural processes. Our computers perform many operations that are not as natural to put into this framework. For example, your operating system, user interfaces, word processor, video game, soccer-playing robots, networks, sensors. These all involve interactions or communications with the program by an external agent (a person or another program). Interaction machines are natural generalizations of Turing machines that accept synchronous or asynchronous input streams.

Why doesn't this violate Church-Turing thesis? One could argue that the external agents themselves could be simulated by a Turing machine!

Physics of computation. In their landmark paper Conservative Logic, Fredkin and Toffoli argue that computation is a physical process and is ultimately governed by physical principles. They propose a number of axioms to which all computational process are subjected. Turing's goal was to invent a machine that captured the essence of computation, while still being subject to the laws of physics. Fredkin and Toffoli discuss how Turing machines implicitly embody these first three axioms.

Deutsch proposed the Church-Turing Principle that asserts that the Universe is capable of containing a universal machine that can simulate the universe itself. This connects the notions of computability with physics.

Zuse-Fredkin thesis. Around 1960 Edward Fredkin first posited the radical notion that the Universe is a (universal) cellular automata, a kind of highly parallel computing device. In the late 1960, Konrad Zuse independently proposed that the Universe is being computed on a computer, possibly a cellular automata. He referred to this as Rechnender Raum or Computing Cosmos.. This controversial thesis became know as the Zuse-Fredkin thesis. From a physics viewpoint, it is a universal Theory of Everything. It is certainly true that many complex systems can be explained by simple discrete processes. It remains a fascinating open scientific question as to whether it is cellular automata or some other process that is responsible for such complexity in nature. Although the Church-Turing thesis has been widely accepted, the more far-reaching Zuse-Fredkin thesis has remained in relatively obscurity and has not yet been widely accepted. One of the main challenges is reconciling it with quantum physics. Also, it is hard to verify, and has not yet proved useful in making accurate predictions about the universe that could not be made using traditional methods. The Konrad-Zuse thesis is the cornerstone of two emerging disciplines - Digital Mechanics and Digital Physics. It has deep philosophical ramifications. For example, if the Universe is a CA, then we humans, together with our logical thoughts, are algorithms executing in some space-time region of the CA. Following on the work of Zuse, Fredkin, and many others, in his book A New Kind of Science, Steve Wolfram promotes the theory that the entire universe is a massive computational process, and that such physical processes are best understood by investigating simple model of computation like cellular automata.

Digital physics: universe is the output of a deterministic computer.

Universal limits on computation. The Hawking-Bekenstein bound postulates that there is a limit to the amount of information (number of quantum states or number of bits that can be encoded) I that a sphere of radius R with a given energy E can hold: I &le 2 π E R / (ℏ c ln 2), where ℏ = Plank's constant and c is the speed of light. Alternatively, using E = mc2, &le k M R, where M = mass of region and k is approximately 2.57686 × 1043 bits / (m kg). The maximum processing speed is bounded by the minimum time for a state transition which is bounded by the time for light to cross the region of radius R = π E / (ℏ ln 2) bits / sec. The Bekenstein bound is based on black hole physics. This implies, that in theory, Turing machines, with their arbitrary size tape, only exist in our imagination (if they are to have finite size and bounded energy). Though in practice, they serve is an excellent model of reality.

Q. Suppose a human brain stores 1016 bits, has a mass of 1kg, and is approximately spherical with a radius of 10cm. Compare its efficiency to the Bekenstein bound.

This tighter bound is based on This physics paper by Krauss and Starkmann suggests that if the Universe is accelerating, then there is an inherent limit on the total amount of information that can be processed, even infinitely far into the future. The estimated limit is approximately 10120 bits, so there is no need for immediate concern! This Nature paper by Seth Lloyd puts quantitative bounds on the rate at which a 1kg computer occupying a volume of 1 liter can compute and how much memory it can access.

Beyond the Church-Turing thesis. Are there any types of computation that cannot be done by a Turing machine, but that can be done using some other kind of physical or abstract machine? Yes. We consider a few representative examples of so called hyper-computation.

These models of computation are capable of solving problems (e.g., the halting problem) that cannot be solved using Turing machines. These are not known to violate the Church-Turing thesis because we do not know how to build such machines.

These models of computation are equivalent to the Turing machine in terms of computability, e.g., neither can solve the halting problem. However, these models can solve problems that falls outside the spectrum of computing functions and deciding languages, e.g., generating truly random numbers.

Levin's Tale of One-way Functions is a counterpoint to hyper-computation.


  1. Conway's game of life. Universality. Here's Martin Gardner's article from Scientific American in 1970.

  2. Cellular automata. Cellular automata are computer simulations that try to emulate the laws of nature. They are used to model biological, chemical, and physical processes including: ferromagnetism according to Ising mode, forest fire propagation, nonlinear chemical reaction-diffusion systems, turbulent flow, biological pigmentation patterns, breaking of materials, growth of crystals, and growth of plants and animals. They are also used in image processing, computer graphics, design of massively parallel hardware, cryptography, and art.

    A one-dimensional cellular automaton is an array of cells that evolves over time according to a set of rules based on the state of adjacent cells. At a given time t, each cell is either alive or dead. Rule 90 says that cell i is alive at time t if exactly one of its neighbors (cell i-1 or cell i+1) is alive at time t-1. Program Cellular.java simulates the behavior of this cellular automaton using two Boolean arrays. Array element cells[i] is true if cell i is alive in the current time t, and false otherwise; array element old[i] is true if cell i is alive in the previous time step t-1, and false otherwise.

  3. Cellular automata. Here's a nice review of cellular automata. Here's the Cellular Automata FAQ. Give 1d universal automaton.
  4. Firing squad synchronization problem. The firing squad problem was first proposed by Mynhill in 1957 and solve by Moore in 1962 in the context of finite state machines. You have a 1D cellular automata with 2n identical cells, each which can take on one of a finite number of colors. They all run at the same speed, and each can only communicate with its immediate neighbors. All cells are initially white. Design the machines so that after a "start" signal is given to the first machine (the general), all machines (the soldiers) turn a special "firing" color for the first time at time t. You don't know the size of the array ahead of time, so your solution must not depend on n. Hint: try to find the middle cell.
  5. Zebra stripes. Produce a picture of synthetic zebra stripes by simulating a 2D cellular automaton using turtle graphics.

    1. Start by creating an N-by-N grid of cells, initializing each to black or white at random. Then, run 10 phases, where each phase iterates through the N-by-N cells one at a time, updating the color (black or white) of cell (i, j) as follows:
      1. For each neighboring cell (i', j') that is black such that |i - i'| = 0 or 1 and |j - j'| <= 3, add 2.0 to a running total.
      2. For each neighboring cell (i', j') that is black such that |i - i'| = 2 or 3 and |j - j'| <= 3, subtract 1.2 from the running total.
      3. If the running total exceeds 1.4, color (i, j) black; otherwise color it white.
      Write a program Zebra.java to illustrate the results of the cellular automaton.
    2. Experiment with a different weighting function (e.g., 2.0 and -0.4) and use the Manhattan distance function |i - i'| + |j - j'| instead of the one above.
  6. Paterson worms. Paterson worms are a recreation puzzle discovered by John Conway and popularized by Marvin Gardner, much like Conway's Game of Life. Write a program PatersonWorm.java that animates Patterson worms in Turtle graphics.
  7. Lindenmayer systems. Apply the replacement rule F -> F+F--F+F to the initial string F+F--F+F in parallel. For example, after one application the string becomes F+F--F+F--F+F--F+F--F+F--F+F. If you interpret F as go 1 step forward, + as rotate 60 degrees clockwise and - as rotate 60 degrees counterclockwise, this sequence of commands draw the Koch snowflake. Write a program Lindenmayer.java that takes a command line argument N and prints the commands for generating the Koch curve of order N. Hint: use the string library method replaceAll.
  8. Lindenmayer system to generate trees. Write a program LSystem.java that creates tree-like graphics using the Lindenmayer system: XYZ.
  9. Lindenmayer system interpreter. Write a program that reads in the description of a Lindenmayer system and plots the resulting pattern.
  10. Ackermann's function. The Ackermann function A(m, n) is a deceptively simple function that plays a crucial role in the analysis of algorithms and complexity theory. It's defined recursively as follows:
    A(0, n) = n + 1
    A(m, 0) = A(m - 1, 1)
    A(m, n) = A(m - 1, A(m, n - 1))
    Write a program Ackermann.java that takes two command line parameters M and N and computes A(M, N).
    1. What happens when you compute A(3, 9)? Answer: stack overflow error.
    2. What is A(5, 5)? Answer: too many digits to ever print out. Here is A(4, 2). To see how devastatingly fast this function grows, consider the following equivalent definition:
      A(0, n) = n + 1
      A(1, n) = 2 + (n + 3) - 3
      A(2, n) = 2 × (n + 3) - 3
      A(3, n) = 2 n + 3 - 3
      A(4, n) = 22...2 - 3        (n + 3 terms in the tower)
  11. Buck's function. Buck's function is defined recursively as:
    f(m, n) = f(m-1, f(m, n-1))
    f(0, n) = n + 1
    f(1, 0) = 2
    f(2, 0) = 0
    f(m, 0) = 1 for m = 3, 4, ...

    Calculate by hand f(1, n), f(2, n), f(3, n), and f(4, n) as a function of n.

    Answer: 2 + n, 2n, 2n, 222...2 (n terms in the tower).

  12. Cellular automata sand pile. Model sand falling into a pile using a cellular automata model.
  13. Schelling segregation model. Investigate the Schelling Segregation Model (SSM). "Among first constructive model of a dynamic system capable of self-organization." Modeled an integrated environment where each agent has a slight preference for one's neighbors to be of the same kind could lead to segregation. For every colored cell, if greater than 1/3 (or some other threshold) of neighbors are a different color, move to a randomly selected cell (or have each cell take a random walk). (or perhaps examine all cells within some neighborhood, with weight proportional to 1/d). Schelling won the 2005 Nobel Prize in Economics. Green turtles and red turtles. Preferences ripple through a pond. Even if turtles start out wanting only 30% similarity, they end up with around 70%. Here's a demo.