# 7.3 Finite State Automata

This section under major construction.

Abstract machines. Modern computers are capable of performing a wide variety of computations. An abstract machine reads in an input string, and, depending on the input, outputs true (accept), outputs false (reject), or gets stuck in an infinite loop and outputs nothing. We say that a machine recognizes a particular language, if it outputs true for any input string in the language and false otherwise. The artificial restriction to such decision problems is purely for notational convenience. Virtually all computational problems can be recast as language recognition problems. For example, to determine whether an integer 97 is prime, we can ask whether 97 is in the language consisting of all primes {2, 3, 5, 7, 13, ... } or to determine the decimal expansion of the mathematical constant π we can ask whether 7 is the 100th digit of π and so on.

We would like to be able to formally compare different classes of abstract machines in order to address questions like Is a Mac more powerful than a PC? Can Java do more things than C++? To accomplish this, we define a notion of power. We say that machine A is at least as powerful as machine B if machine A can be "programmed'" to recognize all of the languages that B can. Machine A is more powerful than B, if in addition, it can be programmed to recognize at least one additional language. Two machines are equivalent if they can be programmed to recognize precisely the same set of languages. Using this definition of power, we will classify several fundamental machines. Naturally, we are interested in designing the most powerful computer, i.e., the one that can solve the widest range of language recognition problems. Note that our notion of power does not say anything about how fast a computation can be done. Instead, it reflects a more fundamental notion of whether or not it is even possible to perform some computation in a finite number of steps.

Finite state automata. A deterministic finite state automaton (DFA) is, perhaps, the simplest type of machine that is still interesting to study. Many of its important properties carry over to more complicated machines. So, before we hope to understand these more complicated machines, we first study DFAs. However, it is an enormously useful practical abstraction because DFAs still retain sufficient flexibility to perform interesting tasks, yet the hardware requirements for building them are relatively minimal. DFAs are widely used in text editors for pattern matching, in compilers for lexical analysis, in web browsers for html parsing, and in operating systems for graphical user interfaces. They also serve as the control unit in many physical systems including: vending machines, elevators, automatic traffic signals, and computer microprocessors. Also network protocol stacks and old VCR clocks. They also play a key role in natural language processing and machine learning.

A DFA captures the basic elements of an abstract machine: it reads in a string, and depending on the input and the way the machine was designed, it outputs true or false. A DFA is always in one of N states, which we name 0 through N-1. Each state is labeled true or false. The DFA begins in a distinguished state called the start state. As the input characters are read in one at a time, the DFA changes from one state to another in a prespecified way. The new state is completely determined by the current state and the character just read in. When the input is exhausted, the DFA outputs true or false according to the label of the state it is currently in.

Below is an example of a DFA that accepts strings over the binary alphabet { a, b } that have a multiple of three b's. For example, the machine accepts bbaabbabb because it has exactly six b's.

Program DFA.java simulates a DFA that accepts inputs with a multiple of 3 b's.

String searching DFAs. One of the most important applications of DFAs is searching for patterns in strings. This is at the heart of Web search engines like Google. The following DFA gives a simplified example of such a tool over the binary alphabet. It accepts all string inputs that contain the pattern aabaaabb as a substring.

The DFA works by advancing one state to the right each time it reads in a matching character. For example, after reading in aabaa, the DFA is in state 5. If the next symbol read is a, it advances to state 6. If the next symbol read is b, it does not go all the way back to state 0 because the search pattern might overlap some of the input characters read in so far. Instead it goes to state 3 because the last 3 characters read in (aab) are the first 3 characters of the search pattern.

The famous Knuth-Morris-Pratt algorithm efficiently solves the problem of searching for a pattern in a string using DFAs. It first preprocesses the search pattern and creates a DFA like the one above. Then, to determine whether the pattern appears in a search string, it uses the search string as input to the DFA. It is also possible to support the more general types of search queries known as regular expressions using DFA technology. In fact, there is a striking duality between DFAs and regular expressions: regular expressions are precisely the types of patterns that DFAs can match.

DFA example. Build machine that recognizes valid email addresses of the form: string0@string1.string2.string3.edu. Assume each string is composed of only lowercase letters. Use nondeterminism to implement version where each address ends with edu, com, or gov. This Perl regular expression validates emails according to the grammar defined in the RFC822 standard. It's surprisingly complicated.

String processing. The program CommentStripper.java reads in a Java (or C++) program from standard input, removes all comments, and prints the result to standard output. This would be useful as part of a Java compiler. It removes /* */ and // style comments using a 5 state finite state automaton. It is meant to illustrate the power of DFAs, but to properly strip Java comments, you would need a few more states to handle extra cases, e.g., quoted string literals like s = "/***//*". The picture below is courtesy of David Eppstein.

Nondeterminism. Instructions like "go down the hallway and turn either left or right" are quite different from "go down the hallway and turn left if you see more than three people; otherwise turn right." The first instruction can be followed by turning left or right, but there is only one way to follow the second instruction. If you run a deterministic algorithm with the same input, it always produces the same output. In contrast, under identical conditions, a nondeterministic algorithm may make different "choices" each time the algorithm is executed. Most Java programs are deterministic. However, the method Math.random is intended to return a random sequence of real values each time you run it.

Philosophical implications. There is a philosophical connection between the theory of computation and the theory of the mind. In fact, finite state machines were first used by McCulloch and Pitts to model the behavior of neurons in the human brain.

Q.

A.

#### Exercises

1. Draw a 4-state DFA that accepts the set of all bitstrings ending with 11. (Each state represents whether the input string read in so far ends with 00, 01, 10, or 11.) Draw a 3-state DFA that accomplishes the same task.
2. Draw a DFA for bitstrings with at least one 0 and at least one 1.
3. Draw an NFA that matches all strings that contain either a multiple of 3 1's or a multiple of 5 1's. Hint: Use 3 + 5 + 1 = 9 states and one epsilon transition.
4. Draw an NFA that recognize the language of all strings that end in aaab.
5. Draw an NFA that recognize the language of all strings whose 4th to the last character is a.
6. Draw an NFA that recognize the language of all strings whose 5th to the last character is a.
7. Modify CommentStripper.java to:
1. Remove entirely blank lines.
2. Properly handle characters within quote strings.
8. Give a NFA with 5 accept states. Write an equivalent NFA that has exactly one accept state.
9. Give a NFA with ε-transitions. Write an equivalent NFA that has no ε-transitions.

#### Creative Exercises

1. Unary Divisibility. Given an DFA that accepts bitstrings with a multiple of three 0's and a multiple of five 1's.
2. Comment stripper. Write a program to remove /* */ style comments from a C program according to the C89 standard. Ignore // style comments. Comments can span multiple lines but are not nested. Also respect C string literals so that if /* appears within quotes, it is treated as part of the string instead of a comment. Write your program from scratch (no Java regular expressions). Hint: use a finite state automaton to simplify your logic.
3. Comment stripper. Write a program to remove /* */ and // style comments from a Java program. Use Java regular expressions. This site seems overly complicated since in Java you should be able to get .* to match end of lines??
4. Sleep on it. An experiment conducted by German neurologists and documented in Nature, volume 427 (2004), p. 352 hypothesizes that students who get more sleep are able to solve tricky problems better than students who are sleep deprived. The problem they used involves a string consisting of the three digits 1, 4, and 9. "Comparing" two digits that are the same yields the original digit; comparing two digits that are different yields the missing digit. For example f(1, 1) = 1, f(4, 4) = 4, f(1, 4) = 9, f(9, 1) = 4. Compare the first two digits of the input string, and then repeatedly compare the current result with the next digit in the string. Given a specific string, what number do you end up with? For example, if the input is string 11449494, you end up with 9.
 ```1 1 4 4 9 4 9 4 1 9 1 4 4 1 9 ```
5. DFA for permutations. Find the shortest DFA you can for the set of all permutations on N elements for N = 5 or 10.
6. Mealy and Moore machines. Mealy: DFA with output on each transition edge. Moore: DFA with output on each state.