In this chapter we introduce a simple specialized machine called a linear feedback shift register (LFSR). We describe how it works, explore some interesting applications, and study its limitations. Later in the book, we will repeat the same process with a general-purpose computer.
One-time pad. As motivation for studying linear feedback shift registers, we first consider an interesting application to cryptography. Cryptography is the science of creating secret codes. For example, suppose Alice wants to send a secret message to Bob over the Internet. Alice's first step is to encrypt the message, that is, to transform it into another message that conceals the contents of the original. She will send this encrypted message over the Internet. Of course, for the cryptographic scheme to be useful, Bob must be able to decrypt the encrypted message back into its original form. A one-time pad is a simple encryption method that is provably secure: even if a malicious third party Eve is able to intercept the encrypted message, she will not be able to glean any information about the contents of the original message, other than its length.
Encryption. Now, we describe how Alice can use a one-time pad to send Bob a message.
First, we'll assume that Alice's message consists of N bits.
A bit is one of two values 0 (false) or 1 (true).
If Alice's message consists of text, then she must first pre-process it into
a sequence of bits. A simple scheme for doing this is to use the
Base64 encoding of the text:
replace every occurrence of the letter 'A' with 000000,
'B' with 000001, and so forth according to the table below:
Binary Char   Binary Char   Binary Char   Binary Char   Binary Char   Binary Char
000000 A      001011 L      010110 W      100001 h      101100 s      110111 3
000001 B      001100 M      010111 X      100010 i      101101 t      111000 4
000010 C      001101 N      011000 Y      100011 j      101110 u      111001 5
000011 D      001110 O      011001 Z      100100 k      101111 v      111010 6
000100 E      001111 P      011010 a      100101 l      110000 w      111011 7
000101 F      010000 Q      011011 b      100110 m      110001 x      111100 8
000110 G      010001 R      011100 c      100111 n      110010 y      111101 9
000111 H      010010 S      011101 d      101000 o      110011 z      111110 +
001000 I      010011 T      011110 e      101001 p      110100 0      111111 /
001001 J      010100 U      011111 f      101010 q      110101 1
001010 K      010101 V      100000 g      101011 r      110110 2
For example, to encode the 9-character message SENDMONEY, we would use the following sequence of N = 54 bits.
   S     E     N     D     M     O     N     E     Y        (message)
  010010000100001101000011001100001110001101000100011000    (message bits)
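To make the encoding concrete, here is a minimal Java sketch (the class and method names are ours, chosen for illustration) that looks up each character's row in the table above and emits its 6-bit pattern:

```java
public class Base64Bits {
    // The characters in table order: row i holds the character whose
    // 6-bit pattern is the binary representation of i.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Convert each character of the message to its 6-bit pattern.
    static String toBits(String message) {
        StringBuilder bits = new StringBuilder();
        for (char c : message.toCharArray()) {
            int value = ALPHABET.indexOf(c);      // row number in the table
            for (int i = 5; i >= 0; i--)          // emit 6 bits, high to low
                bits.append((value >> i) & 1);
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBits("SENDMONEY"));
        // prints 010010000100001101000011001100001110001101000100011000
    }
}
```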
Now, we assume Alice and Bob have agreed on a random sequence of N bits
ahead of time, which they each know, but have kept it a secret from
the rest of the world. These random bits are called the one-time pad.
In the example above, Alice and Bob need to share 54 bits ahead of time, say
110010010011110110111001011010111001100010111111010010 (one-time pad)
Alice encrypts her message by lining it up with the one-time pad and
combining corresponding pairs of bits using the exclusive or (XOR) function.
The XOR function of two bits is
0 if the sum of the two bits is even and 1 if the sum is odd.
Equivalently, the XOR function of two bits is 0 if the bits are
the same, and 1 if they are different. The truth table below characterizes
the behavior of the XOR function.
x y   XOR
0 0    0
0 1    1
1 0    1
1 1    0
Applying the XOR function to Alice's message bits and the one-time pad yields:
  010010000100001101000011001100001110001101000100011000    (message bits)
^ 110010010011110110111001011010111001100010111111010010    (one-time pad)
  ------------------------------------------------------
  100000010111111011111010010110110111101111111011001010    (encrypted bits)
We note that ^ is the Java symbol for exclusive or. We use the XOR function to prevent an eavesdropper from discovering the original message. If each bit of the one-time pad is chosen according to a fair and independent coin flip, then the corresponding bit in Alice's message also has an equal chance of winding up 0 or 1. To an eavesdropper, the encrypted bits are indistinguishable from a sequence of random coin flips.
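The XOR step can be sketched in a few lines of Java (the class and method names below are ours, not part of any library). Because XOR is its own inverse, applying the same pad a second time recovers the original message, which is exactly how Bob will decrypt:

```java
public class OneTimePad {
    // Bitwise XOR of two equal-length bit strings.
    static String xor(String a, String b) {
        StringBuilder s = new StringBuilder();
        for (int i = 0; i < a.length(); i++)
            s.append(a.charAt(i) == b.charAt(i) ? '0' : '1');
        return s.toString();
    }

    public static void main(String[] args) {
        String message = "010010000100001101000011001100001110001101000100011000";
        String pad     = "110010010011110110111001011010111001100010111111010010";

        String encrypted = xor(message, pad);
        System.out.println(encrypted);
        // prints 100000010111111011111010010110110111101111111011001010

        // XORing with the same pad again recovers the message: (m ^ p) ^ p = m.
        System.out.println(xor(encrypted, pad).equals(message));
        // prints true
    }
}
```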
Finally, Alice converts the encrypted bits back to text using Base64 encoding.
  100000010111111011111010010110110111101111111011001010    (encrypted bits)
   g     X     7     6     W     3     v     7     K        (encrypted message)
In this way, Alice encrypts the message SENDMONEY and sends the result, gX76W3v7K, to Bob.
Decryption. Bob decrypts the encrypted message by applying the exact same procedure.
- First, he converts the encrypted message to bits using the Base64 encoding.
- Next, he takes the XOR of the encrypted bits with the one-time pad.
- Finally, he converts the resulting bits back to text using the Base64 encoding.
   g     X     7     6     W     3     v     7     K        (encrypted message)
  100000010111111011111010010110110111101111111011001010    (encrypted bits)
^ 110010010011110110111001011010111001100010111111010010    (one-time pad)
  ------------------------------------------------------
  010010000100001101000011001100001110001101000100011000    (message bits)
   S     E     N     D     M     O     N     E     Y        (message)
Implementation. To implement one-time pads, we must generate and distribute long sequences of random bits since the number of random bits required is equal to the number of bits in the message we wish to send. To ensure security, we must not reuse the bits - hence the name one-time pad.
A simple machine. An LFSR is a simple machine that captures many of the properties of real computers. One of its main applications is generating pseudo-random bits. To generate a random bit, we could flip a fair coin and associate heads with 0 and tails with 1. This is horrendously tedious if we need to generate a million random bits. Instead, we would like to produce random bits automatically, using some kind of mechanical device. However, traditional computers are deterministic: they follow a prescribed set of rules, and given the same input, they always produce the same output. Nevertheless, it is possible to generate sequences of bits that "look" like truly random bits. Such sequences are called pseudo-random. Robert R. Coveyou once remarked, "The generation of random numbers is too important to be left to chance." But according to the famous physicist John von Neumann, "anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin." Prepare yourself, as we are about to enter a state of sin.
An LFSR is composed of four components. A cell is a storage element that holds one bit. A register is a sequence of cells. The behavior of the LFSR changes at discrete time intervals, when the clock ticks. A shift register is a register whose bits propagate one cell to the left when the clock ticks: whatever bit is stored in cell i at time t is stored in cell i + 1 at time t + 1. To completely characterize the behavior of the machine, we must specify what happens at the boundaries (cell 0 and cell 10). We simply discard the bit stored in cell 10, since there is no room to store it in the next time step. The bit stored in cell 0 at time t + 1 is the exclusive or (XOR) of the bits stored in cells 8 and 10 at time t. Below are the contents of an 11-bit LFSR during the first several time steps.
time | 10  9  8  7  6  5  4  3  2  1  0
---------------------------------------
   0 |  0  1  1  0  1  0  0  0  0  1  0
---------------------------------------
   1 |  1  1  0  1  0  0  0  0  1  0  1
   2 |  1  0  1  0  0  0  0  1  0  1  1
   3 |  0  1  0  0  0  0  1  0  1  1  0
   4 |  1  0  0  0  0  1  0  1  1  0  0
   5 |  0  0  0  0  1  0  1  1  0  0  1
   6 |  0  0  0  1  0  1  1  0  0  1  0
   7 |  0  0  1  0  1  1  0  0  1  0  0
   8 |  0  1  0  1  1  0  0  1  0  0  1
   9 |  1  0  1  1  0  0  1  0  0  1  0
 ...
The description above completely specifies the behavior of the machine, except for the starting input, or seed. Any value other than all 0s will do. If we specify 01101000010 as the seed, then the machine evolves as shown above. It is interesting to consider the sequence of bits that appears in cell 0. The first 1000 such bits are listed below.
110010010011110110111001011010111001100010111111010010000100110100101 111001100100111111101110000010101100010000111010100110100001111001001 100111011111110101000001000010001010010101000110000010111100010010011 010110111100011010011011100111101011110010001001110101011101000001010 010001000110101010111000000010110000010011100010111011010010101100110 000111111100110000011111100011000011011110011101001111010011100100111 011101110101010101000000000010000000010100000010001000010101010010000 000110100000111001000110111010111010100010100001010001001000101011010 100001100001001111001011100111001011110111001001010111011000010101110 010000101110100100101001101100011110111011001010101111000000100110000 101111100100100011101101011010110001100011101111011010100101100001100 111001111110111100001010011001000111111010110000100011100101011011100 001101011001110001111101101100010110111010011010100111100001110011001 101111111110100000001001000001011010001001100101011111100001000011001 0100111110001110001101101101110110....
Can you see any pattern? These bits are not truly random, since they arise from a precise set of rules. Once the machine is started with a given seed, the same sequence of bits is produced.
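The shift-and-feedback rule is only a few lines of code. The Java sketch below (the class and method names are ours) loads the seed from the table, with cell[0] as the rightmost column, and reproduces the rows of the table one clock tick at a time:

```java
public class LFSRSteps {
    static int[] cell = new int[11];              // cell[0] .. cell[10]

    // Load the register; the seed string lists cells 10 down to 0,
    // matching a row of the table above.
    static void load(String seed) {
        for (int i = 0; i < 11; i++)
            cell[10 - i] = seed.charAt(i) - '0';
    }

    // One clock tick: shift left, discard cell 10's old bit,
    // and feed the XOR of the taps (cells 8 and 10) into cell 0.
    static int step() {
        int feedback = cell[8] ^ cell[10];
        for (int i = 10; i > 0; i--)
            cell[i] = cell[i - 1];
        cell[0] = feedback;
        return feedback;
    }

    // The register contents, cells 10 down to 0.
    static String state() {
        StringBuilder s = new StringBuilder();
        for (int i = 10; i >= 0; i--) s.append(cell[i]);
        return s.toString();
    }

    public static void main(String[] args) {
        load("01101000010");                      // row 0 of the table
        for (int t = 1; t <= 3; t++) {
            step();
            System.out.println(t + " | " + state());
            // matches rows 1 through 3 of the table above
        }
    }
}
```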
Analysis. Now that we have a machine (LFSR) that generates pseudo-random bits, we are interested in analyzing properties of the machine and of the bits it produces.
- Repeating sequences. A natural question to ask is whether or not the bits repeat. For example, the sequence 1110111101111011110111101 repeats after every 5 bits. After very careful and tedious inspection, you might notice that the pattern of bits produced by the machine above repeats every 2047 steps, but not before. In fact, you would get a cycle length of 2047 regardless of the initial seed (as long as it is nonzero). This is essentially the best you can hope for with any 11-bit machine. The state of the LFSR is characterized by 11 bits, so the machine has only 2^11 = 2048 possible states. Once it revisits a state, it gets caught in a loop, and will repeat the same sequence of values forever. Since a nonzero seed can never reach the all-0s state, the LFSR can go at most 2047 steps without revisiting a state.
- Randomness. A sequence which takes a large number of steps to repeat is clearly necessary if we want a pseudo-random number generator, but it is not sufficient. For example, counting from 0 to 2047 in binary has the same property, but is far from anything we would consider to be random....
- Taps. The cells that are XOR'ed together are called taps. What happens if we choose a different set of taps instead of 8 and 10? It turns out that the choice of taps is very important. The taps 1 and 10 would also produce a cycle of length 2047. However, if we chose 4 and 10 as the taps, the cycle length would decrease to 595 (for the same starting seed value). Understanding why this is the case requires mathematics beyond the scope of this book (abstract algebra and finite fields).
- Scalability. The 11-bit LFSR produces 2047 pseudo-random bits before repeating. What if we need more? The LFSR naturally extends to store more bits. With a careful choice of tap positions, it is possible to get pseudo-random bits with a cycle length of about 1 million with 20 cells (taps 2 and 19), or about 1 billion with 30 cells (taps 0, 3, 5, and 29). Here are some maximal-length taps for N-bit linear feedback shift registers.
- Security. Berlekamp and Massey discovered an amazing algorithm that can reconstruct an N-bit LFSR after seeing only 2N bits. That is, after seeing the first 2N bits from any LFSR, the Berlekamp-Massey algorithm can correctly predict every future bit. Thus, we should not use an LFSR to generate random bits for a one-time pad in a mission-critical application, because a knowledgeable adversary can break the system. In Section 9.9 we will consider cryptographic methods that cannot be so easily circumvented.
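The cycle-length claims above are easy to check by simulation. The Java sketch below (names are ours) steps an 11-bit LFSR with a given pair of taps until the register returns to its initial state; because the state transition is invertible, the orbit always returns to the seed:

```java
public class CycleLength {
    // Number of steps until an 11-bit LFSR with feedback
    // bit cell[tapA] ^ cell[tapB] returns to its initial state.
    static int cycleLength(String seed, int tapA, int tapB) {
        int[] cell = new int[11];
        for (int i = 0; i < 11; i++)
            cell[10 - i] = seed.charAt(i) - '0';  // seed lists cells 10..0
        int[] start = cell.clone();
        int steps = 0;
        do {
            int feedback = cell[tapA] ^ cell[tapB];
            for (int i = 10; i > 0; i--)          // shift one cell left
                cell[i] = cell[i - 1];
            cell[0] = feedback;
            steps++;
        } while (!java.util.Arrays.equals(cell, start));
        return steps;
    }

    public static void main(String[] args) {
        // Taps 8 and 10: maximal cycle of 2047.
        System.out.println(cycleLength("01101000010", 8, 10));
        // Taps 4 and 10: a shorter cycle (595, per the text above).
        System.out.println(cycleLength("01101000010", 4, 10));
    }
}
```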
One of the major drawbacks of one-time pads is key distribution. The two parties must exchange private keys prior to their communication. Until somewhat recently, the White House and Kremlin would communicate this way. They employed trusted couriers who would travel with the one-time pads in a briefcase, handcuffed to themselves. Another problem is that if Alice sends Bob a message, there is no way that Bob can prove to a third party that Alice really sent it. In Section 10.x we will consider the very widely used RSA cryptosystem, which overcomes these obstacles and more.
Other applications. Most modern military cryptosystems use LFSRs in some form. Cray supercomputers even have a built-in instruction called population count that directly supports the simulation of LFSRs, and this support is required in some government contracts! (Reference: Applied Cryptography, p. 380)
DVD encryption. Another application of LFSRs is to encrypt DVDs with CSS (content scrambling system). CSS was designed to prevent copying DVD content and to prevent DVDs designed for one region of the world from being played in another region. CSS relies on two LFSRs (one with 17 bits, the other with 25 bits) and a 40-bit key to encrypt the data on a DVD. To play back the DVD, you need licensed hardware or software that is capable of decrypting the data. Since the CSS license is incompatible with open source software, Linux users could not play DVDs on their own computers. On October 6, 1999, in response to this dire situation, Norwegian teenager Jon Johansen released a program called DeCSS which decrypts CSS-encrypted DVDs and allows them to be played back (or copied). The security of the CSS encryption scheme relied on the bits generated by the LFSRs being random. However, they are not random, and this enabled programmers to completely reverse engineer the scheme and break the CSS system.
Context. The LFSR shares many important properties with everyday computers. Both are built from simple components; both scale to solve huge problems; both require a deep understanding to use effectively.
|            ||LFSR               ||Computer               |
|Control     ||Start, stop, load  ||Start, stop, load      |
|Clock       ||Regular pulse      ||2.8 GHz pulse          |
|Memory      ||11 bits            ||512 MB                 |
|Input       ||Seed               ||Sequence of bits       |
|Computation ||XOR, shift         ||Logic, arithmetic, ... |
|Output      ||Pseudo-random bits ||Sequence of bits       |
The critical difference between an LFSR and a computer is that a computer can be programmed to simulate any abstract machine.
Simulating an LFSR in Java.
It is possible to build an LFSR out of physical components.
Another way to study its behavior is by writing a Java program.
You will learn about writing Java programs soon, but to give you an idea,
the Java program LFSR.java prints
out the first N bits of the LFSR described above, where N is a parameter
specified by the user. Even without any programming experience, you may
be able to glean information from the code. In Java a boolean
variable stores one of two values, true or false.
The ^ operator means exclusive or. You will understand this program
fully by the end of Section 2.3.
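As a preview, a program with the same behavior as LFSR.java might look like the sketch below (this is our own sketch, not the booksite's code; the class and method names are ours). It prints the first N bits fed back into cell 0 by the 11-bit LFSR with taps 8 and 10, starting from the seed in the table above:

```java
public class PseudoRandomBits {
    // First n bits that the 11-bit LFSR (taps 8 and 10) feeds back
    // into cell 0, starting from the given seed (cells 10 down to 0).
    static String bits(String seed, int n) {
        int[] cell = new int[11];
        for (int i = 0; i < 11; i++)
            cell[10 - i] = seed.charAt(i) - '0';
        StringBuilder out = new StringBuilder();
        for (int t = 0; t < n; t++) {
            int feedback = cell[8] ^ cell[10];    // ^ is exclusive or
            for (int i = 10; i > 0; i--)          // shift one cell left
                cell[i] = cell[i - 1];
            cell[0] = feedback;
            out.append(feedback);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Number of bits to print, taken from the command line (default 20).
        int n = (args.length > 0) ? Integer.parseInt(args[0]) : 20;
        System.out.println(bits("01101000010", n));
    }
}
```

Its output matches the bit stream listed earlier: the first ten bits are 1100100100.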
Q + A
Q. Can I reuse a one-time pad?
A. No. If you reuse it, it's easy to break. The Rosenbergs went to the electric chair because some Russian spy reused a "one-time" pad.
Q. How can I get truly random bits?
A. Harness some physical process. Flip a fair coin. Careful, even this is likely biased! Scoop up a pile of sand - return true if there are an odd number of grains, false if there are an even number. Use radioactive decay, quantum entanglement, thermal noise, or quasar radiation (only need to share which quasar and exactly when to monitor).
- Suppose that you start the LFSR above with seed equal to 00000000000 (eleven 0s). What sequence of bits is produced?
- Suppose that you start the LFSR above with seed equal to 11000000011. What are the first 5 bits produced by the LFSR?
- Given a 3-bit LFSR with taps at 0 and 2 and initial seed 001, determine the number of steps before the pattern repeats.
- Repeat the previous exercise with taps 1 and 2 and initial seed 001. What if the initial seed is xxx instead, where xxx is hopefully chosen so that it produces a different period.
- Windows CE allows the user to cache their password when connecting to another system running Windows so that they don't have to re-enter it each time they login. However, they "encrypt" the password itself using the following scheme: take the letters of the password (7 characters at a time) and bitwise XOR the ASCII representations of them with the word "susageP". Pegasus was the project name for Windows CE and is susageP spelled backwards. Decrypt the following password:
- Obfuscating source code. Suppose that you want to distribute a program, but there is a hardwired string (perhaps a customer identification or registration number) that you'd like to hide from a casual observer. Describe a method that would make it harder to discover the secret string. Answer: generate a random bit string and xor it with the string you'd like to hide. Store both the random bit string and the xor'd string. Both will look like gibberish. To recover the id number, have the program xor them together.
- Cyclic redundancy check. A cyclic redundancy check is used to: xyz. The Ethernet CRC (or CRC32) is used in freeware packages such as XModem or Kermit. It is computed using a linear feedback shift register as follows: xyz. Compute by hand the CRC16 or CRC32 of the following string: xyz. In Section 2.4 you will write a Java program to compute the CRC32 of a file.
- Rock-paper-scissors. Suppose that two people want to simulate a fair coin flip, but don't have a coin. A classic solution is for each participant to simultaneously reveal one of three symbols: rock, paper, or scissors. Rock beats scissors, scissors beats paper, and paper beats rock. The process is repeated in the event of a tie. Devise a scheme where each participant expresses one of two gestures, and there are no ties. Answer: each person reveals 0 or 1. The first person wins if the XOR of the two bits is 0; the second person wins if the XOR is 1.
- Drawing straws. One of five siblings must shovel the snow. A classic solution to pick one at random is to have one person hide four long straws and one short straw. Whoever picks the short straw must do the chore. Devise a scheme that works if no straws are available. Answer: each person 0 through 4 picks a number between 0 and 4. The sum of the numbers mod 5 identifies the unlucky sibling.
- Secret sharing. Suppose you have a secret message m and want to distribute it to Alice and Bob so that neither one of them can read the message on their own, but if Alice and Bob both work together they can read the message. Consider the following scheme: generate two random bit strings a and b that have as many bits as the message m. Compute x = m xor a xor b. Send a and x to Alice, and send b and x to Bob. Show how Alice and Bob together can recover the original message m. Argue that neither Alice nor Bob can recover the original message on their own.
- Class average. There are N students in an introductory computer science class who just took an exam (out of 100 points). Suppose that the score of student i is s[i]. The students want to cooperate to learn the class average, but no student wants to reveal their score. Consider the following protocol: the class picks a number bigger than 100N, say M. Then each student i picks N-1 integers x1, ..., xN-1 at random between 0 and M-1, and then computes the unique integer xN so that the sum of the N integers is equal to s[i], modulo M. Now, each student distributes N-1 of these integers, one each to the other students, and keeps one for themselves. Each student now adds up the N-1 integers they received plus the integer they kept, and announces the sum. Finally, all of these announced numbers are added together. The average is this sum divided by N. Try the protocol out with 3 students. Show that the protocol is correct: if all students participate honestly, the resulting number is the class average. Show that the protocol is secure: any set of k dishonest students who conspire can learn nothing more than the sum of the scores of the remaining N - k students.
- Biased coin. Given a biased coin that is heads with probability p and tail with probability (1-p), how can we simulate a fair coin flip? Answer (John von Neumann, 1951): Flip the coin twice. If the result is heads-tails then output heads; if the result is tails-heads then output tails. Otherwise repeat.
- Fair coin. The optimal strategy in rock-paper-scissors is to choose one of the three items at random. Given a fair coin, show how to simulate such a random choice. Answer: flip the coin twice. Associate HH with rock, HT with scissors, TH with paper. If you get HH repeat.
- Fair coin. Given a fair coin, show how to simulate a K-sided coin (or die). Answer: flip the coin k times where k is the number of bits in the binary representation of K. Treat the sequence of coin flips as a k-bit binary number. If the number is between 0 and K-1, then use this as the outcome of the K-sided coin. Otherwise, repeat.
- Dining cryptographers. Alice, Bob, and Carol are dining at an expensive restaurant. When it comes time to pay the check, the waiter announces that the meal is being paid anonymously. Alice, Bob, and Carol want to verify that the meal is being paid by one of them, and not by an outsider (as this might violate some rules). The diners wish to determine if the meal is being paid by an outsider, without divulging the identity of the payer if it is being paid by one of them. Alice devises the following scheme: place a menu as a screen between each pair of adjacent diners. Each diner flips a fair coin and reveals it only to the diner to his or her right. Each diner then says "match" if the two coin flips he or she sees are the same, and "no match" otherwise. If one of the diners is the payer, he or she says the opposite of what he or she sees. Argue that if the number of "no matches" is odd, then an outsider is paying. Argue that if the number of "no matches" is even, then one of the diners is paying (but it is impossible for the other two diners to determine which one). Reference: The Dining Cryptographers Problem.
- Visual cryptography. Consider the Naor-Shamir scheme.
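Several of these exercises revolve around the self-inverse property of XOR. As one illustration, here is a minimal Java sketch of the secret-sharing exercise above (class and method names are ours): neither share alone reveals anything, but XORing all the pieces together recovers the message.

```java
import java.util.Random;

public class SecretSharing {
    // Bitwise XOR of two equal-length bit strings.
    static String xor(String a, String b) {
        StringBuilder s = new StringBuilder();
        for (int i = 0; i < a.length(); i++)
            s.append(a.charAt(i) == b.charAt(i) ? '0' : '1');
        return s.toString();
    }

    public static void main(String[] args) {
        String m = "010010000100";                // the secret message bits
        Random random = new Random();

        // Generate two random bit strings as long as the message.
        StringBuilder aBits = new StringBuilder(), bBits = new StringBuilder();
        for (int i = 0; i < m.length(); i++) {
            aBits.append(random.nextInt(2));
            bBits.append(random.nextInt(2));
        }
        String a = aBits.toString(), b = bBits.toString();

        String x = xor(xor(m, a), b);             // x = m ^ a ^ b
        // Alice holds (a, x); Bob holds (b, x). Working together:
        String recovered = xor(xor(x, a), b);     // = m, since each bit cancels
        System.out.println(m.equals(recovered));  // prints true
    }
}
```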