9.6 Optimization
This section under major construction.
Dynamic programming.
Dynamic programming is a powerful algorithmic design paradigm. The key idea is to save state to avoid recomputation: break a large computational problem up into smaller subproblems, store the answers to those smaller subproblems, and, eventually, use the stored answers to solve the original problem. This avoids recomputing the same quantity over and over again, and avoids a potential exponential blowup in running time. We now consider two examples.

Binomial coefficients.
The binomial
coefficient C(n, k)
is the number of ways of choosing a subset of k elements from
a set of n elements. It arises in probability and statistics,
e.g., the probability of flipping a biased coin (with probability p
of heads) n times and getting exactly k heads is
C(n, k) p^{k} (1 - p)^{n-k}.
One formula for computing binomial coefficients is
C(n, k) = n! / (k! (n-k)!).
This formula is not so amenable to direct computation because
the intermediate results may overflow, even if the final answer does not.
For example, C(100, 15) = 253338471349988640 fits in a 64-bit long,
but the binary representation of 100! is 525 bits long.
Pascal's identity expresses C(n, k) in terms of
smaller binomial coefficients:

C(n, k) = C(n-1, k) + C(n-1, k-1)
To understand why Pascal's identity is valid, consider some arbitrary element x. To choose k of n elements, we either select element x (in which case we still need to choose k-1 of the remaining n-1 elements), or we don't select element x (in which case we still need to choose k of the remaining n-1 elements).
The naive implementation below fails spectacularly for medium n or k, not because of overflow, but rather because the same subproblems are solved repeatedly.
// DO NOT RUN THIS CODE FOR LARGE INPUTS
public static long binomial(int n, int k) {
    if (k == 0) return 1;
    if (n == 0) return 0;
    return binomial(n-1, k) + binomial(n-1, k-1);
}
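One remedy that keeps the recursive structure is top-down dynamic programming (memoization): store each computed value in a table, and return the stored value on repeated calls. A minimal sketch, assuming a wrapper class; the names BinomialMemo and compute are illustrative, not part of Binomial.java.

```java
// Memoized (top-down) computation of C(n, k) via Pascal's identity.
// Class and method names are made up for this example.
public class BinomialMemo {
    // memo[n][k] == -1 means "not yet computed"
    private static long[][] memo;

    private static long binomial(int n, int k) {
        if (k == 0) return 1;              // C(n, 0) = 1
        if (n == 0) return 0;              // C(0, k) = 0 for k >= 1
        if (memo[n][k] != -1) return memo[n][k];
        memo[n][k] = binomial(n-1, k) + binomial(n-1, k-1);
        return memo[n][k];
    }

    public static long compute(int n, int k) {
        memo = new long[n+1][k+1];
        for (long[] row : memo) java.util.Arrays.fill(row, -1);
        return binomial(n, k);
    }

    public static void main(String[] args) {
        System.out.println(compute(100, 15));   // 253338471349988640
    }
}
```

Each of the O(nk) subproblems is now solved at most once, so even C(100, 15) is computed instantly.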
Instead of using a recursive procedure, we can fill in the entries of an N-by-K array. We must organize the computation so that all entries we need are filled in before we use them in subsequent computations. Bottom-up dynamic programming fills in the 2D array starting with the values that are easiest to compute. We begin with the base cases: C(n, 0) = 1 for all n ≥ 0, and C(0, k) = 0 for all k ≥ 1. Then we fill in all the values for n = 1, then n = 2, and so forth. This ensures that everything we need is computed before we ever access it. Program Binomial.java takes two command-line integers N and K and prints out C(N, K) using a combination of Pascal's identity and bottom-up dynamic programming.
for (int k = 1; k <= K; k++) binomial[0][k] = 0;
for (int n = 0; n <= N; n++) binomial[n][0] = 1;

for (int n = 1; n <= N; n++)
    for (int k = 1; k <= K; k++)
        binomial[n][k] = binomial[n-1][k-1] + binomial[n-1][k];
n\k   0   1   2   3   4
 0    1   0   0   0   0
 1    1   1   0   0   0
 2    1   2   1   0   0
 3    1   3   3   1   0
 4    1   4   6   4   1
 5    1   5  10  10   5
 6    1   6  15  20  15
Longest common subsequence.
Now we consider a more sophisticated application of dynamic programming
to a central problem arising in computational biology and other domains.
Given two strings s and t, we wish to determine how similar they are.
Some examples include: comparing two DNA sequences for homology (similarity),
two English words for spelling similarity, or two Java files for repeated code.
It also arises in molecular biology, gas chromatography, and bird
song analysis.
One simple strategy is
to find the length of the longest common subsequence (LCS).
If we delete some characters from s and some characters from t,
and the resulting two strings are equal, we call the resulting
string a common subsequence. The LCS problem is to find
a common subsequence that is as long as possible. For example
the LCS of ggcaccacg and acggcggatacg is
ggcaacg.
ggcaccacg
acggcggatacg
Now we describe a systematic method for computing the LCS of two strings x and y using dynamic programming. Let M and N be the lengths of x and y, respectively. We use the notation x[i..M] to denote the suffix of x starting at position i, and y[j..N] to denote the suffix of y starting at position j. If x and y begin with the same letter, then we should include that first letter in the LCS, and our problem reduces to finding the LCS of the two remaining suffixes x[1..M] and y[1..N]. On the other hand, if the two strings start with different letters, the two first characters cannot both be part of a common subsequence, so we can safely discard one or the other. In either case, the problem reduces to finding the LCS of two strings, at least one of which is strictly shorter. If we let opt[i][j] denote the length of the LCS of x[i..M] and y[j..N], then the following recurrence expresses it in terms of the lengths of LCSs of shorter suffixes.
opt[i][j] = 0                              if i = M or j = N
          = opt[i+1][j+1] + 1              if x[i] = y[j]
          = max(opt[i][j+1], opt[i+1][j])  otherwise
Program LCS.java is a bottom-up translation of this recurrence. We maintain a two-dimensional array opt[i][j] that is the length of the LCS for the two strings x[i..M] and y[j..N]. For the input strings ggcaccacg and acggcggatacg, the program computes the following table by filling in values from right-to-left (j = N-1 to 0) and bottom-to-top (i = M-1 to 0).
         0  1  2  3  4  5  6  7  8  9 10 11 12
  x\y    a  c  g  g  c  g  g  a  t  a  c  g
  0  g   7  7  7  6  6  6  5  4  3  3  2  1  0
  1  g   6  6  6  6  5  5  5  4  3  3  2  1  0
  2  c   6  5  5  5  5  4  4  4  3  3  2  1  0
  3  a   6  5  4  4  4  4  4  4  3  3  2  1  0
  4  c   5  5  4  4  4  3  3  3  3  3  2  1  0
  5  c   4  4  4  4  4  3  3  3  3  3  2  1  0
  6  a   3  3  3  3  3  3  3  3  3  3  2  1  0
  7  c   2  2  2  2  2  2  2  2  2  2  2  1  0
  8  g   1  1  1  1  1  1  1  1  1  1  1  1  0
  9      0  0  0  0  0  0  0  0  0  0  0  0  0
To recover the LCS itself, we retrace our steps through the opt array, starting at opt[0][0] and distinguishing three cases:

- x[i] matches y[j]. In this case, we must have opt[i][j] = opt[i+1][j+1] + 1, and the next character in the LCS is x[i].
- The LCS does not contain x[i]. In this case, opt[i][j] = opt[i+1][j].
- The LCS does not contain y[j]. In this case, opt[i][j] = opt[i][j+1].
The algorithm takes time and space proportional to MN.
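The table-filling and trace-back steps above can be sketched as follows. This is a compact variant in the spirit of LCS.java, not the program itself; the class and method names are illustrative.

```java
// Bottom-up computation of the longest common subsequence of two strings.
public class LCSDemo {
    public static String lcs(String x, String y) {
        int M = x.length(), N = y.length();
        // opt[i][j] = length of LCS of the suffixes x[i..M] and y[j..N]
        int[][] opt = new int[M+1][N+1];

        // fill in the table from the last row/column back to opt[0][0]
        for (int i = M-1; i >= 0; i--)
            for (int j = N-1; j >= 0; j--)
                if (x.charAt(i) == y.charAt(j))
                    opt[i][j] = opt[i+1][j+1] + 1;
                else
                    opt[i][j] = Math.max(opt[i][j+1], opt[i+1][j]);

        // recover the LCS itself by retracing the three cases
        StringBuilder sb = new StringBuilder();
        int i = 0, j = 0;
        while (i < M && j < N) {
            if (x.charAt(i) == y.charAt(j)) { sb.append(x.charAt(i)); i++; j++; }
            else if (opt[i+1][j] >= opt[i][j+1]) i++;   // LCS does not contain x[i]
            else                                 j++;   // LCS does not contain y[j]
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(lcs("ggcaccacg", "acggcggatacg"));   // ggcaacg
    }
}
```

Running it on the two example strings reproduces the LCS ggcaacg of length 7, matching opt[0][0] in the table above.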
Space usage. Usually with dynamic programming you run out of space before you run out of time. However, sometimes it is possible to avoid using an M-by-N array and get by with just one or two arrays of length M and N. For example, it is not hard to modify Binomial.java to do exactly this. (See Exercise 1.) Similarly, for the longest common subsequence problem, it is easy to avoid the 2D array if you only need the length of the LCS. Finding the alignment itself in linear space is substantially more challenging (but possible using Hirschberg's divide-and-conquer algorithm).
Dynamic programming history. Bellman. LCS by Robinson in 1938. Cocke-Younger-Kasami (CYK) algorithm for parsing context-free grammars, Floyd-Warshall for all-pairs shortest paths, Bellman-Ford for arbitrage detection (negative-cost cycles), longest common subsequence for diff, edit distance for global sequence alignment, bitonic TSP. Knapsack problem, subset sum, partitioning. Applications: multiprocessor scheduling, minimizing VLSI circuit size.
Root finding. Goal: given function f(x), find x* such that f(x*) = 0. Nonlinear equations can have any number of solutions.
x^{2} + y^{2} = -1    no real solutions
e^{x} - 17 = 0        one real solution
x^{2} - 4x + 3 = 0    two real solutions (1 and 3)
sin(x) = 0            infinitely many real solutions
Unconstrained optimization. Goal: given function f(x), find x* such that f(x) is maximized or minimized. If f(x) is differentiable, then we are looking for an x* such that f'(x*) = 0. However, this may lead to local minima, maxima, or saddle points.
Bisection method. Goal: given function f(x), find x* such that f(x*) = 0. Assume you know interval [a, b] such that f(a) < 0 and f(b) > 0.
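Each step evaluates f at the midpoint of [a, b] and keeps the half-interval whose endpoints still bracket a root, halving the interval length at every iteration. A minimal sketch under the assumption f(a) < 0 < f(b); the class name Bisection and the functional interface F are made up for this example.

```java
// Bisection method: maintain an interval [a, b] with f(a) <= 0 <= f(b)
// and repeatedly halve it until it is shorter than a tolerance.
public class Bisection {
    public interface F { double eval(double x); }

    public static double solve(F f, double a, double b, double tol) {
        // invariant: f(a) <= 0 <= f(b)
        while (b - a > tol) {
            double mid = (a + b) / 2;
            if (f.eval(mid) < 0) a = mid;   // root lies in [mid, b]
            else                 b = mid;   // root lies in [a, mid]
        }
        return (a + b) / 2;
    }

    public static void main(String[] args) {
        // root of f(x) = x^2 - 2 in [1, 2] is sqrt(2)
        System.out.println(solve(x -> x*x - 2, 1.0, 2.0, 1e-10));
    }
}
```

Bisection is slow (one bit of accuracy per iteration) but completely reliable whenever a bracketing interval is known, which makes it a useful fallback for the faster but touchier Newton's method below.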
Newton's method. Successive quadratic approximation. Convergence is fast if started close enough to the answer. The update formulas below are for finding a root of f(x) (root finding) and a root of f'(x) (optimization).
root finding:  x_{k+1} = x_{k} - f'(x_{k})^{-1} f(x_{k})
optimization:  x_{k+1} = x_{k} - f''(x_{k})^{-1} f'(x_{k})
Newton's method is only reliable if started "close enough" to the solution. A bad example (due to Smale): f(x) = x^{3} - 2x + 2. If you start in the interval [-0.1, 0.1], Newton's method reaches a stable 2-cycle, oscillating between 0 and 1. If started to the left of the negative real root, it will converge to that root.
To handle general differentiable or twice differentiable functions of one variable, we might declare an interface
public interface Function {
    public double eval(double x);
    public double deriv(double x);
}
Program Newton.java runs Newton's method on a differentiable function to compute points x* where f(x*) = 0 and f'(x*) = 0.
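The root-finding update can be coded directly against the Function interface above. A minimal sketch (not Newton.java itself); the class name, iteration cap, and tolerance are arbitrary choices for illustration.

```java
// Newton's method for f(x) = 0, using a Function interface as in the text.
public class NewtonSketch {
    public interface Function {
        double eval(double x);
        double deriv(double x);
    }

    public static double root(Function f, double x0) {
        double x = x0;
        for (int iter = 0; iter < 100; iter++) {
            double fx = f.eval(x);
            if (Math.abs(fx) < 1e-12) break;   // close enough to a root
            x = x - fx / f.deriv(x);           // x_{k+1} = x_k - f(x_k)/f'(x_k)
        }
        return x;
    }

    // example: f(x) = x^2 - 2, whose positive root is sqrt(2)
    public static final Function SQRT2 = new Function() {
        public double eval(double x)  { return x*x - 2; }
        public double deriv(double x) { return 2*x; }
    };

    public static void main(String[] args) {
        System.out.println(root(SQRT2, 1.0));   // 1.414213562...
    }
}
```

Starting from x = 1, the iterates 1.5, 1.41667, 1.41422, ... converge to sqrt(2) in a handful of steps, illustrating the quadratic convergence claimed above.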
The probability of finding an electron in the 4s excited state of hydrogen at radius r is given by f(x) = (1 - 3x/4 + x^{2}/8 - x^{3}/192)^{2} e^{-x/2}, where x is the radius in units of the Bohr radius (0.529173 × 10^{-8} cm). Program BohrRadius.java contains formulas for f(x), f'(x), and f''(x). By starting Newton's method at 0, 4, 5, 13, and 22, we obtain all three roots and all five local minima and maxima.
Newton's method in higher dimensions. [probably omit or leave as an exercise] Use it to solve a system of nonlinear equations. In general, there are no good global methods for solving nonlinear systems of equations, but Newton's method generalizes via the update
x_{k+1} = x_{k} - J(x_{k})^{-1} f(x_{k})
where J is the Jacobian matrix of partial derivatives. In practice, we don't explicitly compute the inverse. Instead of computing y = J^{-1}f, we solve the linear system of equations Jy = f.
To illustrate the method, suppose we want to find a solution (x, y) to the following system of two nonlinear equations.
x^{3} - 3xy^{2} - 1 = 0
3x^{2}y - y^{3} = 0
In this example, the Jacobian is given by
J = [ 3x^{2} - 3y^{2}        -6xy       ]
    [ 6xy             3x^{2} - 3y^{2}   ]
If we start Newton's method at the point (-0.6, 0.6), we quickly obtain one of the roots, (-1/2, sqrt(3)/2), up to machine accuracy. The other roots are (-1/2, -sqrt(3)/2) and (1, 0). Program TestEquations.java uses the interface Equations.java and EquationSolver.java to solve the system of equations. We use the Jama matrix library to do the matrix computations.
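For this particular 2-by-2 system, the Newton step can be carried out without a matrix library by solving J Δ = f with Cramer's rule. A sketch under that simplification; the class and method names are made up, and TestEquations.java uses Jama instead.

```java
// Newton's method in two dimensions for the system
//   f1(x, y) = x^3 - 3xy^2 - 1 = 0
//   f2(x, y) = 3x^2 y - y^3   = 0
// (the real and imaginary parts of z^3 - 1 = 0).
public class Newton2D {
    public static double[] solve(double x, double y) {
        for (int iter = 0; iter < 100; iter++) {
            double f1 = x*x*x - 3*x*y*y - 1;
            double f2 = 3*x*x*y - y*y*y;
            if (Math.abs(f1) + Math.abs(f2) < 1e-14) break;
            // Jacobian entries
            double a = 3*x*x - 3*y*y, b = -6*x*y;
            double c = 6*x*y,         d = 3*x*x - 3*y*y;
            double det = a*d - b*c;
            // Cramer's rule for the 2-by-2 system J * (dx, dy) = (f1, f2)
            double dx = ( d*f1 - b*f2) / det;
            double dy = (-c*f1 + a*f2) / det;
            x -= dx;   // x_{k+1} = x_k - J^{-1} f
            y -= dy;
        }
        return new double[] { x, y };
    }

    public static void main(String[] args) {
        double[] root = solve(-0.6, 0.6);
        System.out.println(root[0] + " " + root[1]);   // approx -0.5 0.8660254...
    }
}
```

Starting at (-0.6, 0.6), the first step already lands near (-0.4, 0.863), and a few more iterations converge to (-1/2, sqrt(3)/2) to machine accuracy.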
Optimization. Use the same method to optimize a function of several variables. Good methods exist if the multivariate function is sufficiently smooth.
x_{k+1} = x_{k} - H(x_{k})^{-1} g(x_{k})
We need the gradient g(x) = ∇f(x) and the Hessian H(x) = ∇^{2}f(x). The method finds an x* where g(x*) = 0, but this could be a maximum, minimum, or saddle point. If the Hessian is positive definite (all eigenvalues are positive), then x* is a minimum; if all eigenvalues are negative, then it is a maximum; otherwise it is a saddle point.
Also, second derivatives change slowly, so it may not be necessary to recalculate the Hessian (or its LU decomposition) at each step. In practice, it is expensive to compute the Hessian exactly, so so-called quasi-Newton methods are preferred, including the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update rule.
Linear programming. Create matrix interface. Generalizes two-person zero-sum games, many problems in combinatorial optimization, .... run AMPL from the web.
Programming = planning. Give some history. The decision problem was not known to be in P for a long time. In 1979, Khachiyan resolved the question in the affirmative and made headlines in the New York Times with a geometric divide-and-conquer algorithm known as the ellipsoid algorithm. It requires O(N^{4}L) bit operations, where N is the number of variables and L is the number of bits in the input. Although this was a landmark in optimization, it did not immediately lead to a practical algorithm. In 1984, Karmarkar proposed a projective scaling algorithm that takes O(N^{3.5}L) time. It opened the door for efficient implementations by typically performing much better than its worst-case guarantee. Various interior-point methods were proposed in the 1990s, and the best known complexity bound is O(N^{3}L). More importantly, these algorithms are practical and competitive with the simplex method, and they extend to handle even more general problems.
Simplex method.
Linear programming solvers. In 1947, George Dantzig proposed the simplex algorithm for linear programming, one of the greatest and most successful algorithms of all time. Linear programming, but not industrial strength. Program LPDemo.java illustrates how to use it. The classes MPSReader and MPSWriter can parse input files and write output files in the standard MPS format. Test LP data files in MPS format.
More applications. ORObjects also has graph coloring, traveling salesman problem, vehicle routing, shortest path.
Exercises
 Binomial coefficients. Compute the binomial coefficient using dynamic programming, but use only two 1D arrays of length K instead of one 2D array of size N-by-K.
 Running time recurrences. Use dynamic programming to compute a table of values T(N), where T(N) is the solution to the following divide-and-conquer recurrence. T(1) = 0, T(N) = N + T(N/2) + T(N - N/2) if N > 1.
 Fibonacci numbers. Write a program Fibonacci.java that takes a command-line argument N and computes the Nth Fibonacci number using dynamic programming. Compare the running time of your program against the naive recursive version from Section 2.3.
 Conway's sequence. Consider the following recursive function. f(n) = f(f(n-1)) + f(n - f(n-1)) for n > 2, and f(1) = f(2) = 1. Compute f(3). Write a Java program to compute the first 50 values of f(n) in the sequence. Use dynamic programming. Conway's sequence has many interesting properties and connects with Pascal's triangle, the Gaussian distribution, Fibonacci numbers, and Catalan numbers.
 Gas station optimization. You are driving from Princeton to San Francisco in a car that gets 25 miles per gallon and has a gas tank capacity of 15 gallons. Along the way, there are N gas stations where you can stop for gas. Gas station i is d[i] miles into the trip and sells gas for p[i] dollars per gallon. If you stop at station i for gas, you must completely fill up your tank. Assume that you start with a full tank and that the d[i] are integers. Use dynamic programming to find a minimum cost sequence of stops.
 Use Newton's method to find an x (in radians) such that x = cos(x).
 Use Newton's method to find an x (in radians) such that x^{2} = 4 sin(x).
 Use Newton's method to find an x (in radians) that minimizes f(x) = sin(x) + x  exp(x).
 Use Newton's method to find (x, y) that solve
x + y - xy = 2
x exp(y) = 1
Start at the point (0.1, 0.2), which is near the true root (0.09777, 2.325).
 Use Newton's method to find (x, y) that solve
x + 2y = 2 x^{2} + 4y^{2} = 4
Start at the point (1, 2), which is near the true root (0, 1).
 Use Newton's method to find (x, y) that solve
x + y = 3 x^{2} + y^{2} = 9
Start at the point (2, 7).
 Use Newton's method to minimize f(x) = 1/2 - x e^{-x^{2}}. Hint: f'(x) = (2x^{2} - 1) e^{-x^{2}}, f''(x) = 2x(3 - 2x^{2}) e^{-x^{2}}.
 Use Newton's method to find all minima, maxima, and saddle points of the following function of two variables.

f(x, y) = 3(1 - x)^{2} exp(-x^{2} - (y+1)^{2}) - 10((y/6) - y^{3}) exp(-x^{2} - y^{2})
Creative Exercises
 Bernoulli numbers. Bernoulli numbers appear in the Taylor expansion of the tangent function, the Euler-Maclaurin summation formula, and the Riemann zeta function. They can be defined recursively by B_{0} = 1, and using the fact that the sum from j = 0 to N of binomial(N+1, j) B_{j} = 0. For example B_{1} = -1/2, B_{2} = 1/6, B_{3} = 0, and B_{12} = -691/2730. Bernoulli computed the first 10 Bernoulli numbers by hand; Euler computed the first 30. In 1842, Ada Lovelace suggested to Charles Babbage that he devise an algorithm for computing Bernoulli numbers using his Analytical Engine. Write a program Bernoulli.java that takes a command-line argument N and prints out the first N Bernoulli numbers. Use the BigRational.java data type from Section 9.2.
 Unix diff. The Unix diff program compares two files line-by-line and prints out places where they differ. Write a program Diff.java that reads in the two files specified on the command line, one line at a time, computes the LCS on the sequence of constituent lines of each file, and prints out any lines corresponding to non-matches in the LCS.
 Longest common subsequence of 3 strings. Given 3 strings, find the longest common subsequence using dynamic programming. What is the running time and memory usage of your algorithm?
 Making change. Given A hundred-dollar bills, B fifty-dollar bills, C twenty-dollar bills, D ten-dollar bills, E five-dollar bills, F one-dollar bills, G half-dollars, H quarters, I dimes, J nickels, and K pennies, determine whether it is possible to make change for N cents. Hint: knapsack problem. (Greedy also works.)
 Making change. Suppose that you are a cashier in a strange country where the currency denominations are 1, 3, 8, 16, 22, 57, 103, and 526 cents (or, more generally, d_{0}, d_{1}, ..., d_{N-1}). Describe a dynamic programming algorithm to make change for c cents using the fewest coins. Hint: the greedy algorithm won't work, since the best way to change 114 cents is 57 + 57 instead of 103 + 8 + 3.
 Longest increasing sequence.
Given an array of N 64-bit integers, find the longest
subsequence that is strictly increasing.
Hint. Compute the longest common subsequence between the original array and a sorted version of the array where duplicate copies of an integer are removed.
 Longest common increasing sequence. Computational biology. Given two sequences of N 64-bit integers, find the longest increasing subsequence that is common to both sequences.
 Activity selection with profits. Job i has start time s_i, finish time f_i, and profit p_i. Find the most profitable subset of mutually compatible jobs to schedule.
 Knapsack problem. Knapsack.java.
 Text justification. Write a program that takes a command-line argument N, reads text from standard input, and prints out the text, formatted nicely with at most N characters per line. Use dynamic programming.
 Viterbi algorithm. Given a directed graph where each edge is labeled with a symbol from a finite alphabet, and a distinguished vertex x, is there a path starting from x that matches the characters in the string s? Dynamic programming: A(i, v) = 1 if there is a path from x to v that consumes the first i characters of s, and 0 otherwise. A(i, v) = max(A(i-1, u) : (u, v) in E labeled with s[i]).
 Viterbi algorithm. Speech recognition, handwriting analysis, computational biology, hidden Markov models. Suppose each edge leaving v has a probability p(v, w) of being traversed. Probability of a path is the product of the probability on that path. What is most probable path? Dynamic programming.
 Smith-Waterman algorithm. Local sequence alignment.
 Eccentric anomaly. (Cleve Moler) The eccentric anomaly arises in Kepler's model of planetary motion and satisfies M = E - e sin E, where M is the mean anomaly (24.851090) and e is the orbital eccentricity (0.1). Solve for E.
 Newton's method with complex numbers. Implement Newton's method for finding a complex root of an equation. Use Complex.java and implement Newton's method exactly as when finding real roots, but use complex numbers.