4.1   Analysis of Algorithms


In this section, you will learn to respect a principle whenever you program: Pay attention to the cost. To study the cost of running them, we study our programs themselves via the scientific method. We also apply mathematical analysis to derive concise models of the cost.

Scientific method.

The following five-step method summarizes the scientific method: the following 5 step approach. The experiments we design must be reproducible and the hypotheses we formulate must be falsifiable.

Observations.

Measuring the exact running time of our program is difficult, but there are a number of tools available to help. In this book, we simply run a program on various inputs and measure the amount of time to process each input using the Stopwatch.java data type from Section 3.2. Our first qualitative observation about most programs is that there is a problem size that characterizes the difficulty of the computational task. Normally, the problem size is either the size of the input or the value of a command-line argument. Intuitively, the running time should increase with the problem size, but the question of how much it increases naturally arises every time we develop and run a program.

A concrete example.

To illustrate the approach, we start with ThreeSum.java which counts the number of triples in an array of n integers that sums to 0. What is the relationship between the problem size n and running time for ThreeSum?

Tilde notation.

We use tilde notation to develop simpler approximate expressions. First, we work with the leading term of mathematical expressions by using a mathematical device known as the tilde notation. We write ~ g(n) to represent any quantity that, when divided by f(n), approaches 1 as n grows. We also write g(n) ~ f(n) to indicate that g(n) / f(n) approaches 1 as n grows. With this notation, we can ignore complicated parts of an expression that represent small values. For example, the if statement in ThreeSum.count() is executed ~ n3 / 6 times because n(n−1)(n−2) / 6 = n3/6 − n2/2 + n/3, which, when divided by n3/6, approaches 1 as n grows.

We focus on the instructions that are executed most frequently, sometimes referred to as the inner loop of the program. In this program it is reasonable to assume that the time devoted to the instructions outside the inner loop is relatively insignificant.

Order of growth.

The key point in analyzing the running time of a program is this: for a great many programs, the running time satisfies the relationship T(n) ~ c f(n) where c is a constant and f(n) a function known as the order of growth of the running time. For typical programs, f(n) is a function such as log2 n, n, n log2 n, n2, or n3.

The order of growth of the running time of ThreeSum.count() is n3. The value of the constant c depends both on the cost of executing instructions and on details of the frequency analysis, but we normally do not need to work out the value. Knowing the order of growth typically leads immediately to a doubling hypothesis. In the case of ThreeSum.count(), knowing that the order of growth is n3 tells us to expect the running time to increase by a factor of 8 when we double the size of the problem because

$$\lim_{n\to\infty} \frac{T(2n)}{T(n)} \;=\; \frac{c (2n)^3}{c (n)^3} \;=\; 8$$

Order of growth classifications.

We use just a few structural primitives (statements, conditionals, loops, and method calls) to build Java programs, so very often the order of growth of our programs is one of just a few functions of the problem size, summarized in the table below.

order-of-growth classifications

Estimating memory usage.

To pay attention to the cost, you need to be aware of memory usage. Memory usage is well-defined for Java on your computer (every value will require precisely the same amount of memory each time that you run your program), but Java is implemented on a very wide range of computational devices, and memory consumption is implementation-dependent. See the textbook for full details.

Exercises

  1. Implement the method printAll() for ThreeSum.java, which prints all of the triples that sum to zero.
  2. Write a program FourSum.java that takes an integer reads long integers from standard input, and counts the number of 4-tuples that sum to zero. Use a quadruple nested loop. What is the order of growth of the running time of your program? Estimate the largest input size that your program can handle in an hour. Then, run your program to validate your hypothesis.
  3. What is the value of the variable count, as a function of n, after running the following code fragment?
    int count = 0;
    for (int i = 0; i < n; i++)
        for (int j = i+1; j < n; j++)
            for (int k = j+1; k < n; k++)
                count++;
    
    Solution: n choose 3 = n (n-1) (n-2) / 6.
  4. Determine the order of growth of the running time of this statement in ThreeSum as a function of the numberof integers n on standard input:
    int[] a = StdIn.readAllInts();
    
    Solution: Linear. The bottlenecks are the array initialization and the input loop. Depending on your system, however, the cost of an input loop like this might dominate in a linearithmic-time or even a quadratic-time program unless the input size is sufficiently large.
  5. Which would you prefer: a quadratic, linearithmic, or linear algorithm?

    Solution: While it is tempting to make a quick decision based on the order of growth, it is very easy to be misled by doing so. You need to have some idea of the problem size and of the relative value of the leading coefficients of the running time. For example, suppose that the running times are n^2 seconds, 100 n lg n seconds, and 10,000 n seconds. The quadratic algorithm will be fastest for n up to about 1,000, and the linear algorithm will never be faster than the linearithmic one (n would have to be greater than 2^100, far too large to bother considering).

  6. Apply the scientific method to develop and validate a hypothesis about order of growth of the running time of each of the following two code fragments as a function of n.
    String s = ""; 
    for (int i = 0; i < n; i++) {
        if (StdRandom.bernoulli(0.5)) s += "0"; 
        else                          s += "1"; 
    }
     
    StringBuilder sb = new StringBuilder(); 
    for (int i = 0; i < n; i++) {
        if (StdRandom.bernoulli(0.5)) sb.append("0"); 
        else                          sb.append("1"); 
    }
    String s = sb.toString(); 
    

    Solution: The first is quadratic; the second is linear.

  7. Give a linear-time algorithm for reversing a string.

    Solution:

    public static String reverse(String s) {
        int n = s.length();
        char[] a = new char[n];
        for (int i = 0; i < n; i++)
           a[i] = s.charAt(n-i-1);
        String reverse = new String(a);
        return reverse;
    }
    

Creative Exercises

  1. Subset sum. Write a program SubsetSum.java that reads long integers from standard input, and counts the number of subsets of those integers that sum to exactly zero. Give the order of growth of your algorithm.

  2. Sub-exponential function. Find a function whose order of growth is larger than any polynomial function, but smaller than any exponential function. Extra credit: Find a program whose running time has that order of growth.

    Solution: n^(ln n).

Web Exercises

  1. Suppose the running time of an algorithm on inputs of size 1,000, 2,000, 3,000, and 4,000 is 5 seconds, 20 seconds, 45 seconds, 80 seconds, and 125 seconds, respectively. Estimate how long it will take to solve a problem of size 5,000. Is the algorithm have linear, linearithmic, quadratic, cubic, or exponential?
  2. Write a program OneSum.java that takes a command-line argument N, reads in N integers from standard input, and finds the value that is closest to 0. How many instructions are executed in the data processing loop?
  3. Write a program TwoSum.java that takes a command-line argument N, reads in N integers from standard input, and finds the pair of values whose sum is closest to 0. How many instructions are executed in the data processing loop?
  4. Analyze the following code fragment mathematically and determine whether the running time is linear, quadratic, or cubic as a function of N.
    for (int i = 0; i < N; i++)
       for (int j = 0; j < N; j++)
          for (int k = 0; k < N; k++)
             c[i][j] += a[i][k] * b[k][j];
    
  5. The following function returns a random string of length N. How long does it take?
    public static String random(int N) {
        if (N == 0) return "";
        int r = (int) (26 * Math.random());  // between 0 and 25
        char c = 'a' + r;                    // between 'a' and 'z'
        return random(N/2) + c + random(N - N/2 - 1);
    }
    
  6. Sieve of Eratosthenes. Estimate running time of sieve of Eratoshthenes for finding all primes less than or equal to N as a function of N.

    Solution: in theory, the answer is proportional to N log log N. Follows from Mertens' theorem in number theory.