2.4   A Case Study: Percolation


We conclude our study of functions and modules by considering a case study of developing a program to solve an interesting scientific problem. as Monte Carlo simulation to study a natural model known as percolation.

Percolation.

We model the system as an N-by-N grid of sites. Each site is either blocked or open; open sites are initially empty. A full site is an open site that can be connected to an open site in the top row via a chain of neighboring (left, right, up, down) open sites. If there is a full site in the bottom row, then we say that the system percolates. If sites are independently set to be open with vacancy probability p, what is the probability that the system percolates? No mathematical solution to this problem has yet been derived. Our task is to write computer programs to help study the problem.
percolates           does not percolate

Scaffolding.

Our first step is to pick a representation of the data. We use one boolean matrix to represent which sites are open and another boolean matrix to represent which sites are full. We will design a flow() method that takes as an argument a two-dimensional boolean array open[][] that specifies which sites are open and returns another two-dimensional boolean array full[][] that specifies which sites are full. We will also include a percolates() method that checks whether the array returned by flow() has any full sites on the bottom.

Vertical percolation.

Given a boolean matrix that represents the open sites, how do we figure out whether it represents a system that percolates? For the moment, we will consider a much simpler version of the problem that we call vertical percolation. The simplification is to restrict attention to vertical connection paths.
vertically percolates           does not vertically percolate
Determining the sites that are filled by some path that is connected vertically to the top is a simple calculation.
vertical percolation
Program VerticalPercolation.java

Data visualization.

Program Visualize.java is a test client that generates random boolean matrices and plots them using standard draw.
percolation visualization

Estimating probabilities.

The next step in our program development process is to write code to estimate the probability that a random system (of size N with site vacancy probability p) percolates. We refer to this quantity as the percolation probability. To estimate its value, we simply run a number of experiments. Program Estimate.java encapsulates this computation in a method eval().

Recursive solution for percolation.

How do we test whether a system percolates in the general case when any path starting at the top and ending at the bottom (not just a vertical one) will do the job? Remarkably, we can solve this problem with a compact program, based on a classic recursive scheme known as depth-first search. Program Percolation.java contains an implementation of flow() that computes the flow array. See the textbook for details.

Adaptive plot.

To gain more insight into percolation, the next step in program development is to write a program that plots the percolation probability as a function of the site vacancy probability p for a given value of N. Immediately, we are faced with numerous decisions. For how many values of p should we compute an estimate of the percolation probability? Which values of p should we choose? Program PercPlot.java implements a recursive appraoch that produces a good-looking curve at relatively low cost. See the textbook for details.
adaptive plot for 20-by-20 percolation system           adaptive plot for 100-by-100 percolation system
The curves support the hypothesis that there is a threshold value (about .593): if p is greater than the threshold, then the system almost certainly percolates; if p is less than the threshold, then the system almost certainly does not percolate. As N increases, the curve approaches a step function that changes value from 0 to 1 at the threshold. This phenomenon, known as a phase transition, is found in many physical systems.

Lessons.

  • Expect bugs. Every interesting piece of code that you write is going to have at least one or two bugs, if not many more. By running small pieces of code on small test cases that you understand, you can more easily isolate any bugs and then more easily fix them when you find them.

  • Keep modules small. You can focus attention on at most a few dozen lines of code at a time, so you may as well break your code into small modules as you write it.

  • Limit interactions. In a well-designed modular program, most modules should depend on just a few others. In particular, a module that calls a large number of other modules needs to be divided into smaller pieces. Modules thatare called by a large number of other modules (you should have only a few) need special attention, because if you do need to make changes in a module's API, you have to reflect those changes in all its clients.

    dependency graph for percolation

  • Develop code incrementally. You should run and debug each small module as you implement it. That way, you are never working with more than a few dozen lines of unreliable code at any given time.

  • Solve an easier problem. It is typical to begin by putting together the simplest code that you can craft that solves a given problem, as we did with vertical-only percolation.

  • Consider a recursive solution. Recursion is an indispensable tool in modern programming that you should learn to trust. If you are not already convinced of this fact by the simplicity and elegance of the flow() method in Percolation.java, you might wish to try to develop a nonrecursive version.

  • Build tools when appropriate. Our visualization method show() and random boolean matrix generation method random() are certainly useful for many other applications, as is the adaptive plotting method of PercPlot.java. Incorporating these methods into appropriate libraries would be simple.

  • Reuse software when possible. Our StdIn, StdRandom, and StdDraw libraries all simplified the process of developing the code in this section.

    Q + A

    Q. Visualize.java and Estimate.java to rename Percolation.java to PercolationEZ.java or whatever method we want to study seems to be a bother. Is there a way to avoid doing so?

    A. Yes, this is a key issue to be revisited in Chapter 3. In the meantime, you can keep the implementations in separate subdirectories and use the classpath, but that can get confusing. Advanced Java language mechanisms are also helpful, but they also have their own problems.

    Q. That recursive flow() method makes me nervous. How can I better understand what it's doing?

    A. Run it for small examples of your own making, instrumented with instructions to print a function call trace. After a few runs, you will gain confidence that it always fills the sites connected to the start point.

    Q.Is there a simple nonrecursive approach?

    A. There are several methods that perform the same basic computation. We will revisit the problem in Section 4.5. In the meantime, working on developing a nonrecursive implementation of flow() is certain to be an instructive exercise, if you are interested.

    Q. PercPlot.java seems to involve a huge amount of calculation to get a simple function graph. Is there some better way?

    A. Well, the best would be a mathematical proof of the threshold value, but that derivation has eluded scientists.

    Q. What is the percolation threshold for different lattices?

    A. percolation threshold probability p* such that if p < p* then no spanning cluster exists, and if p >= p* then one spanning cluster exists.

    Q. Any relevant papers?

    A. See A fast Monte Carlo algorithm for site or bond percolation by Newman and Ziff.

    Exercises

    1. Write a program that takes N from the command line and creates an N- by-N matrix with the entry in row i and column j set to true if i and j are relatively prime, then shows the matrix on the standard drawing (see Exercise 1.4.13). Then, write a similar program to draw the Hadamard matrix of order N (see Exercise 1.4.25) and the matrix such with the entry in row N and column j set to true if the coefficient of x^k in (1+x)^N (binomial coefficient) is odd (see Exercise 1.4.322). You may be surprised at the pattern formed by the latter.
    2. Implement a print() method for Percolation.java that prints 1 for blocked sites, 0 for open sites, and * for full sites.
    3. Give the recursive calls for Percolation.java given the following input:
      33
      1 0 1
      0 0 0
      1 1 0
      
    4. Write a client of Percolation.java like Visualize.java that does a series of experiments for a value of N taken from the command line where the site vacancy probability p increases from 0 to 1 by a given increment (also taken from the command line).
    5. Create a program PercolationDirected.java that test for directed percolation (by leaving off the last recursive call in the recursive show() method in Percolation.java, as described in the text), then use PercPlot.java to draw a plot of the directed percolation probability as a function of the site vacancy probability.
    6. Write a client of Percolation.java and PercolationDirected.java that takes a site vacancy probability p from the command line and prints an estimate of the probability that a system percolates but does not percolate down. Use enough experiments to get an estimate that is accurate to three decimal places.
    7. Describe the order in which the sites are marked when Percolation is used on a system with no blocked sites. Which is the last site marked? What is the depth of the recursion?
    8. Experiment with using PercPlot.java to plot various mathematical functions (just by replacing the call on Estimate.eval() with an expression that evaluates the function). Try the function sin(x) + cos(10*x) to see how the plot adapts to an oscillating curve, and come up with interesting plots for three or four functions of your own choosing.
    9. Modify Percolation.java to animate the flow computation, showing the sites filling one by one. Check your answer to the previous exercise.
    10. Modify Percolation.java to compute that maximum depth of the recursion used in the flow calculation. Plot the expected value of that quantity as a function of the site vacancy probability p. How does your answer change if the order of the recursive calls is reversed?
    11. Modify Estimate.java to produce output like that produced by Bernoulli.java. Extra credit: Use your program to validate the hypothesis that the data obeys the Gaussian (normal) distribution.

    Creative Exercises

    1. Vertical percolation. Show that a system with site vacancy probability p vertically percolates with probability 1 - (1 - p^N)^N, and use Estimate.java to validate your analysis for various values of N.
    2. Rectangular percolation systems. Modify the code in this section to allow you to study percolation in rectangular systems. Compare the percolation probability plots of systems whose ratio of width to height is 2 to 1 with those whose ratio is 1 to 2.
    3. Adaptive plotting. Modify PercPlot.java to take its control parameters (gap tolerance, error tolerance, and number of trials) from the command line. Experiment with various values of the parameters to learn their effect on the quality of the curve and the cost of computing it. Briefly describe your findings.
    4. Percolation threshold. Write a Percolation.java client that uses binary search to estimate the threshold value (see Exercise 2.1.29).
    5. Nonrecursive directed percolation. Write a nonrecursive program PercolationDirectedNonrecursive.java that tests for directed percolation by moving from top to bottom as in our vertical percolation code. Base your solution on the following computation: if any site in a contigu- ous subrow of open sites in the current row is connected to some full site on the previous row, then all of the sites in the subrow become full.
      directed percolation
    6. Fast percolation test. Modify the recursive flow() method in Percolation.java so that it returns as soon as it finds a site on the bottom row (and fills no more sites). Hint: Use an argument done that is true if the bottom has been hit, false otherwise. Give a rough estimate of the performance improvement factor for this change when running PercPlot.java Use values of N for which the programs run at least a few seconds but not more than a few minutes. Note that the improvement is ineffective unless the first recursive call in flow() is for the site below the current site.
    7. Bond percolation. Write a modular program for studying percolation under the assumption that the edges of the grid provide connectivity. That is, an edge can be either empty or full, and a system percolates if there is a path consisting of full edges that goes from top to bottom. Note: This problem has been solved analytically, so your simulations should validate the hypothesis that the percolation threshold approaches 1/2 as N gets large.
    8. Percolation in three dimensions. Implement a class Percolation3D and a class BooleanMatrix3D (for I/O and random generation) to study percolation in three-dimensional cubes, generalizing the two-dimensional case studied in this chapter. A percolation system is an N-by-N-by-N cube of sites that are unit cubes, each open with probability p and blocked with probability 1-p. Paths can connect an open cube with any open cube that shares a common face (one of six neighbors, except on the boundary). The system percolates if there exists a path connecting any open site on the bottom plane to any open site on the top plane. Use a recursive version of flow() like Program 2.4.5, but with eight recursive calls instead of four. Plot percolation probability versus site vacancy probability for as large a value of N as you can. Be sure to develop your solution incrementally, as emphasized throughout this section.
    9. Bond percolation on a triangular grid. Write a modular program for studying bond percolation on a triangular grid, where the system is composed of 2N^2 equilateral triangles packed together in an N-by-N grid of rhombus shapes. Each interior point has six bonds; each point on the edge has four; and each corner point has two.
    10. Game of life. Implement a class Life that simulates Conway's game of life. Consider a boolean matrix corresponding to a system of cells that we refer to as being either live or dead. The game consists of checking and perhaps updating the value of each cell, depending on the values of its neighbors (the adjacent cells in every direction, including diagonals). Live cells remain live and dead cells remain dead, with the following exceptions:

      • A dead cell with exactly three live neighbors becomes live.

      • A live cell with exactly one live neighbor becomes dead.

      • A live cell with more than three live neighbors becomes dead.

      Initialize with a random matrix, or use one of the starting patterns This game has been heavily studied, and relates to foundations of computer science.

    Web Exercises

    1. 2-by-2 percolation. Verify that the probability that a 2-by-2 system percolates is p^2 (2 - p^2), where p is the probability that a site is open.
    2. Cubic curves. Use recursive subdivision algorithm to plot cubic curves.
    3. Random walk. Term coined by Hungarian mathematician George Polya in a 1921 paper. Start at 0, go left with probability 1/2, go right with probability 1/2. Reflecting barrier at 0 - if particle hits 0, it must switch direction at next step and return to 1. Absorbing barrier at N - particle stops when it hits state N. Estimate number of steps as a function of N until the particle is absorbed.

      Analytical solution: N2.

    4. 3d random walk. Take a random walk on a 3d lattice, starting from (0, 0, 0). Write a program Polya.java to estimate the chance that you return to the origin after at most some number of steps, say 1,000. (In 1d and 2d, you will surely return; in 3d, there is a less than 50% chance.) Repeat the exercise on a 4d lattice. Solution: the actual probability (without the artificial restriction on the number of steps) is known as Polya's random walk constant. It is slightly more than 1/3 for 3D and slightly less than 1/5 for 4D.
    5. Self-avoiding random walk. Simulate random walk on lattice until it intersects. Among all self-avoiding walks (SAW) of length N, what is average distance from origin? To save time in the simulation, exclude SAWs that ever take a step backwards. How long until at least one SAW of length N = 40, N = 80? What is half-life of a SAW? To save more time in the simulation, allow a SAW only to take a step into an unoccupied cell (and repeat until it gets trapped). Practically nothing is known rigorously about these questions, so simulation is the best recourse. Here's an article from the New Scientist.

      Famous 1949 problem of Nobel prize winning chemist, Flory. Conjecture: exponent of root mean square displacement is 3/4 in 2D and 3/5 in 3D.

    6. Self-avoiding random walk. Write a program SelfAvoidingWalk.java to simulate and animate a 2D self-avoiding random walk. Self-avoiding random walks arise in modeling physical processes like the folding of polymer molecules. Such walks are difficult to model using classical mathematics. so they are best studied by direct numerical simulation. See what fraction of such random walks end up more than R^2 (say 30) from the starting point.

      Or keep a trail of length N.