# 3.5   Case Study:   Purple America

This section under major construction.

## Layers of abstraction.

When specifying the set of values for a Java class, we can include data members which are either primitive types or reference types. This enables us to build objects from other objects, and build layers of abstractions. In literature, we have letters, words, sentences, paragraphs, chapters, and books. In music we have notes, chords, measures, phrases, parts (musical event for one performer), movements, symphonies. Each note encapsulates its pitch, dynamic, and duration. In chemistry we have atoms, molecules, compounds. In a poker game, we have cards, the deck, hands, the dealer, and players. Classic example = networking (http, tcp, ip, etc.) Use abstraction to understand and decompose complex systems.

Modules. Decompose problem into individual re-usable data types. Each data type has its own well-specified interface, which makes it suitable for use as a black-box component in more complex system. Composition: combine data types to build more complex data types. Intuition: Although commercial aircraft are sold by one company, no company manufactures all of the parts and components to build it. The airframe, engine, and avionics are all made by different companies. In order for them to interact properly, a system architect must completely spec out the interfaces of each part. Commodity PCs work the same way: video card, Ethernet adapter, mouse, monitor, operating system are typically made by different companies.

Modules in Java. In Section 3.2 we built user-defined types from primitive types, strings, and arrays. In this section we will also build user-defined data types from other user-defined data types. Offers many advantages for building software systems.

• Work flow. Team of programmers can each work on one data type in parallel.
• Testing. Each data type can be tested and debugged independently.
• Code reuse. Reuse the data types in other applications.

## Computational geometry.

We create a number of data types for geometric objects in the plane. Although we will only introduce a tiny set of computational geometry primitives, they arise in numerous applications, including computer graphics (movies, games, virtual reality), computer vision, data mining, VLSI design, mathematical models of the physical world (maps, architecture, medical imaging), astronomical simulations, scientific computing, and and geographic information systems.
• Points. Program Point.java is a data type for points in the plane. Each Point has real-valued x and y coordinates. It also includes a toString method for debugging, a draw method for plotting, and a distanceTo(Point q) that returns the Euclidean distance between the invoking point p and q.
• Circles. Program Circle.java is a data type for circles. We represent a circle by its center (a Point) and its radius (a double). It contains methods for computing the area and perimeter, determining whether a Point is inside the circle, and determining whether two circles intersect.
• Polygons. Polygons are a useful mathematical abstraction for representing the shapes of real-world objects (letters for optical character recognition, obstacles to be avoided by a robot, US congressional district, piece of an airfoil). Program Polygon.java is a data type for closed polygons. A polygon is a sequence of points p0 through pN, where we assume the first point is equal to the last point, i.e., the polygon is closed. The perimeter is the sum of the Euclidean distances of the N line segments. The signed area of a non self-intersecting polygon is given by

The polygon is counterclockwise if the area is positive. We assume the points are indexed 0 through N-1 and that point N equals point 0.

To draw a shaded polygon on screen, we need to know whether each pixel (i, j) is inside or outside the polygon. Jordan curve theorem implies any polygon partitions plane into a finite inside region and an infinite outside region. Solution (ray crossing method): draw a ray from (i, j) down and count the number of times it crosses the polygon (even = outside, odd = inside). Intuition: think of polygon as a fence and count # times you jump over the fence.

Reference: David Eppstein.

Application: determine which diabetes tests are the best predictors of the disease.

• Geometric transformations. Transform polygons by transforming vertices. Use homogeneous coordinates. Euclidean transformations: rotation, translations. Preserve length and angle measure -> lines map to lines, planes to planes and circles to circles.

Affine transformations: scaling (isotropic and anisotropic), shearing, reflection. Shear = rotate + scale = composite transform. Scale = diagonal matrix. Rotate = orthogonal matrix. Rotation = positive counter-clockwise. (consistent with right hand rule.) Parallel lines map to parallel lines, intersecting lines transform to intersecting lines. Rotate about arbitrary point = translate point to origin, rotate, translate back.

Operations: compose(), invert(), apply(). Wikipedia reference. Stanford 3D graphics repository.

## Purple America.

(as created and popularized by Robert J. Vanderbei)   We compose a program to read in countywide election data and visualize the results.

• Data sources. We collect state and county border data from the US Census TIGER database. [ documentation ] The file USA.txt contains each state (in the Continental US) and its corresponding polygon(s). There are 104 polygons in total. The files NJ.txt contain each county in the given state. There are 3325 polygons in total over all states.

We collect election result data from uselectionatlas.org. The URL for a given state and election year is of the form http://uselectionatlas.org/RESULTS/datagraph.php?year=2004&fips=34, where 2004 is replaced by the desired year and 34 (New Jersey) is replaced by the desired FIPS code. The helper file codes.txt contains a list of states, USPS abbreviations, and FIPS codes. The program ElectionScraper.java takes the year as a command line parameter and screen scrapes the election results into 51 data files of the form NJ2004.txt.

For reference, National Atlas contains nice maps of each Congressional district (109th Congress, January 2005 - 2007).

• Continental US. Warmup: plot a map containing each of the 48 states in the continental US using the file USA.txt, which contains each state and its corresponding polygons.

Program Region.java is a data type that represents a US state or county.

% java Region < USA.txt

• US counties. Warmup: plot a map containing all counties in the continental US. See 51stbna for countywide data.
% java Region < USA-county.txt

• Red and blue states. Visualization where Republican states are colored red and Democratic states are colored blue. The data type VoteTally.java records the election results for a given region. Here is a map for the 2004 presidential election.

• Purple states. Red and blue state visualization provides a polarizing view of election results. In reality, the red states are not so red and the blue states are not so blue. Robert J. Vanderbei proposed and popularized a visualization where each region is colored as a mix of red, green, and blue, according to the proportion of votes for each candidate. He named the visualization Purple America. More formally, the RGB color = (p1, p2, p3) where p1 is the fraction of votes for the Republican candidate, p2 is the fraction of votes for Independent candidates, and p3 is the fraction of votes for the Democratic candidate.

Program ElectionMap.java takes the name of a map and election year as a command-line arguments and plots each region in a shade of purple, according to the US Presidential election results for that year.

% java ElectionMap USA 2004

• Purple counties. We repeat the same visualization, but now on a county-by-county basis.
% java ElectionMap USA-county.txt 2004

Program InteractiveElectionMap.java adds user input - when the user clicks a region, the program outputs the name of the region and the vote tallies for that region.

## A card game.

We will create a card game application that shuffles and deals out a deck of 52 cards to four players, as in bridge or hearts. We divide up our problem by creating objects for each of the main components in our card game. The physical game has playing cards (Queen of spades, four of clubs, etc), a deck (of 52 cards), and four players (each of which is dealt 13 cards). We build up our program in small modules by creating data types for each of these entities.

Program Card.java implements a data type for playing cards. Each object represents one of the 52 cards in a deck and supports the following operations: toString, less, and draw. The less method compares two cards based on their suit, then on their rank. It is useful to sort a hand of cards by suit, then by rank, as you would do in the card games bridge or hearts. We use StdDraw.java to display graphical representations of the playing cards. The archive file cards.jar contains a library of GIF images of playing cards. (As per the GPL license, here is the original source of the images). The images named 0.gif, 1.gif, through 51.gif are GIF images of the two of clubs, three of clubs, through the Ace of spades. The file back.gif represents the reverse side of the playing card. To use the library, put the file in your current directory, compile as usual, and execute with

 java -classpath .:cards.jar Card.java 

On Windows, replace the colon with a semicolon.

The data type Deck.java is a data type to represent a deck of 52 playing cards. The constructor initializes a new deck of 52 playing cards. The method shuffle rearrange the cards in random order using the algorithm from Section 2.5. The method dealFrom deletes one card from the deck and returns it. The method isEmpty is useful to check if the deck is empty. As an example, if we execute the following code

 Deck deck = new Deck(); System.out.println(deck); deck.shuffle(); System.out.println(deck); System.out.println(deck.dealFrom()); System.out.println(deck); 

then we might get the following output

 Deck 2C 3C 4C 5C 6C 7C 8C 9C ... JS QS KS AS Deck 4H 7H JH 9D 5D JD KC 3D ... 5H TC 7S 2S 4H Deck 7H JH 9D 5D JD KC 3D ... 5H TC 7S 2S 

The data type Player.java is a data type for representing a player in a card game of bridge or hearts. A player consists of a name, a location (for drawing), and a hand of cards. The method dealTo takes a Card as an argument and inserts that card into the player's hand. The internal method sort() is used to sort the cards by suit and rank, before converting to a string representation for use with toString or plotting using standard draw.

The data type Game.java represents the beginning of a card game application. It creates a deck of cards and four players (North, East, South, and West). It shuffles the 52 cards and deals them out to the four players. The cards for each player are printed to standard output

 North 4C 6C 7C TC 5D 5H 8H 3S 4S 5S 8S TS KS East 3C 8C QC 3D 4D 8D TD JD TH QH AH 2S QS South 5C JC KC AC QD KD 2H 3H 4H 7H JH KH AS West 2C 9C 2D 6D 7D 9D AD 6H 9H 6S 7S 9S JS 

and then plotting using standard draw. Note: picture below does not correspond to hands above, and it is sorted in "lefty" order.

Here's how you might do it Java 1.5 with enum

#### Q + A

Q. If I have a large project, do I need to javac each .java file separately?

A. Not necessarily. If A.java depends on B.java and B.java depends on C.java and all three are out of date (date of .java after date of .class file) , then compiling A.java will automatically recompile B.java and C.java. On the other hand, if A.java and C.java are out of date, but not B.java, then C.java will not get automatically recompiled when you compile A.java.

#### Exercises

1. Write a program that generates N circles at random in the unit box of diameter 0.05, draws them to the screen, and highlights all those pairs that overlap.
2. What does the following code fragment print out?
 public void mystery(Point p, Point q) { p.x = 17; p.y = 17; Point temp = p; p = q; q = p; } public static void main(String [] args) { Point a = new Point(0, 0); Point b = new Point(0, 0); System.out.println("a = " + a); System.out.println("b = " + b); System.out.println(); mystery(a, b); System.out.println("a = " + a); System.out.println("b = " + b); System.out.println(); } 
Answer: it makes a reference the point (17, 17), but does not swap a and b.
3. Create a data type LineSegment that represents a line segment in the plane. Represent a line segment using two Point instance variables.
4. Add a method centroid() to Polygon that returns the centroid as a Point. The centroid (or center of mass) of a non self-intersecting polygon is given by the point (x, y) where x and y are defined by

5. Create a data type Line that represents a line in the plane. Represent a line as a Point and a slope.
6. Write a method distanceTo() that computes the shortest (Euclidean) distance from a point (x, y) to a Line from (x1, y1) to (x2, y2). The formula is

Note that the denominator is the distance between the two points on the line so you should reuse the distanceTo() method in Point. The numerator is signed, so this returns the signed distance - you may wish to take absolute values.

7. Write a method intersects() so that s1.intersects(s2) returns true if the two line segments intersect and false otherwise. To determine if two line segments a-b and c-d intersect, compute

If 0 ≤ r ≤ 1 and 0 ≤ s &le 1, then the line segments intersect. Moreover, the point of intersection is (x, y) = (ax + r(bx-ax), ay + r(by-ay)). If the denominator in equation 1 is zero, a-b and c-d are parallel. If the numerator in equation 1 is also zero, then a-b and c-d are collinear.

8. Compute the area of Texas using the method in Polygon. Try to convert to square miles - the coordinates of the Texas polygon in USA.txt are given as latitude / longitude pairs.
9. Add a method isSelfIntersecting() to Polygon that returns true if the polygon intersects itself, and false otherwise. Use the line segment intersection routine from the previous exercise and test all N choose 2 line segments.
10. Create a static method ccw() that takes three Point inputs a, b, and c and returns +1 if a-b-c is a counterclockwise angle, -1 if a-b-c is a clockwise angle, and 0 if a-b-c are collinear. Use the following determinant formula which gives twice the (signed) area of the triangle a-b-c. If the area is positive, then a-b-c is counterclockwise; if the area is negative, then a-b-c is counterclockwise; if the area is zero then a-b-c are collinear.

11. Interval. Create a data type Interval.java that represents a closed interval [a, b] on the real line. Include methods for returning the length of the interval, checking whether two intervals intersect, returning a new Interval that is the intersection of two intervals (largest interval contained in both intervals), and returning a new Interval that is the union of two intervals (smallest interval that contains both intervals).
12. Bounding box (2D interval). Create a data type BoundingBox.java that represents a bounding box. A bounding box is an axis-aligned rectangle. In addition to a constructor, it should have a method union() so that a.union(b) initializes returns a new bounding box that is the smallest bounding box that contains both a and b. Add methods to determine whether a Point is inside a bounding box or whether two bounding boxes intersect, area, perimeter. Include a constructor should take an array of points, and return the smallest bounding box containing each point.

Hint: implement it using two Interval instance variables, one for the x-axis and one for the y-axis.

13. Create a data type Triangle that represents a triangle. It should have a constructor which take references to three Point objects as inputs and create a triangle with those endpoints. It should have methods area(), perimeter(), isIsosceles(), isRight(), isEquilateral(), and isScalene().
14. Write a client program Pi.java that estimates the mathematical constant π using Monte Carlo simulation. Create a circle centered at (1/2, 1/2) of radius 1/2. Then repeatedly generate random points in the unit square and check if they are inside the circle. Print out the fraction of points that fall inside the circle. The true answer estimates π / 4 since the ratio of the area of the circle to the area of the square is π / 4.
15. Modify Card.java to add a more user-friendly constructor so the following code works
 Card c = new Card(Card.KING, Card.HEARTS); 

Add classwide variables to Card to achieve this effect.

16. Write a program to print the 52 cards in sorted order and then again in shuffled order, as in the picture below. Add a helper method draw to Deck.java.

#### Creative Exercises

1. Bullseye. Create several concentric circles and associate a value with each circle. Generate a point at random (using some distribution). The number of points you score is the value of the smallest enclosing circle.
2. Random sphere packing. Generate random disks of diameter d in the unit square and repeatedly add each new one as long as it doesn't intersect an existing disk. Stop trying when 100 attempts in-a-row fail. How many disks on average are placed? Use the Circle.java data type.
3. 2-point correlation function. Given N points and a parameter r, the 2-point correlation function is the number of pairs of points that lie within distance r of each other. The 2-point correlation function roughly measures the clumpiness of a set of points. It is a fundamentally important statistic in astrophysics. Write a program that takes a command line parameter N, reads in N points, and calculates the number of points within a radius of r of each other for various values of r.
4. Traveling salesperson problem. Write a program that takes a command line input N, generates N random Point objects in the unit square, a finds the shortest tour that visits each point exactly once. Base your algorithm on Program Permutations.java which generates all N! permutations on N elements. What is the biggest value of N for which your program finds an answer is less than 5 minutes?
5. Rational roots. If a polynomial with integer coefficients has a rational root p/q (reduced) and a0 != 0, then p is a divisor of a0 and q is a divisor of a_n. Can enumerate all possibilities by factoring.
6. Bridge. One of the best ways to determine the strength of a bridge hand is to count up the number of high cards. The Milton Work Point Count assigns a numerical score to each hand by adding 4 points for each Ace, 3 for each King, 2 for each Queen, and 1 for each Jack. Deal out a million hands a compute a histogram of the number of points.

Answer: Here is the exact distribution of each of the high card point counts. Use the helper data type Histogram.java, which relies on Draw.java.

7. Bridge. It is advantageous in bridge to have suits with a large or small number of cards. Modify the program in the previous exercise to assign a bonus of 2 for each suit in which the hand has no cards of the given suit (void) and a bonus of 1 for each suit in which the hand has only a single card (singleton). Name your program BridgeExperiment.java. Use Histogram.java to create an animated histogram of the results.
8. Crazy 8s. In the card game Crazy 8s, each player typically sorts their cards according to rank (instead of suit as in Bridge). Modify the less method in Card so that the sort method in Player sorts in the appropriate order.
 public boolean less(Card other) { return rank < other.rank; } 
9. I Doubt It. I Doubt It is a multiplayer card game in which it is convenient for each player to sort their cards in an interesting order...
10. Concentration. Implement the card game concentration.
11. Texas Hold 'Em. Each player is dealt two cards. The two players share five community cards and try to make the best hand from the 7 cards. Given the hole cards of two players, determine the probability that each will win after the community cards are revealed.
12. Texas Hold' Em intransitivity. Find two card hands so that A beats B, B beats C, and C beats A, where "beats" means that you have a higher probability of winning when the community cards are revealed.

Solution: AS TD, QH JH, 2C 2S. AS TD beats QH JH roughly 57%, QH JH beats 2C 2S roughly 53%, 2C 2S beats AS TD roughly 50%.

13. k-means clustering. Cluster objects (in a vector space) into k partitions.
• Partition objects at random into k sets.
• Define the k centers as the centroids of each set.
• Assign each object to its nearest center.
• Repeat the last two steps until the process converges (the assignment doesn't change).
Fast convergence in practice. Convergence is guaranteed in theory, but can be exponentially slow. No guarantee of global optimum. Can repeat several times.

Here's a visual application for clustering colors in an image. Cluster pixels in an image according to k colors, e.g., to reduce the number of colors in a picture from thousands to 32.

14. Zip code lookup. Given a zip code, identify it on a map. The file zips2000.csv from the Tiger database contains a list of US zipcodes followed by the longitude and latitude of their centroid.
15. Albers equal-area conic projection. The Albers equal-area conic projection maps latitude and longitude (φ, λ) to Catesian coordinates (x, y). It gives it the curvy surface we are accustomed to seeing. Used by the U.S. Geological Survey.

This code is slightly adapted from Visualizing Data by Ben Fry.

 double phi0 = 0.0; double lambda0 = Math.toRadians(-96.0); // central meridan = 96 W double phi1 = Math.toRadians( 29.5); // standard parallel 1 = 29.5 N double phi2 = Math.toRadians( 45.5); // standard parallel 2 = 45.5 N double phi = Math.toRadians(latitude); double lambda = Math.toRadians(longitude); double n = 0.5 * (Math.sin(phi1) + Math.sin(phi2)); double theta = n * (lambda - lambda0); double C = (Math.cos(phi1) * Math.cos(phi1)) + (2 * n * Math.sin(phi1)); double rho = Math.sqrt(C - 2 * n * Math.sin(phi)) / n; double rho0 = Math.sqrt(C - 2 * n * Math.sin(phi0)) / n; double x = rho * Math.sin(theta); double y = rho0 - rho*Math.cos(theta); 
16. Azimuthal projection. Instead of plotting (latitude, longitude), use some kind of map projection, either conical or azimuthal (cylindrical is bad for large countries like US that have long east-west extent).

Recommended: Lambert conformal conical with center meridian at 95°W. Two parallels 33°N and 45°N.

Let LAT1 and LAT2 be two latitudes and let CLON be longitude. Compute cone constant (between 0 and 1). For US (with LAT1 = 33 and LAT2 = 45), the cone constant is .6304776973.

  CONE = COS(90 - LAT1) LOG (COS(LAT1)) - LOG (COS(LAT2)) CONE = ------------------------------------------------- // two latitudes LOG (TAN (45-LAT1/2)) - LOG (TAN (45-LAT2/2)) 
To project (RLAT, RLON), compute the formulas
  R = (TAN(45 - RLAT/2))**CONE U = R * SIN(CONE*(RLON-CLON)) V = -R * COS(CONE*(RLON-CLON)) 
reference.
17. Random point in a polygon. Given a simple polygon, design a method to choose a random point in its interior. Add a method to the polygon ADT that returns a bounding box. Then use the rejection method to find a point.
18. Dot plots. Instead of plotting the color of a polygon using a shade of purple, plot red, blue, and green dots in the region at random. Make each dot represent x hundred votes.
19. Poorest counties. Plot the 100 poorest counties in the US in blue on the county map and the 100 richest counties in the US in red. (Ignore counties in Alaska or Hawaii.)
20. Demographics. Use the county maps to create data visualizations for other demographics.