3.5 Case Study: Purple America
This section under major construction.
Layers of abstraction.
When specifying the set of values for a Java class, we can include data members which are either primitive types or reference types. This enables us to build objects from other objects, and build layers of abstractions. In literature, we have letters, words, sentences, paragraphs, chapters, and books. In music we have notes, chords, measures, phrases, parts (musical event for one performer), movements, symphonies. Each note encapsulates its pitch, dynamic, and duration. In chemistry we have atoms, molecules, compounds. In a poker game, we have cards, the deck, hands, the dealer, and players. Classic example = networking (http, tcp, ip, etc.) Use abstraction to understand and decompose complex systems.Modules. Decompose problem into individual reusable data types. Each data type has its own wellspecified interface, which makes it suitable for use as a blackbox component in more complex system. Composition: combine data types to build more complex data types. Intuition: Although commercial aircraft are sold by one company, no company manufactures all of the parts and components to build it. The airframe, engine, and avionics are all made by different companies. In order for them to interact properly, a system architect must completely spec out the interfaces of each part. Commodity PCs work the same way: video card, Ethernet adapter, mouse, monitor, operating system are typically made by different companies.
Modules in Java. In Section 3.2 we built userdefined types from primitive types, strings, and arrays. In this section we will also build userdefined data types from other userdefined data types. Offers many advantages for building software systems.
 Work flow. Team of programmers can each work on one data type in parallel.
 Testing. Each data type can be tested and debugged independently.
 Code reuse. Reuse the data types in other applications.
Computational geometry.
We create a number of data types for geometric objects in the plane. Although we will only introduce a tiny set of computational geometry primitives, they arise in numerous applications, including computer graphics (movies, games, virtual reality), computer vision, data mining, VLSI design, mathematical models of the physical world (maps, architecture, medical imaging), astronomical simulations, scientific computing, and and geographic information systems. Points. Program Point.java is a data type for points in the plane. Each Point has realvalued x and y coordinates. It also includes a toString method for debugging, a draw method for plotting, and a distanceTo(Point q) that returns the Euclidean distance between the invoking point p and q.
 Circles. Program Circle.java is a data type for circles. We represent a circle by its center (a Point) and its radius (a double). It contains methods for computing the area and perimeter, determining whether a Point is inside the circle, and determining whether two circles intersect.
 Polygons.
Polygons are a useful mathematical abstraction for representing the shapes of
realworld objects (letters for optical character recognition, obstacles to be
avoided by a robot, US congressional district, piece of an airfoil).
Program Polygon.java is a data type
for closed polygons.
A polygon is a sequence of points p_{0} through
p_{N}, where we assume the first point is equal to the
last point, i.e., the polygon is closed.
The perimeter is the sum of the Euclidean distances
of the N line segments.
The signed area of a non selfintersecting polygon is given by
To draw a shaded polygon on screen, we need to know whether each pixel (i, j) is inside or outside the polygon. Jordan curve theorem implies any polygon partitions plane into a finite inside region and an infinite outside region. Solution (ray crossing method): draw a ray from (i, j) down and count the number of times it crosses the polygon (even = outside, odd = inside). Intuition: think of polygon as a fence and count # times you jump over the fence.
Application: determine which diabetes tests are the best predictors of the disease.
 Geometric transformations.
Transform polygons by transforming vertices.
Use homogeneous coordinates.
Euclidean transformations: rotation, translations.
Preserve length and angle measure > lines map to lines,
planes to planes and circles to circles.
Affine transformations: scaling (isotropic and anisotropic), shearing, reflection. Shear = rotate + scale = composite transform. Scale = diagonal matrix. Rotate = orthogonal matrix. Rotation = positive counterclockwise. (consistent with right hand rule.) Parallel lines map to parallel lines, intersecting lines transform to intersecting lines. Rotate about arbitrary point = translate point to origin, rotate, translate back.
Operations: compose(), invert(), apply(). Wikipedia reference. Stanford 3D graphics repository.
Purple America.
(as created and popularized by Robert J. Vanderbei) We compose a program to read in countywide election data and visualize the results. Data sources.
We collect state and county border data from the
US Census TIGER database.
[ documentation ]
The file USA.txt contains each state
(in the Continental US) and its corresponding polygon(s). There are 104 polygons in total.
The files NJ.txt contain each county in the given state. There are 3325 polygons in
total over all states.
We collect election result data from uselectionatlas.org. The URL for a given state and election year is of the form http://uselectionatlas.org/RESULTS/datagraph.php?year=2004&fips=34, where 2004 is replaced by the desired year and 34 (New Jersey) is replaced by the desired FIPS code. The helper file codes.txt contains a list of states, USPS abbreviations, and FIPS codes. The program ElectionScraper.java takes the year as a command line parameter and screen scrapes the election results into 51 data files of the form NJ2004.txt.
For reference, National Atlas contains nice maps of each Congressional district (109th Congress, January 2005  2007).
 Continental US.
Warmup: plot a map containing each of the 48 states in the continental US
using the file USA.txt,
which contains each state and its corresponding polygons.
Program Region.java is a data type that represents a US state or county.
% java Region < USA.txt
 US counties.
Warmup: plot a map containing all counties in the continental US.
See 51stbna for countywide data.
% java Region < USAcounty.txt
 Red and blue states.
Visualization where Republican states are colored red and
Democratic states are colored blue.
The data type VoteTally.java
records the election results for a given region.
Here is a map for the 2004 presidential election.
 Purple states.
Red and blue state visualization provides a polarizing view of
election results. In reality, the red states are not so red and
the blue states are not so blue.
Robert J. Vanderbei proposed and popularized a visualization where
each region is colored as a mix of red, green, and blue, according
to the proportion of votes for each candidate.
He named the visualization
Purple America.
More formally, the RGB color = (p1, p2, p3) where p1 is
the fraction of votes for the Republican candidate, p2 is
the fraction of votes for Independent candidates,
and p3 is the fraction of votes for the Democratic candidate.
Program ElectionMap.java takes the name of a map and election year as a commandline arguments and plots each region in a shade of purple, according to the US Presidential election results for that year.
% java ElectionMap USA 2004
 Purple counties.
We repeat the same visualization, but now on a countybycounty basis.
% java ElectionMap USAcounty.txt 2004
Program InteractiveElectionMap.java adds user input  when the user clicks a region, the program outputs the name of the region and the vote tallies for that region.
A card game.
We will create a card game application that shuffles and deals out a deck of 52 cards to four players, as in bridge or hearts. We divide up our problem by creating objects for each of the main components in our card game. The physical game has playing cards (Queen of spades, four of clubs, etc), a deck (of 52 cards), and four players (each of which is dealt 13 cards). We build up our program in small modules by creating data types for each of these entities.Program Card.java implements a data type for playing cards. Each object represents one of the 52 cards in a deck and supports the following operations: toString, less, and draw. The less method compares two cards based on their suit, then on their rank. It is useful to sort a hand of cards by suit, then by rank, as you would do in the card games bridge or hearts. We use StdDraw.java to display graphical representations of the playing cards. The archive file cards.jar contains a library of GIF images of playing cards. (As per the GPL license, here is the original source of the images). The images named 0.gif, 1.gif, through 51.gif are GIF images of the two of clubs, three of clubs, through the Ace of spades. The file back.gif represents the reverse side of the playing card. To use the library, put the file in your current directory, compile as usual, and execute with
java classpath .:cards.jar Card.java
On Windows, replace the colon with a semicolon.
The data type Deck.java is a data type
to represent a deck of 52 playing cards.
The constructor initializes a new deck of 52 playing cards.
The method shuffle rearrange the cards in random order using
the algorithm from Section 2.5.
The method dealFrom deletes one card from the deck and returns it.
The method isEmpty is useful to check if the deck is empty.
As an example, if we execute the following code
Deck deck = new Deck(); System.out.println(deck); deck.shuffle(); System.out.println(deck); System.out.println(deck.dealFrom()); System.out.println(deck);
then we might get the following output
Deck 2C 3C 4C 5C 6C 7C 8C 9C ... JS QS KS AS Deck 4H 7H JH 9D 5D JD KC 3D ... 5H TC 7S 2S 4H Deck 7H JH 9D 5D JD KC 3D ... 5H TC 7S 2S
The data type Player.java is a data type for representing a player in a card game of bridge or hearts. A player consists of a name, a location (for drawing), and a hand of cards. The method dealTo takes a Card as an argument and inserts that card into the player's hand. The internal method sort() is used to sort the cards by suit and rank, before converting to a string representation for use with toString or plotting using standard draw.
The data type Game.java represents the beginning of a card game application. It creates a deck of cards and four players (North, East, South, and West). It shuffles the 52 cards and deals them out to the four players. The cards for each player are printed to standard output
North 4C 6C 7C TC 5D 5H 8H 3S 4S 5S 8S TS KS East 3C 8C QC 3D 4D 8D TD JD TH QH AH 2S QS South 5C JC KC AC QD KD 2H 3H 4H 7H JH KH AS West 2C 9C 2D 6D 7D 9D AD 6H 9H 6S 7S 9S JS
and then plotting using standard draw. Note: picture below does not correspond to hands above, and it is sorted in "lefty" order.
Here's how you might do it Java 1.5 with enum
Q + A
Q. If I have a large project, do I need to javac each .java file separately?
A. Not necessarily. If A.java depends on B.java and B.java depends on C.java and all three are out of date (date of .java after date of .class file) , then compiling A.java will automatically recompile B.java and C.java. On the other hand, if A.java and C.java are out of date, but not B.java, then C.java will not get automatically recompiled when you compile A.java.
Exercises
 Write a program that generates N circles at random in the unit box of diameter 0.05, draws them to the screen, and highlights all those pairs that overlap.
 What does the following code fragment print out?
public void mystery(Point p, Point q) { p.x = 17; p.y = 17; Point temp = p; p = q; q = p; } public static void main(String [] args) { Point a = new Point(0, 0); Point b = new Point(0, 0); System.out.println("a = " + a); System.out.println("b = " + b); System.out.println(); mystery(a, b); System.out.println("a = " + a); System.out.println("b = " + b); System.out.println(); }
 Create a data type LineSegment that represents a line segment in the plane. Represent a line segment using two Point instance variables.

Add a method centroid() to Polygon that returns
the centroid as a Point. The centroid
(or center of mass) of a non selfintersecting polygon is given by
the point (x, y) where x and y are defined by
 Create a data type Line that represents a line in the plane. Represent a line as a Point and a slope.

Write a method distanceTo() that computes the shortest
(Euclidean) distance from a point (x, y) to a Line
from (x1, y1) to (x2, y2). The formula is
Note that the denominator is the distance between the two points on the line so you should reuse the distanceTo() method in Point. The numerator is signed, so this returns the signed distance  you may wish to take absolute values.

Write a method intersects() so that s1.intersects(s2)
returns true if the two line segments intersect and false
otherwise. To determine if two line segments ab and cd intersect, compute
If 0 ≤ r ≤ 1 and 0 ≤ s &le 1, then the line segments intersect. Moreover, the point of intersection is (x, y) = (a_{x} + r(b_{x}a_{x}), a_{y} + r(b_{y}a_{y})). If the denominator in equation 1 is zero, ab and cd are parallel. If the numerator in equation 1 is also zero, then ab and cd are collinear.
 Compute the area of Texas using the method in Polygon. Try to convert to square miles  the coordinates of the Texas polygon in USA.txt are given as latitude / longitude pairs.
 Add a method isSelfIntersecting() to Polygon that returns true if the polygon intersects itself, and false otherwise. Use the line segment intersection routine from the previous exercise and test all N choose 2 line segments.

Create a static method ccw() that takes three Point inputs
a, b, and c and returns +1 if abc is a counterclockwise angle,
1 if abc is a clockwise angle, and 0 if abc are collinear.
Use the following determinant formula which gives twice the (signed)
area of the triangle abc. If the area is positive, then abc is
counterclockwise; if the area is negative, then abc is counterclockwise;
if the area is zero then abc are collinear.
 Interval. Create a data type Interval.java that represents a closed interval [a, b] on the real line. Include methods for returning the length of the interval, checking whether two intervals intersect, returning a new Interval that is the intersection of two intervals (largest interval contained in both intervals), and returning a new Interval that is the union of two intervals (smallest interval that contains both intervals).
 Bounding box (2D interval).
Create a data type BoundingBox.java
that represents a bounding box. A bounding box is an axisaligned
rectangle. In addition to a constructor, it should have a method union()
so that a.union(b) initializes returns a new bounding box that is the
smallest bounding box that contains both a and b.
Add methods to determine whether a Point is inside a bounding box
or whether two bounding boxes intersect, area, perimeter.
Include a constructor should take an array of points, and return the smallest
bounding box containing each point.
Hint: implement it using two Interval instance variables, one for the xaxis and one for the yaxis.
 Create a data type Triangle that represents a triangle. It should have a constructor which take references to three Point objects as inputs and create a triangle with those endpoints. It should have methods area(), perimeter(), isIsosceles(), isRight(), isEquilateral(), and isScalene().
 Write a client program Pi.java that estimates the mathematical constant π using Monte Carlo simulation. Create a circle centered at (1/2, 1/2) of radius 1/2. Then repeatedly generate random points in the unit square and check if they are inside the circle. Print out the fraction of points that fall inside the circle. The true answer estimates π / 4 since the ratio of the area of the circle to the area of the square is π / 4.
 Modify Card.java to add a more userfriendly
constructor so the following code works
Card c = new Card(Card.KING, Card.HEARTS);
Add classwide variables to Card to achieve this effect.

Write a program to print the 52 cards in sorted order and then
again in shuffled order, as in the picture below.
Add a helper method draw to Deck.java.
Creative Exercises
 Bullseye. Create several concentric circles and associate a value with each circle. Generate a point at random (using some distribution). The number of points you score is the value of the smallest enclosing circle.
 Random sphere packing. Generate random disks of diameter d in the unit square and repeatedly add each new one as long as it doesn't intersect an existing disk. Stop trying when 100 attempts inarow fail. How many disks on average are placed? Use the Circle.java data type.
 2point correlation function. Given N points and a parameter r, the 2point correlation function is the number of pairs of points that lie within distance r of each other. The 2point correlation function roughly measures the clumpiness of a set of points. It is a fundamentally important statistic in astrophysics. Write a program that takes a command line parameter N, reads in N points, and calculates the number of points within a radius of r of each other for various values of r.
 Traveling salesperson problem. Write a program that takes a command line input N, generates N random Point objects in the unit square, a finds the shortest tour that visits each point exactly once. Base your algorithm on Program Permutations.java which generates all N! permutations on N elements. What is the biggest value of N for which your program finds an answer is less than 5 minutes?
 Rational roots. If a polynomial with integer coefficients has a rational root p/q (reduced) and a0 != 0, then p is a divisor of a0 and q is a divisor of a_n. Can enumerate all possibilities by factoring.
 Bridge.
One of the best ways to determine the strength of a bridge hand is
to count up the number of high cards.
The Milton Work Point Count assigns a numerical score
to each hand by adding 4 points for each Ace, 3 for each King,
2 for each Queen, and 1 for each Jack.
Deal out a million hands a compute a histogram of the number of points.
Answer: Here is the exact distribution of each of the high card point counts. Use the helper data type Histogram.java, which relies on Draw.java.
 Bridge. It is advantageous in bridge to have suits with a large or small number of cards. Modify the program in the previous exercise to assign a bonus of 2 for each suit in which the hand has no cards of the given suit (void) and a bonus of 1 for each suit in which the hand has only a single card (singleton). Name your program BridgeExperiment.java. Use Histogram.java to create an animated histogram of the results.

Crazy 8s.
In the card game Crazy 8s, each player typically sorts their cards
according to rank (instead of suit as in Bridge). Modify
the less method in Card so that the sort
method in Player sorts in the appropriate order.
public boolean less(Card other) { return rank < other.rank; }
 I Doubt It. I Doubt It is a multiplayer card game in which it is convenient for each player to sort their cards in an interesting order...
 Concentration. Implement the card game concentration.
 Texas Hold 'Em. Each player is dealt two cards. The two players share five community cards and try to make the best hand from the 7 cards. Given the hole cards of two players, determine the probability that each will win after the community cards are revealed.
 Texas Hold' Em intransitivity.
Find two card hands so that A beats B, B beats C,
and C beats A, where "beats" means that you have a higher
probability of winning when the community cards are
revealed.
Solution: AS TD, QH JH, 2C 2S. AS TD beats QH JH roughly 57%, QH JH beats 2C 2S roughly 53%, 2C 2S beats AS TD roughly 50%.
 kmeans clustering.
Cluster objects (in a vector space) into k partitions.
 Partition objects at random into k sets.
 Define the k centers as the centroids of each set.
 Assign each object to its nearest center.
 Repeat the last two steps until the process converges (the assignment doesn't change).
Here's a visual application for clustering colors in an image. Cluster pixels in an image according to k colors, e.g., to reduce the number of colors in a picture from thousands to 32.
 Zip code lookup. Given a zip code, identify it on a map. The file zips2000.csv from the Tiger database contains a list of US zipcodes followed by the longitude and latitude of their centroid.
 Albers equalarea conic projection.
The Albers
equalarea conic projection maps latitude and longitude (φ, λ) to
Catesian coordinates (x, y). It gives it the curvy surface we are accustomed to seeing.
Used by the U.S.
Geological Survey.
This code is slightly adapted from Visualizing Data by Ben Fry.
double phi0 = 0.0; double lambda0 = Math.toRadians(96.0); // central meridan = 96 W double phi1 = Math.toRadians( 29.5); // standard parallel 1 = 29.5 N double phi2 = Math.toRadians( 45.5); // standard parallel 2 = 45.5 N double phi = Math.toRadians(latitude); double lambda = Math.toRadians(longitude); double n = 0.5 * (Math.sin(phi1) + Math.sin(phi2)); double theta = n * (lambda  lambda0); double C = (Math.cos(phi1) * Math.cos(phi1)) + (2 * n * Math.sin(phi1)); double rho = Math.sqrt(C  2 * n * Math.sin(phi)) / n; double rho0 = Math.sqrt(C  2 * n * Math.sin(phi0)) / n; double x = rho * Math.sin(theta); double y = rho0  rho*Math.cos(theta);
 Azimuthal projection.
Instead of plotting (latitude, longitude), use some kind of
map projection,
either conical or azimuthal (cylindrical is bad for large countries like US that have
long eastwest extent).
Recommended: Lambert conformal conical with center meridian at 95°W. Two parallels 33°N and 45°N.
Let LAT1 and LAT2 be two latitudes and let CLON be longitude. Compute cone constant (between 0 and 1). For US (with LAT1 = 33 and LAT2 = 45), the cone constant is .6304776973.
CONE = COS(90  LAT1) LOG (COS(LAT1))  LOG (COS(LAT2)) CONE =  // two latitudes LOG (TAN (45LAT1/2))  LOG (TAN (45LAT2/2))
R = (TAN(45  RLAT/2))**CONE U = R * SIN(CONE*(RLONCLON)) V = R * COS(CONE*(RLONCLON))
 Random point in a polygon. Given a simple polygon, design a method to choose a random point in its interior. Add a method to the polygon ADT that returns a bounding box. Then use the rejection method to find a point.
 Dot plots. Instead of plotting the color of a polygon using a shade of purple, plot red, blue, and green dots in the region at random. Make each dot represent x hundred votes.
 Poorest counties. Plot the 100 poorest counties in the US in blue on the county map and the 100 richest counties in the US in red. (Ignore counties in Alaska or Hawaii.)
 Demographics. Use the county maps to create data visualizations for other demographics.