3.3   Designing Data Types


In this section we discuss encapsulation, immutability, and inheritance, with particular attention to the use of these mechanisms in data-type design to enable modular programming, facilitate debugging, and write clear and correct code.

Encapsulation.

The process of separating clients from implementations by hiding information is known as encapsulation. We use encapsulation to enable modular programming, facilitate debugging, and clarify program code.

Immutability.

Immutable and mutable data types An object from a data type is immutable if its data-type value cannot change once created. An immutable data type is one in which all objects of that type are immutable.

Spatial vectors.

Vector A spatial vector is an abstract entity that has a magnitude and a direction. A sequence of n real numbers suffices to specify a vector in n-dimensional space. We use a boldface letter like \( \boldsymbol{x} \) to denote the vector \( ( x_0, x_1, \; \ldots, \; x_{n-1}) \).

Interface inheritance (subtyping).

Java provides the interface construct for declaring a relationship between otherwise unrelated classes, by specifying a common set of methods that each implementing class must include. Interfaces enable us to write client programs that can manipulate objects of varying types, by invoking common methods from the interface.

Implementation inheritance (subclassing).

Java also supports another inheritance mechanism known as subclassing. The idea is to define a new class (subclass, or derived class) that inherits instance variables (state) and instance methods (behavior) from another class (superclass, or base class), enabling code reuse. Typically, the subclass redefines or overrides some of the methods in the superclass. For example, Java provides an elaborate inheritance hierarchy for GUI components:

Java inheritance hierarchy for GUI elements
In this book, we avoid subclassing because it works against encapsulation and immutability (e.g., the fragile base class problem and the circle–ellipse problem).

Application: data mining.

We consider a data mining application in which the goal is to associate with each document a vector known as a sketch so that so that documents that are different have sketches that are different and documents that are similar have sketches that are similar. Our API abstracts away this notion into the method similarTo(), which is a real number between 0 (not similar) and 1 (similar). The parameters k and d control the quality of the sketch.

Sketch API

Design by contract.

We briefly discuss two Java language mechanisms that enable you to verify assumptions about your program while it is running—exceptions and assertions.

Exercises

  1. Give an implementation of minus() for Vector.java solely in terms of the other Vector methods, such as direction() and magnitude().

    Solution:

    public Vector minus(Vector that) {
        return this.plus(that.scale(-1.0));
    }
    
  2. Add a toString() method to Vector.java that returns the vector components, separated by commas, and enclosed in matching parentheses.

Creative Exercises

  1. Statistics. Develop a data type for maintaining statistics for a set of real numbers. Provide a method to add data points and methods that return the number of points, the mean, the standard deviation, and the variance.

    $$ \begin{eqnarray*} \bar x &=& \frac{1}{n} \sum_i x_i \\ s^2 &=& \frac{\sum_i (x_i - \mu)^2}{n-1} \;\; = \;\; \frac{n \sum_i x_i^2 - (\sum_i x_i)^2}{n(n-1)} \end{eqnarray*} $$

    Develop two implementations: OnePass.java whose instance values are the number of points and the sum of the values, and the sum of the squares of the values, TwoPass.java that keeps an array containing all the points. For simplicity, you may take the maximum number of points in the constructor. Your first implementation is likely to be faster and use substantially less space, but is also likely to be susceptible to roundoff error.

    Solution: StableOnePass.java is a well-engineered alternative that is is numerically stable and does not require an array to store the elements.

    $$ \begin{eqnarray*} m_0 &=& 0 \\ s_0 &=& 0 \\ m_n &=& m_{n-1} + \frac{1}{n} \; (x_n - m_{n-1}) \\ s_n &=& s_{n-1} + \frac{n-1}{n} \; (x_n - m_{n-1})^2 \\ \bar x &=& m_n \\ s^2 &=& \frac{1}{n-1} s_n \end{eqnarray*} $$
  2. Genome. Develop a data type to store the genome of an organism. Biologists often abstract away the genome to a sequence of nucleotides (A, C, G, or T). The data type should support the method addNucleotide(), nucleotideAt(), as well as isPotentialGene(). Develop three implementations.
    • Use one instance variable of type String, implementing addCodon() with string concatenation. Each method call takes time proportional to the length of the current genome.
    • Use an array of characters, doubling the length of the array each time it fills up.
    • Use a boolean array, using two bits to encode each codon, and doubling the length of the array each time it fills up.

    Solution: StringGenome.java, Genome.java, and CompactGenome.java.

  3. Encapsulation. Is the following class immutable?
    import java.util.Date
    
    public class Appointment {
        private Date date;
        private String contact;
    
        public Appointment(Date date) {
            this.date = date;
            this.contact = contact;
        }
    
        public Date getDate() {
            return date;
        }
    
    Solution: No, because Java's java.util.Date is mutable. To correct, make a defensive copy of the date in the constructor and make a defensive copy of the date before returning to the client.
  4. Date. Design an implementation of Java's java.util.Date API that is immutable and therefore corrects the defects of the previous exercise.

    Partial solution: Date.java.

Web Exercises

  1. Add methods to Genome.java to test for equality and return the reverse-complemented genome.
  2. Add methods to Date.java to check which season (Spring, Summer, Fall, Winter) or astrological sign (Pisces, Libra, ...) a given date lies. Be careful about events that span December to January.
  3. Add a method daysUntil() to Date.java that takes a Date as an argument and returns the number of days between the two dates.
  4. Create an implementation Date2.java that represents a date a single integer that counts the number of days since January 1, 1970. Compare to Date.java.
  5. Create a Rectangle ADT that represents a rectangle. Represent a rectangle by two points. Include a constructor, a toString method, a method for computing the area, and a method for drawing using our graphics library.
  6. Repeat the previous exercise, but this time represent a Rectangle as the lower left endpoint and the width and height.
  7. Repeat the previous exercise, but this time represent a Rectangle as the center and the width and height.
  8. Sparse vector. Create a data type for sparse vectors. Represent a sparse vector by an array of indices (of nonzeros) and a parallel array of the corresponding nonzero values. Assume the indices are in ascending order. Implement the dot product operation.
  9. Copy constructor. Only needed if data type is mutable. Otherwise, assignment statement works as desired.
    public Counter(Counter x) {
        count = x.count;
    }
    
    Counter counter1 = new Counter();
    counter1.hit();
    counter1.hit();
    counter1.hit();
    Counter counter2 = new Counter(counter1);
    counter2.hit();
    counter2.hit();
    StdOut.println(counter1.get() + " " + counter2.get());    // 3 5
    
  10. Define an interface DifferentiableFunction.java for twice-differentiable function. Write a class Sqrt.java that implements the function f(x) = c - x^2.
  11. Write a program Newton.java that implements Newton's method to find a real root of a sufficiently smooth function, given that you start sufficiently close to a root. When method converges, it does so quadratically. Assume that it takes a DifferentiableFunction as argument.
  12. Generating random numbers. Different methods to generate a random number from the standard Gaussian distribution. Here, encapsulation enables us to replace one version with another that is more accurate or efficient. Trigonometric method is simple, but may be slow due to calling several transcendental functions. More importantly, it suffers from numerical stability problems when x1 is close to 0. Better method is alternate form of Box-Muller method. reference. Both methods require two values from a uniform distribution and produce two values from the Gaussian distribution with mean 0 and standard deviation 1. Can save work by remembering the second value for the next call. (This is how it is implemented in java.util.Random.) Their implementation is the polar method of Box-Muller, saving the second random number for a subsequent call. (See Knuth, ACP, Section 3.4.1 Algorithm C.)
  13. LAX airport shutdown. On September 14, 2004 Los Angeles airport was shut down due to software breakdown of a radio system used by air traffic controllers to communicate with pilots. The program used a Windows API function call GetTickCount() which returns the number of milliseconds since the system was last rebooted. The value is returned as a 32 bit integer, so after approximately 49.7 days it "wraps around." The software developers were aware of the bug, and instituted a policy that a technician would reboot the machine every month so that it would never exceed 31 days of uptime. Oops. LA Times blamed the technician, but the developers are more to blame for shoddy design.
  14. Polar representation of points. Re-implement the Point.java data type using polar coordinates.

    Solution: PointPolar.java.

  15. Spherical coordinates. Represent a point in 3D space using Cartesian coordinates \((x, y, z)\) or spherical coordinates \((r, \theta, \phi)\). To convert from one to the other, use

    $$ \begin{array}{lllllll} r &=& \sqrt{x^2 + y^2 + z^2} &\hspace{.3in} & x &=& r \cos \theta \sin \phi \\ \theta &=& \tan^{-1}(y/x) & & y &=& r \sin \theta \sin \phi \\ \phi &=& \cos^{-1}(z/r) & & z &=& r \cos \phi \\ \end{array} $$
  16. Colors. Can represent in RGB, CMYK, or HSV formats. Natural to have different implementations of same interface.
  17. ZIP codes. Implement an ADT that represents a USPS ZIP code. Support both the original 5 digit format and the newer (but optional) ZIP+4 format.