3.3   Designing Data Types


This section under major construction.


In this section we discuss encapsulation, immutability, and inheritance, with particular attention to the use of these mechanisms in data-type design to enable modular programming, facilitate debugging, and write clear and correct code.

Encapsulation.

The process of separating clients from implementations by hiding information is known as encapsulation. We use encapsulation for three primary reasons to enable modular programming, facilitate debugging, and clarify program code.

Immutability.

Immutable and mutable data types An object from a data type is immutable if its data-type value cannot change once created. An immutable data type is one in which all objects of that type are immutable. In contrast, a mutable data type is one in which objects of that type have values that are designed to change.

Spatial vectors.

Vector To illustrate these ideas in the context of a useful mathematical abstraction, we now consider a vector data type. A spatial vector is an abstract entity that has a magnitude and a direction. The concept extends to any number of dimensions: a sequence of n real numbers suffices to specify a vector in n-dimensional space. We use a boldface letter like \( \boldsymbol{x} \) to denote the vector \( ( x_0, x_1, \; \ldots, \; x_{n-1}) \).

Interface inheritance (subtyping).

Java provides the interface construct for declaring a relationship between otherwise unrelated classes, by specifying a common set of methods that each implementing class must include. Interfaces enable us to write client programs that can manipulate objects of varying types, by invoking common methods from the interface.

Implementation inheritance (subclassing).

Java also supports another inheritance mechanism known as subclassing. The idea is to define a new class (subclass, or derived class) that inherits instance variables (state) and instance methods (behavior) from another class (superclass, or base class), enabling code reuse. Typically, the subclass redefines or overrides some of the methods in the superclass.

Java inheritance hierarchy for GUI elements
In this book, we avoid subclassing because it works against encapsulation and immutability (e.g., the The fragile base class problem).

Application: data mining.

Design by contract.

We briefly discuss two Java language mechanisms that enable you to verify assumptions about your program while it is running—exceptions and assertions.

Similarity search.

Reference: Gauging Similarity via N-Grams. CompareAll.java performs similarity search. It relies on the data type Document.java that uses hashCode().

Subclassing.

"Code reuse is extremely important but difficult to achieve." -Joshua Bloch. Subtyping is a way to describe family of types when one subclass (child) "inherits" instance variables and instance methods from a superclass (parent). Define a new type that inherits properties from existing type. Base class specifies similarities; derived class specifies differences. It is a powerful technique that enables the programmer to change the behavior of a class and add functionality without rewriting the entire class from scratch. For example, we might have a superclass called Mammal, and subclasses Dog and Cat. We might also have further subclasses of Dog, say GermanShepherd and LabradorRetriever. (Not a compelling example since it's not organized by data similarities (e.g., have hair, have mammary glands, are warm-blooded) instead of similar behavior (e.g., provide companionship, give milk, pull a vehicle).

Note: subclass contains more methods than superclass. Subtyping is widely used for building extensible libraries, including GUI. In Java, you can only inherit from one superclass. Single inheritance vs. multiple inheritance.

Method polymorphism.

Polymorphism = many forms, e.g., water can be a liquid, solid (ice), or gas (steam). dynamic binding Method to be called is determined at run-time, and depends on the type of the object.

We override a method to provide specialized behavior. Overriding a method means to redefine a method in a subclass that was defined in a superclass. When you do this, the object in the subclass has two methods with the same name and signature.

Give simple example.

When to use inheritance.

Object.

In Java, all classes inherit from the universal superclass named Object.

Problems with subtyping.

Improper use of subtyping can create lots of programming nightmares since it often violates encapsulation and makes future changes more troublesome. Subtyping is sometimes derided as the "goto statement of the 1990s." Coupling = reliance of one part of a program on another. Prime tenet of OOP is to reduce the level of coupling.

Exercises

  1. Give an implementation of minus() for Vector.java solely in terms of the other Vector methods, such as direction() and magnitude().

    Solution:

    public Vector minus(Vector that) {
        return this.plus(that.scale(-1.0));
    }
    
  2. Add a toString() method to Vector.java that returns the vector components, separated by commas, and enclosed in matching parentheses.

Creative Exercises

  1. Statistics. Develop a data type for maintaining statistics for a set of real numbers. Provide a method to add data points and methods that return the number of points, the mean, the standard deviation, and the variance.

    $$ \begin{eqnarray*} \bar x &=& \frac{1}{n} \sum_i x_i \\ s^2 &=& \frac{\sum_i (x_i - \mu)^2}{n-1} \;\; = \;\; \frac{n \sum_i x_i^2 - (\sum_i x_i)^2}{n(n-1)} \end{eqnarray*} $$

    Develop two implementations: OnePass.java whose instance values are the number of points and the sum of the values, and the sum of the squares of the values, TwoPass.java that keeps an array containing all the points. For simplicity, you may take the maximum number of points in the constructor. Your first implementation is likely to be faster and use substantially less space, but is also likely to be susceptible to roundoff error.

    Solution: StableOnePass.java is a well-engineered alternative that is is numerically stable and does not require an array to store the elements.

    $$ \begin{eqnarray*} m_0 &=& 0 \\ s_0 &=& 0 \\ m_n &=& m_{n-1} + \frac{1}{n} \; (x_n - m_{n-1}) \\ s_n &=& s_{n-1} + \frac{n-1}{n} \; (x_n - m_{n-1})^2 \\ \bar x &=& m_n \\ s^2 &=& \frac{1}{n-1} s_n \end{eqnarray*} $$
  2. Genome. Develop a data type to store the genome of an organism. Biologists often abstract away the genome to a sequence of nucleotides (A, C, G, or T). The data type should support the method addNucleotide(), nucleotideAt(), as well as isPotentialGene(). Develop three implementations.
    • Use one instance variable of type String, implementing addCodon() with string concatenation. Each method call takes time proportional to the length of the current genome.
    • Use an array of characters, doubling the length of the array each time it fills up.
    • Use a boolean array, using two bits to encode each codon, and doubling the length of the array each time it fills up.

    Solution: StringGenome.java, Genome.java, and CompactGenome.java.

  3. Encapsulation. Is the following class immutable?
    import java.util.Date
    
    public class Appointment {
        private Date date;
        private String contact;
    
        public Appointment(Date date) {
            this.date = date;
            this.contat = contact;
        }
    
        public Date getDate() {
            return date;
        }
    
    Solution: No, because Java's java.util.Date is mutable. To correct, make a defensive copy of the date in the constructor and make a defensive copy of the date before returning to the client.
  4. Date. Design an implementation of Java's java.util.Date API that is immutable and therefore corrects the defects of the previous exercise.

    Partial solution: Date.java.

Web Exercises

  1. Add methods to Genome.java to test for equalty and return the reverse-complemented genome.
  2. Add methods to Date.java to check which season (Spring, Summer, Fall, Winter) or astrological sign (Pisces, Libra, ...) a given date lies. Be careful about events that span December to January.
  3. Add a method daysUntil() to Date.java that takes a Date as an argument and returns the number of days between the two dates.
  4. Create an implementation Date2.java that represents a date a single integer that counts the number of days since January 1, 1970. Compare to Date.java.
  5. Create a Rectangle ADT that represents a rectangle. Represent a rectangle by two points. Include a constructor, a toString method, a method for computing the area, and a method for drawing using our graphics library.
  6. Repeat the previous exercise, but this time represent a Rectangle as the lower left endpoint and the width and height.
  7. Repeat the previous exercise, but this time represent a Rectangle as the center and the width and height.
  8. Sparse vector. Create a data type for sparse vectors. Represent a sparse vector by an array of indices (of nonzeros) and a parallel array of the corresponding nonzero values. Assume the indices are in ascending order. Implement the dot product operation.
  9. Copy constructor. Only needed if data type is mutable. Otherwise, assignment statement works as desired.
    public Counter(Counter x) {
        count = x.count;
    }
    
    Counter counter1 = new Counter();
    counter1.hit();
    counter1.hit();
    counter1.hit();
    Counter counter2 = new Counter(counter1);
    counter2.hit();
    counter2.hit();
    StdOut.println(counter1.get() + " " + counter2.get());    // 3 5
    
  10. Define an interface DifferentiableFunction.java for twice-differentiable function. Write a class Sqrt.java that implements the function f(x) = c - x^2.
  11. Write a program Newton.java that implements Newton's method to find a real root of a sufficiently smooth function, given that you start sufficiently close to a root. When method converges, it does so quadratically. Assume that it takes a DifferentiableFunction as argument.
  12. Generating random numbers. Different methods to generate a random number from the standard Gaussian distribution. Here, encapsulation enables us to replace one version with another that is more accurate or efficient. Trigonometric method is simple, but may be slow due to calling several transcendental functions. More importantly, it suffers from numerical stability problems when x1 is close to 0. Better method is alternate form of Box-Muller method. reference. Both methods require two values from a uniform distribution and produce two values from the Gaussian distribution with mean 0 and standard deviation 1. Can save work by remembering the second value for the next call. (This is how it is implemented in java.util.Random.) Their implementation is the polar method of Box-Muller, saving the second random number for a subsequent call. (See Knuth, ACP, Section 3.4.1 Algorithm C.)
  13. LAX airport shutdown. On September 14, 2004 Los Angeles airport was shut down due to software breakdown of a radio system used by air traffic controllers to communicate with pilots. The program used a Windows API function call GetTickCount() which returns the number of milliseconds since the system was last rebooted. The value is returned as a 32 bit integer, so after approximately 49.7 days it "wraps around." The software developers were aware of the bug, and instituted a policy that a technician would reboot the machine every month so that it would never exceed 31 days of uptime. Oops. LA Times blamed the technician, but the developers are more to blame for shoddy design.
  14. Polar representation of points. Re-implement the Point.java data type using polar coordinates.

    Solution: PointPolar.java.

  15. Spherical coordinates. Represent a point in 3D space using Cartesian coordinates \((x, y, z)\) or spherical coordinates \((r, \theta, \phi)\). To convert from one to the other, use

    $$ \begin{array}{lllllll} r &=& \sqrt{x^2 + y^2 + z^2} &\hspace{.3in} & x &=& r \cos \theta \sin \phi \\ \theta &=& \tan^{-1}(y/x) & & y &=& r \sin \theta \sin \phi \\ \phi &=& \cos^{-1}(z/r) & & z &=& r \cos \phi \\ \end{array} $$
  16. Colors. Can represent in RGB, CMYK, or HSV formats. Natural to have different implementations of same interface.
  17. ZIP codes. Implement an ADT that represents a USPS ZIP code. Support both the original 5 digit format and the newer (but optional) ZIP+4 format.