# 3.3   Designing Data Types

This section under major construction.

In this section we discuss encapsulation, immutability, and inheritance, with particular attention to the use of these mechanisms in data-type design to enable modular programming, facilitate debugging, and write clear and correct code.

## Encapsulation.

The process of separating clients from implementations by hiding information is known as encapsulation. We use encapsulation for three primary reasons to enable modular programming, facilitate debugging, and clarify program code.
• Complex numbers revisited. As an example, recall Complex.java, which represents a complex number using Cartesian coordinates as $$x + iy$$. PolarComplex.java has essentially the same API, except that it represents a complex number using polar coordinates as $$r (\cos \theta + i \sin \theta)$$. The idea of encapsulation is that we can substitute one of these programs for the other without changing client code.
• Private. Java’s language support for enforcing encapsulation is the private access modifier. When you declare an instance variable (or method) to be private, you are making it impossible for any client (code in another class) to directly access that instance variable (or method).
• Limiting the potential for error. Encapsulation also helps programmers ensure that their code operates as intended. To understand the problem, consider Counter.java, which implements a simple counter according to the following API:

It encapsulates a single integer and ensures that the only operation that can be performed on the integer is increment by 1. Without the private modifier, a client could write code like the following:
 Counter counter = new Counter("Volusia"); counter.count = -16022; 
With the private modifier, code like this will not compile.

## Immutability.

An object from a data type is immutable if its data-type value cannot change once created. An immutable data type is one in which all objects of that type are immutable. In contrast, a mutable data type is one in which objects of that type have values that are designed to change.
• Advantages of immutability. One very important reason to implement immutable types is that we can use immutable objects in assignment statements (or as arguments and return values from methods) without having to worry about their values changing. In this way, immutable types behave the same as primitive types. For this reason, immutable types are easier to use and harder to misuse.
• Cost of immutability. The main downside of immutability is that a new object must be created for every value.
• Final. You can use the final modifier to help enforce immutability in a data type. When you declare an instance variable as final, you are promising to assign it a value only once, either in an inline initialization statement or in the constructor.
• Reference types. Unfortunately, final guarantees immutability only when instance variables are primitive types, not reference types. If an instance variable of a reference type has the final modifier, the value of that instance variable (the object reference) will never change—it will always refer to the same object. However, the value of the object itself can change. To ensure immutability in such cases, we need to make a defensive copy.

## Spatial vectors.

To illustrate these ideas in the context of a useful mathematical abstraction, we now consider a vector data type. A spatial vector is an abstract entity that has a magnitude and a direction. The concept extends to any number of dimensions: a sequence of n real numbers suffices to specify a vector in n-dimensional space. We use a boldface letter like $$\boldsymbol{x}$$ to denote the vector $$( x_0, x_1, \; \ldots, \; x_{n-1})$$.
• API. The basic operations on vectors are to add two vectors, multiply a vector by a scalar, compute the dot product of two vectors, and to compute the magnitude and direction, as follows:
• Addition: $$\boldsymbol{x} + \boldsymbol{y} = ( x_0 + y_0, x_1 + y_1, \; \ldots, \; x_{n-1} + y_{n-1})$$
• Vector scaling: $$\alpha \boldsymbol{x} = (\alpha x_0, \alpha x_1, \; \ldots, \; \alpha x_{n-1})$$
• Dot product $$\boldsymbol{x} \cdot \boldsymbol{y} = x_0y_0 + x_1y_1 + \, \ldots \, + x_{n-1}y_{n-1}$$
• Magnitude: $$\left | \boldsymbol{x} \right | = \sqrt{x_0^2 + x_1^2 + \, \ldots \, + x_{n-1}^2}$$
• Direction: $$\boldsymbol{x} \,/\, \left | \boldsymbol{x} \right | = (x_0 \,/\, \left | \boldsymbol{x} \right |, x_1 \,/\, \left | \boldsymbol{x} \right |, \; \ldots, \; x_{n-1} \,/\, \left | \boldsymbol{x} \right |)$$

These basic mathematical definitions lead immediately to an API:

• Implementation. Vector.java is an immutable data type that implements this API. Internally, it uses an array to store the Cartesian coordiantes, be careful to make a defensive copy of the array provided in the constructor.
• The this reference. Within an instance method (or constructor), the this keyword gives us a way to refer to the object whose instance method (or constructor) is being called. You can use this in the same way you use any other object reference. For example, the magnitude() method in Vector uses the this keyword in two ways: to invoke the dot() method and as an argument to the dot() method.
 // return the magnitude of this Vector public double magnitude() { return Math.sqrt(this.dot(this)); } 

## Interface inheritance (subtyping).

Java provides the interface construct for declaring a relationship between otherwise unrelated classes, by specifying a common set of methods that each implementing class must include. Interfaces enable us to write client programs that can manipulate objects of varying types, by invoking common methods from the interface.
• Defining an interface. As a motivating example, we define an interface Function.java for real-valued functions of a single variable.
 public interface Function { public abstract double evaluate (double x); } 
The body of the interface contains a list of abstract methods. An abstract method is a method that is declared but does not include any implementation code; it contains only the method signature. The modifier abstract designates a method as abstract. As with a Java class, you must save a Java interface in a file whose name matches the name of the interface, with a .java extension.
• Implementing an inteface. To write a class that implements an interface, you must do two things.
• Include an implements clause in the class declaration with the name of the interface.
• Implement each of the abstract methods in the interface.

For example, Square.java and GaussianPDF.java implements the Function interface.

• Using an inteface. An interface is a reference type. For example, you can declare the type of a variable to be the name of an interface. When you do so, any object you assign to that variable must be an instance of a class that implements the interface. For example, a variable of type Function may store an object of type Square or GaussianPDF.
 Function f1 = new Square(); Function f2 = new GaussianPDF(); Function f3 = new Complex(1.0, 2.0); // compile-time error 
When a variable of an interface type invokes a method declared in the interface, Java knows which method to call because it knows the type of the invoking object. This powerful programming mechanism is known as polymorphism or dynamic dispatch.
• Plotting functions. FunctionGraph.java plots the graph of a function f in the interval [a, b] by sampling the function at n + 1 evenly spaced points. It works for any sufficiently smooth function f that implements the Function interface, including Square or GaussianPDF.

• Numerical integration. RectangleRule.java estimates the integral of a positive real-valued function f in an interval (a, b) using the rectangle rule.

It works for any sufficiently smooth function f that implements the Function interface.
• Lambda expressions. To simplify syntax, Java provides a powerful functional programming feature known as lambda expressions. You should think of a lambda expression as a block of code that you can pass around and execute later. In its simplest form, a lambda expression consists of the three elements:
• A list of parameters variables, separated by commas, and enclosed in parentheses
• The lambda operator ->
• A single expression, which is the value returned by the lambda expression

For example, the following lambda expression implements the hypotenuse function:

Our primary use of lambda expressions is as a concise way to implement a functional interface (an interface with a single abstract method). Specifically, you can use a lambda expression wherever an object from a functional interface is expected. For example, all of the following expressions implement the Function.java interface:

Consequently, you can can integrate the square function with the call integrate(x -> x*x, 0, 10, 1000), thereby bypassing the need to define a separate Square class.
• Built-in interfaces. Java includes three interfaces that we will consider later this book.
• In Section 4.2, we will consider Java’s java.util.Comparable interface to define an order in which to compare objects of the same type, such as alphabetical order for strings or ascending order for integers.
• In Section 4.3, we will use Java’s java.util.Iterator and java.lang.Iterable interfaces to enable clients to iterate over the items in a collection, without relying on the underlying representation.

## Implementation inheritance (subclassing).

Java also supports another inheritance mechanism known as subclassing. The idea is to define a new class (subclass, or derived class) that inherits instance variables (state) and instance methods (behavior) from another class (superclass, or base class), enabling code reuse. Typically, the subclass redefines or overrides some of the methods in the superclass.

In this book, we avoid subclassing because it works against encapsulation and immutability (e.g., the The fragile base class problem).
• Java's Object superclass. Certain vestiges of subclassing are built into Java and therefore unavoidable. Specifically, every class is a subclass of java.lang.Object. When programming in Java, you will often override one or more of these inherited methods:

• String conversion. Every Java class inherits the toString() method, so any client can invoke toString() for any object. This convention is the basis for Java’s automatic conversion of one operand of the string concatenation operator + to a string whenever the other operand is a string.
• Equality.
• Hashing.
• Wrapper types. The toString(), hashCode(), and equals() methods apply only to reference types, not primitive types. For example, the expression x.hashCode() does not work if x is a variable of a primitive type. For situations where we wish want to represent a value from a primitive type as an object, Java supplies built-in reference types known as wrapper types, one for each of the eight primitive types.
• Autoboxing and unboxing. Java automatically converts between an object from a wrapper type and the corresponding primitive data-type value so that you can write code like the following:
 Integer x = 17; // Autoboxing (int -> Integer) int a = x; // Unboxing (Integer -> int) 

## Application: data mining.

• Computing sketches.
• Hashing.
• Comparing sketches.
• Comparing all pairs.
• Searching for similar documents.

## Design by contract.

We briefly discuss two Java language mechanisms that enable you to verify assumptions about your program while it is running—exceptions and assertions.
• Exceptions. An exception is a disruptive event that occurs while a program is running, often to signal an error. Action is known as throwing an exception. Java includes an elaborate inheritance hierarchy of predefined exceptions, several of which we have already encountered.

It is good practice to use exceptions when they can be helpful to the user. For example, in Vector.java, we should throw an exception in plus() if the two Vectors to be added have different dimensions:
 if (this.length() != that.length()) throw new IllegalArgumentException("Dimensions disagree."); 
• Assertions. An assertion is a boolean expression that you are affirming is true at some point during the execution of a program. If the expression is false, the program will throw an AssertionError, which typically terminates the program and reports an error message. For example, in Counter.java, we might check that the counter is never negative by adding the following assertion as the last statement in increment():
 assert count >= 0 : "Negative count detected in increment()"; 

By default, assertions are disabled, but you can enable them from the command line by using the -enableassertions flag (-ea for short). Assertions are for debugging only; your program should not rely on assertions for normal operation since they may be disabled.

In the design-by-contract model of programming, the designer expresses conditions about the behavior of the program using assertions.

• Precondition. A condition that the client promises to satisfy when calling a method.
• Postcondition. A condition that the implementation promises to achieve when returning from a method.
• Invariant. A condition that the implementation promises to satisfy while the method is executing.
• Side effects. Any other change in state that the method would cause.

## Similarity search.

Reference: Gauging Similarity via N-Grams. CompareAll.java performs similarity search. It relies on the data type Document.java that uses hashCode().

## Subclassing.

"Code reuse is extremely important but difficult to achieve." -Joshua Bloch. Subtyping is a way to describe family of types when one subclass (child) "inherits" instance variables and instance methods from a superclass (parent). Define a new type that inherits properties from existing type. Base class specifies similarities; derived class specifies differences. It is a powerful technique that enables the programmer to change the behavior of a class and add functionality without rewriting the entire class from scratch. For example, we might have a superclass called Mammal, and subclasses Dog and Cat. We might also have further subclasses of Dog, say GermanShepherd and LabradorRetriever. (Not a compelling example since it's not organized by data similarities (e.g., have hair, have mammary glands, are warm-blooded) instead of similar behavior (e.g., provide companionship, give milk, pull a vehicle).

Note: subclass contains more methods than superclass. Subtyping is widely used for building extensible libraries, including GUI. In Java, you can only inherit from one superclass. Single inheritance vs. multiple inheritance.

## Method polymorphism.

Polymorphism = many forms, e.g., water can be a liquid, solid (ice), or gas (steam). dynamic binding Method to be called is determined at run-time, and depends on the type of the object.

We override a method to provide specialized behavior. Overriding a method means to redefine a method in a subclass that was defined in a superclass. When you do this, the object in the subclass has two methods with the same name and signature.

Give simple example.

• Swing GUI hierarchy. Button -> AbstractButton -> JComponent -> Container -> Component -> Object.
• Java exceptions. Exception -> RuntimeException -> NullPointerException
• Supersymmetry particle classification. Physicists have classified particles into the following taxonomy. For example, an electron is a type of lepton (feels weak force but not strong force), which is a type of fermion (particles that make up matter).
When to use inheritance.
• Liskov Substitution Principle. If A inherits from B, then whenever you expect to use a B object, you can safely use an object of type A instead and you will get the desired behavior. That is, objects in the subclass must behave in a manner consistent with promises made in the API of the superclass. Instead of is-a-kind-of relationship, it's better to think is-a-substitute-for. When you use extends Java checks that the two classes have compatible method signatures, but does not check that the behavior is consistent.
• Open-closed principle. Software should be designed to be open for extension (can extend the module to behave in new ways), but closed for modification (no one should ever change the source code). Ex: square, circle, triangle. Don't want a switch statement in base class for each type.

## Object.

In Java, all classes inherit from the universal superclass named Object.
• equals(). Implementing equals(). Nearly impossible to do correctly a type and a subtype.
• toString(). The toString() method is implemented in Object, and hence common to all classes. This method returns a String representation of the object. The default behavior on many systems is to return the memory address where the object is stored. The principle of inheritance is what enables System.out.println() and StdOut.println() to "know" how to print an object when you supply a method called toString(). When you call System.out.print() with an object as an argument, the system automatically invokes the object's toString method. Which toString() method does the system use? It uses the one in the subclass, if it is defined, otherwise it uses the one from the superclass.

## Problems with subtyping.

Improper use of subtyping can create lots of programming nightmares since it often violates encapsulation and makes future changes more troublesome. Subtyping is sometimes derided as the "goto statement of the 1990s." Coupling = reliance of one part of a program on another. Prime tenet of OOP is to reduce the level of coupling.
• Circle-ellipse problem. The key to understanding when to use inheritance is the circle-ellipse problem. Circle extends Ellipse is OK if both are immutable types, but not if Ellipse can be squashed asymmetrically (e.g., has a method to change its width and height). A properly designed subclass must have only less strict preconditions and postconditions. When Circle inherits from Ellipse, there are additional preconditions (the width and height are equal). From a geometric viewpoint, a circle is a kind of ellipse, but it does not have the same behavior in this example (because you can mutate the width and height parameters). The is-a-kind-of rule is necessary, but not sufficient. Intuition with when to use inheritance is often wrong.

Another wrong approach might be to make Ellipse extend Circle. Bad idea: since Circle may have a method radius() that makes no sense for an asymmetric Ellipse. This problem is compounded if Circle has an implementation for circumference() that is 2 * Math.PI * radius().

• Simple example: adding a method to the superclass with different return type than an existing method in a subclass (won't compile). Violates the "is a" rule: Sun's implementation of Stack which inherits from Vector and Sun's implementation of Properties which inherits from HashTable. Subtle example: use inheritance to create a subclass of Stack that counts the total number of elements inserted by maintaining an extra field, overriding add and addAll. Problem: superclass implements addAll by invoking add so we get twice the count when using addAll. We could then rely on addAll calling add, but now we are relying on the superclass not changing its implementation in the future. (Use composition and forwarding instead.) Reference

• Subtyping breaks immutability. Immutability is a desirable feature of an ADT. Cannot allow an immutable class to be extended; otherwise subclass could break guarantees made by parent. In Java, making a class final means that it is a compile-time error to extend it. Also, you should make instance variables final even if they are already private. This ensures that they cannot be changed, even using reflection.
• Can't implement equals() correctly.

#### Exercises

1. Give an implementation of minus() for Vector.java solely in terms of the other Vector methods, such as direction() and magnitude().

Solution:

 public Vector minus(Vector that) { return this.plus(that.scale(-1.0)); } 
2. Add a toString() method to Vector.java that returns the vector components, separated by commas, and enclosed in matching parentheses.

#### Creative Exercises

1. Statistics. Develop a data type for maintaining statistics for a set of real numbers. Provide a method to add data points and methods that return the number of points, the mean, the standard deviation, and the variance.

$$\begin{eqnarray*} \bar x &=& \frac{1}{n} \sum_i x_i \\ s^2 &=& \frac{\sum_i (x_i - \mu)^2}{n-1} \;\; = \;\; \frac{n \sum_i x_i^2 - (\sum_i x_i)^2}{n(n-1)} \end{eqnarray*}$$

Develop two implementations: OnePass.java whose instance values are the number of points and the sum of the values, and the sum of the squares of the values, TwoPass.java that keeps an array containing all the points. For simplicity, you may take the maximum number of points in the constructor. Your first implementation is likely to be faster and use substantially less space, but is also likely to be susceptible to roundoff error.

Solution: StableOnePass.java is a well-engineered alternative that is is numerically stable and does not require an array to store the elements.

$$\begin{eqnarray*} m_0 &=& 0 \\ s_0 &=& 0 \\ m_n &=& m_{n-1} + \frac{1}{n} \; (x_n - m_{n-1}) \\ s_n &=& s_{n-1} + \frac{n-1}{n} \; (x_n - m_{n-1})^2 \\ \bar x &=& m_n \\ s^2 &=& \frac{1}{n-1} s_n \end{eqnarray*}$$
2. Genome. Develop a data type to store the genome of an organism. Biologists often abstract away the genome to a sequence of nucleotides (A, C, G, or T). The data type should support the method addNucleotide(), nucleotideAt(), as well as isPotentialGene(). Develop three implementations.
• Use one instance variable of type String, implementing addCodon() with string concatenation. Each method call takes time proportional to the length of the current genome.
• Use an array of characters, doubling the length of the array each time it fills up.
• Use a boolean array, using two bits to encode each codon, and doubling the length of the array each time it fills up.

Solution: StringGenome.java, Genome.java, and CompactGenome.java.

3. Encapsulation. Is the following class immutable?
 import java.util.Date public class Appointment { private Date date; private String contact; public Appointment(Date date) { this.date = date; this.contat = contact; } public Date getDate() { return date; } 
Solution: No, because Java's java.util.Date is mutable. To correct, make a defensive copy of the date in the constructor and make a defensive copy of the date before returning to the client.
4. Date. Design an implementation of Java's java.util.Date API that is immutable and therefore corrects the defects of the previous exercise.

Partial solution: Date.java.

#### Web Exercises

1. Add methods to Genome.java to test for equalty and return the reverse-complemented genome.
2. Add methods to Date.java to check which season (Spring, Summer, Fall, Winter) or astrological sign (Pisces, Libra, ...) a given date lies. Be careful about events that span December to January.
3. Add a method daysUntil() to Date.java that takes a Date as an argument and returns the number of days between the two dates.
4. Create an implementation Date2.java that represents a date a single integer that counts the number of days since January 1, 1970. Compare to Date.java.
5. Create a Rectangle ADT that represents a rectangle. Represent a rectangle by two points. Include a constructor, a toString method, a method for computing the area, and a method for drawing using our graphics library.
6. Repeat the previous exercise, but this time represent a Rectangle as the lower left endpoint and the width and height.
7. Repeat the previous exercise, but this time represent a Rectangle as the center and the width and height.
8. Sparse vector. Create a data type for sparse vectors. Represent a sparse vector by an array of indices (of nonzeros) and a parallel array of the corresponding nonzero values. Assume the indices are in ascending order. Implement the dot product operation.
9. Copy constructor. Only needed if data type is mutable. Otherwise, assignment statement works as desired.
 public Counter(Counter x) { count = x.count; } Counter counter1 = new Counter(); counter1.hit(); counter1.hit(); counter1.hit(); Counter counter2 = new Counter(counter1); counter2.hit(); counter2.hit(); StdOut.println(counter1.get() + " " + counter2.get()); // 3 5 
10. Define an interface DifferentiableFunction.java for twice-differentiable function. Write a class Sqrt.java that implements the function f(x) = c - x^2.
11. Write a program Newton.java that implements Newton's method to find a real root of a sufficiently smooth function, given that you start sufficiently close to a root. When method converges, it does so quadratically. Assume that it takes a DifferentiableFunction as argument.
12. Generating random numbers. Different methods to generate a random number from the standard Gaussian distribution. Here, encapsulation enables us to replace one version with another that is more accurate or efficient. Trigonometric method is simple, but may be slow due to calling several transcendental functions. More importantly, it suffers from numerical stability problems when x1 is close to 0. Better method is alternate form of Box-Muller method. reference. Both methods require two values from a uniform distribution and produce two values from the Gaussian distribution with mean 0 and standard deviation 1. Can save work by remembering the second value for the next call. (This is how it is implemented in java.util.Random.) Their implementation is the polar method of Box-Muller, saving the second random number for a subsequent call. (See Knuth, ACP, Section 3.4.1 Algorithm C.)
13. LAX airport shutdown. On September 14, 2004 Los Angeles airport was shut down due to software breakdown of a radio system used by air traffic controllers to communicate with pilots. The program used a Windows API function call GetTickCount() which returns the number of milliseconds since the system was last rebooted. The value is returned as a 32 bit integer, so after approximately 49.7 days it "wraps around." The software developers were aware of the bug, and instituted a policy that a technician would reboot the machine every month so that it would never exceed 31 days of uptime. Oops. LA Times blamed the technician, but the developers are more to blame for shoddy design.
14. Polar representation of points. Re-implement the Point.java data type using polar coordinates.

Solution: PointPolar.java.

15. Spherical coordinates. Represent a point in 3D space using Cartesian coordinates $$(x, y, z)$$ or spherical coordinates $$(r, \theta, \phi)$$. To convert from one to the other, use

$$\begin{array}{lllllll} r &=& \sqrt{x^2 + y^2 + z^2} &\hspace{.3in} & x &=& r \cos \theta \sin \phi \\ \theta &=& \tan^{-1}(y/x) & & y &=& r \sin \theta \sin \phi \\ \phi &=& \cos^{-1}(z/r) & & z &=& r \cos \phi \\ \end{array}$$
16. Colors. Can represent in RGB, CMYK, or HSV formats. Natural to have different implementations of same interface.
17. ZIP codes. Implement an ADT that represents a USPS ZIP code. Support both the original 5 digit format and the newer (but optional) ZIP+4 format.