3.3 Designing Data Types
This section under major construction.
Designing data types
- Designing APIs
- Encapsulation (use private for information hiding)
- Immutability (use final to to help enforce)
- Design-by-contract (use assert to check pre-conditions and post-conditions)
Theme = limit what you, the programmer, can do. Emphasize encapsulation and immutability.
Designing APIs.
Often the most important and most challenging step in building software is designing the APIs. In many ways, designing good programs is more challenging that writing the code itself. Takes practice, careful deliberation, and many iterations.- Specification problem. Document the API in English. Clearly articulate behavior for all possible inputs, including side effects. "Write to specification." Difficult problem. Many bugs introduced because programmer didn't correctly understand description of API. See booksite for information on automatic documentation using Javadoc.
- Wide interfaces.
"API should do one thing and do it well."
"APIs should be as small as possible, but no smaller."
"When in doubt, leave it out." (It's easy to add methods to an existing API,
but you can never remove them without breaking existing clients.)
APIs with lots of bloat are known as wide interfaces.
Supply all necessary operations, but no more.
Try to make methods orthogonal in functionality.
No need for a method in Complex
that adds three complex numbers since there is a method that adds two.
The Math library includes methods for sin(), cos(),
and tan(), but not sec().
Java libraries tend to have wide interfaces (some designed by pros, some by committee). Sometimes this seems to be the right thing, e.g., String. Although, sometimes you end up with poorly designed APIs that you have to live with forever.
- Deprecated methods.
Sometimes you end up with deprecated methods that are no longer
fully supported, but you still need to keep
them or break backward compatibility.
Once Java included a method Character.isSpace(), programmers
wrote programs that relied on its behavior. Later, they wanted
to change the method to support additional Unicode whitespace characters.
Can't change the behavior of isSpace() or that would break
many programs.
Instead, add a new method Character.isWhiteSpace() and "deprecate"
the old method. The API is now more confusing than needed.
Almost all methods in java.util.Date are deprecated in favor of java.util.GregorianCalendar.
Backward compatibility. The need for backward compatibility shapes much of the way things are done today (from operating systems to programming languages to ...). [Insert a story.]
Standards. It is easy to understand why writing to an API is so important by considering other domains. Fax machines, radio, MPEG-4, MP3 files, PDF files, HTML, etc. Simpler to use a common standard. Lack of incompatibilities enables business opportunities that would otherwise be impossible. One of the challenges of writing software is making it portable so that it works on a variety of operating systems including Windows, OS X, and Linux. Java Virtual Machine enables portability of Java across platforms.
Encapsulation.
Designing reusable data types requires great care. The API is a contract between the client and the implementation that specifies the set of permissible operations and what is their required behavior. Encapsulation is the process of separating and compartmentalizing the client from the implementation by hiding how the data type is implemented. This enable you to change the "how" as the program evolves. "A nut and bolt are compatible if they have the same diameter and threads per inch." This is the interface. However, "they may be made of carbon steel, steel, bronze, nylon, titanium, whatever."Encapsulation enables us to:
- Keep instance variables in a consistent state, e.g., nonnegative integer.
- Do error-checking on instance variables to monitor when they change their value. This can help debugging since you can check when a value unexpectedly changes state since it can only happen by calling a function within that class.
- Reduce dependencies between different modules. This enables us to debug and test one module independently from others.
- Improve resiliency of software systems by limiting and localizing the effects of changing the internal representation of a data type as the program evolves.
- Substitute different implementations of the same data type to improve performance, accuracy, or memory footprint.
Some mottos.
- Abstraction and information hiding are nice for small programs and absolutely necessary for big ones. Brian Kernighan.
- Limit scope. The tighter the scope, the easier a program is to maintain.
- Principle of least privilege. Each part of the program should know as much as it needs, but no more.
- Ask don't touch. Client shouldn't directly manipulate the instance variables. Instead it should ask the data type to do the work.
Zip codes and time bombs.
There are many examples of programmers making assumptions about the internal representation of a data type, and later having to pay a big price to fix their mistakes.- ZIP codes. In 1963, The United States Postal Service (USPS) began using a 5 digit ZIP code (Zoning Improvement Plan) to improve the sorting and delivery of mail. Programmers wrote lots of software that assumed zip codes would remain 5 digits forever, and represented them in their programs using a single 32-bit integer. In 1983, the USPS introduced an expanded ZIP code called ZIP+4 which consists of the original 5 digit ZIP code plus 4 extra digits to assist in delivery. This broke hundreds of fragile programs and required millions of dollars to fix.
- Y2K. The much publicized Y2K bug resulted from many programmers encoding years using two digits (years since 1900) to save memory (back in the 1960s when saving two characters per date field was more significant). As the year 2000 approached, programmers had to scavenge through millions of lines of code to find all those places that relied on a two digit year. Some of these programs controlled nuclear power plants, financial markets, commercial air traffic, and military warships. The cost for fixing all of these problems was estimated to be around $300 billion.
- IPv4 vs. IPv6. The Internet protocol is a standard used by electronic devices to exchange data over the Internet. Each device is assigned a unique integer or address. IPv4 uses 32-bit addresses and supports about 4.3 billion addresses. IPv6 uses 128-bit addresses and supports 2^128 addresses (over 340 decillion!). This will increase network traffic since each network packet contains the source and destination address. The total cost of the transition in the US might reach $500 billlion!
- Vehicle identification numbers. In 1981, The Society of Automotive Engineers established a unique 17-character naming scheme for vehicles known as the Vehicle Identification Number (or VIN for short). The VIN describes the make, model, year, and other attributes of cars, trucks, buses, and other vehicles in the United States. Automakers expect to run out by 2010. One solution is to increase the length of the VIN. Another is to reuse existing VINs. Both would require major logistical changes to supporting software systems, estimates in the billions of dollars. Moral: whenever a unique number is needed, don't skimp!
Encapsulation in Java. Java provides language support for information hiding. When we declare an instance variable (or method) as private, this means that the client (code written in another module) cannot directly access that instance variable (or method). The client can only access the API through the public methods and constructors. Programmer can modify the implementation of private methods (or use different instance variables) with the comfort that no client will be directly affected.
Program Counter.java implements a counter, e.g., for an electronic voting machine. It encapsulates a single integer to ensure that it can only get incremented by one at at time and to ensure that it never goes negative. The goal of data abstraction is to restrict which operations you can perform. Can ensure that data type value always remains in a consistent state. Can add logging capability to hit(), e.g., to print timestamp of each vote. In the 2000 presidential election, Al Gore received negative 16,022 votes on an electronic voting machine in Volusia County, Florida. The counter variable was not properly encapsulated in the voting machine software!
non-encapsulated data type encapsulated data type -------------------------------------- -------------------------------------- public class Counter { public class Counter { public final String name; private final String name; public int count; private int count; public Counter(String name) { public Counter(String name) { count = 0; count = 0; this.name = name; this.name = name; } } public void hit() { count++; } public void hit() { count++; } public int get() { return count; } public int get() { return count; } } } client client -------------------------------------- --------------------------------------- Counter c = new Counter("Volusia"); Counter c = new Counter("Volusia"); c.count = -16022; // legal c.count = -16022; // compile-time error
- Access control. Java provides a mechanism for access control to prevent the use of some variable or method in one part of a program from direct access in another. We have been careful to define all of our instance variables with the private access modifier. This means that they cannot be directly accessed from another class, thereby encapsulating the data type. For this reason, we always use private as the access modifier for our instance variables and recommend that you do the same. If you use public then you will greatly limit any opportunity to modify the class over time. Client programs may rely on your public variable in thousands of places, and you will not be able to remove it without breaking dependent code.
- Getters and setters.
A data type should not have public instance variables.
You should obey this rule not just in letter, but also in spirit.
Novice programmers are often tempted to include get() and
set() methods for each instance variable, to read and write
its value.
The purpose of encapsulation is not just to hide the data, but to hide design decisions which are subject to change. In other words, the client should tell an object what to do, rather than asking an object about its state (get()), making a decision, and then telling it how to do it (set()). Usually it's better design to not have the get() and set() methods. When a get() method is warranted, try to avoid including a set() method.Complex a = new Complex(1.0, 2.0); Complex b = new Complex(3.0, 4.0); // violates spirit of encapsulation Complex c = new Complex(0.0, 0.0); c.setRe(a.re() + b.re()); c.setIm(a.im() + b.im()); // better design Complex a = new Complex(1.0, 2.0); Complex b = new Complex(3.0, 4.0); Complex c = a.plus(b);
Data representation changes in scientific applications.
Simple example: represent a point using Cartesian or polar coordinates. Polynomials (coefficents vs. point-value), matrices (sparse vs. dense).Immutability.
An immutable data type is a data type such that the value of an object never changes once constructed. Examples: Complex and String. When you pass a String to a method, you don't have to worry about that method changing the sequence of characters in the String. On the other hand, when you pass an array to a method, the method is free to change the elements of the array.Immutable data types have numerous advantages. they are easier to use, harder to misuse, easier to debug code that uses immutable types, easier to guarantee that the class variables remain in a consistent state (since they never change after construction), no need for copy constructor, are thread-safe, work well as keys in symbol table, don't need to be defensively copied when used as an instance variable in another class. Disadvantage: separate object for each value.
Josh Block, a Java API architect, advises that "Classes should be immutable unless there's a very good reason to make them mutable....If a class cannot be made immutable, you should still limit its mutability as much as possible."
Give example where function changes value of some Complex object, which leaves the invoking function with a variable whose value it cannot rely upon.
mutable immutable ------------------------------------ Counter Complex MovingCharge Charge Draw String array Vector java.util.Date primitive types Picture wrapper types
- Final.
Java provides language support to enforce immutability.
When you declare a variable to be final, you are promising to
assign it a value only once, either in an initializer or in the constructor.
It is a compile-time error to modify the value of a final variable.
It is good style to use the modifier final with instance variables whose values never change.public class Complex { private final double re; private final double im; public Complex(double real, double imag) { re = real; im = imag; } // compile-time error public void plus(Complex b) { re = this.re + b.re; // oops, overwrites invoking object's value im = this.im + b.re; // compile-time error since re and im are final return new Complex(re, im); } }- Serves as documentation that the value does not change.
- Prevents accidental changes.
- Makes programs easier to debug, since it's easier to keep track of the state: initialized at construction time and never changes.
- Mutable instance variables.
If the value of a final instance variable is mutable,
the value of that instance variable (the reference to an object) will never change -
it will always refer to the same object. However, the value of the object
itself can change.
For example, in Java, arrays are mutable objects: if you have an
final instance variable that is an array, you can't change the
array (e.g., to change its length), but you can change the individual
array elements.
This creates a potential mutable hole in an otherwise immutable data type. For example, the following implementation of a Vector is mutable.
A client program can create a Vector by specifying the entries in an array, and then change the elements of the Vector from (3, 4) to (0, 4) after construction (thereby bypassing the public API).public final class Vector { private final int N; private final double[] coords; public Vector(double[] a) { N = a.length; coords = a; } ... }double[] a = { 3.0, 4.0 }; Vector vector = new Vector(a); StdOut.println(vector.magnitude()); // 5.0 a[0] = 0.0; // bypassing the public API StdOut.println(vector.magnitude()); // 4.0 - Defensive copy.
To guarantee immutability of a data type that includes an instance variable
of a mutable type, we perform a defensive copy.
By creating a local copy of the array, we ensure that any change the
client makes to the original array has no effect on the object.
Program Vector.java encapsulates an immutable array.public final class Vector { private final int N; private final double[] coords; public Vector(double[] a) { N = a.length; // defensive copy coords = new double[N]; for (int i = 0; i < N; i++) { coords[i] = a[i]; } } ... } - Global constants.
The final modifier is also widely used to specify local or global constants.
For example, the following appears in Java's Math library.
If the variables were declared public and not final, a client could wreak havoc by re-assigning Math.PI = 1.0; Since Math.PI is declared to be final, such an attempt would be flagged as a compile-time error.public static final double E = 2.7182818284590452354; public static final double PI = 3.14159265358979323846;
Design-by-contract.
Language mechanism that enables you to verify assumptions about your program as it is running. For example, if you have a data type that represents a particle, you might assert that its mass is positive and its spped is less than the speed of light. Or if you have a method to add two vectors of the same length, you might assert that the length of the resulting vector also has the same length. Design-by-contract model of programming.- Precondition. A condition that the client promises to satisfy when calling a method.
- Postcondition. A condition that the implementation promises to achieve when returning from a method.
- Invariant. A condition that should always be true when a certain statement is executing.
- Side effects. Always document the side effects of a method, including I/O and exceptions.
- Exceptions.
An exception is a disruptive event that occurs while a program is running, often
to signal an error.
Action is known as throwing an exception.
We have seen several exceptions: StackOverflowException, DivideByZeroException,
NullPointerException and ArrayOutOfBoundsException.
You can also create your own.
The simplest kind is a RuntimeException that terminates execution of the
program and print out an error message.
throw new RuntimeException("Error message here"); - Assertions.
An
assertion is a boolean expression that you are
affirming is true at that point in the
program. If it is not, the program will terminate and report and error message.
Assertions are widely used by programmers to detect bugs and gain confidence in the correctness
of programs. They also serve to document the programmer's intent.
// check that a given boolean condition is true assert distance >= 0.0; // with optional detail message assert distance >= 0.0 : "distance can't be negative";
By default, assertions are disabled. To enable, execute with the -enableassertions flag (-ea for short).
The result will be a message like% java -ea Date
Do not use assertions to check arguments in a public method: your program should not rely on assertions for normal operation since they may be disabled.Exception in thread "main" java.lang.AssertionError at Date.next(Date.java:42)
Date.
A culminating example that illustrates use of final, assert, exceptions, immutability, and private helper methods. Would make a nice anatomy diagram. Also could be a set-up for Comparable in the next section.Application: commercial transaction system. Want to know 3 (business days) from today.
Exercise: add a method daysUntil() and use this to implement compareTo() in next section. Add a method day() that returns a String representing the day of the week (e.g., Sunday, Monday).
Program Date.java is an immutable data type that represents a date. Internally, it uses three integers for the month, day, and year. It includes a constructor, toString(), next(), isAfter().
By accessing the data only through the interface, our data type can perform range checking (e.g., no such thing as January 0 or February 30). For simplicity, we ignore leap years. (See Exercise XYZ.) By declaring all instance variables as final, we ensure that they can only be initialized in the constructor. This makes it easy to check the invariant that the values are always in a consistent state. The time invested in performing routine error-checking more than pays off as we start writing more complex programs. Using encapsulation enables us to isolate to a few places where an object can change state.
Vectors.
As second example of a mathematical abstraction, we now consider the vector data type. Like Complex, the basic definition of the vector abstraction is familiar because it has played a central role in applied mathematics for over 100 years. The field of mathematics known as linear algebra is concerned with properties of vectors. Linear algebra is a rich and successful theory with numerous applications that plays an important role in all fields of social and natural science.
A spatial vector is an abstract entity that has a magnitude and a direction. Vectors provide a natural way to describe properties of the physical world, such as force, velocity, momentum, or acceleration. One standard way to specify a vector is as an arrow from the origin to a point in a Cartesian coordinate system: the direction is the ray from the origin to the point and the magnitude is the length of the arrow (distance from the origin to the point). To specify the vector it suffices to specify the point.
The concept extends to any number of dimensions: an ordered list of N real numbers (the coordinates of an N-dimensional point) suffices to specify a vector in N-dimensional space. By convention, we use a boldface letter to refer to a vector and numbers or indexed variable names (the same letter in italics) separated by commas within square brackets to denote its value. For example, we might use x to denote (x[0], x[1], ..., x[N-1]) and y to denote (y[0], y[1], ..., y[N-1]). In Java, this decision implies that we can represent each Vectors data type value as an array of double values.
The basic operations on vectors are to add two vectors, multiply a vector by a scalar, compute the dot product of two vectors, and to com- pute the magnitude and direction, as follows:
- addition: x + y = ( x[0] + y[0], x[1] + y[1], ..., x[N-1] + y[N-1])
- scalar multiplication: t x = (t x[0], t x[1], ..., t x[N-1])
- dot product: x dot y = x[0]y[0] + x[1]y[1] + ... + x[N-1]y[N-1]
- magnitude: |x| = (x[0]^2 + x[1]^2 + ... + x[N-1]^2)^1/2
- direction: x / |x| = (x[0] / |x|, x[1] / |x|, ..., x[N-1] / |x|)
Program Vector.java is an immutable class that implements this API. Note that the instance variable is not a primitive type, but rather an array.![]()
The this reference. The magnitude() and direction() methods in Vector make use of the name this. Java provides the this keyword to give us a way to refer to an instance method to the object whose name was used to invoke this method. You can use this in code in the same way you use any other name. For example, some Java programmers would prefer to code plus() in Complex as follows:
to emphasize that a and b play an equal role in the operation. Some Java programmers always use this to refer to instance variables. Their scope is so large (the whole class) that this policy is easy to defend. However, it does result in a surfeit of this keywords, so we take the opposite tack and use this sparingly in our code. If a method has no parameters, or parameters whose names do not conflict with instance variable names, we normally refer directly to instance variables instead of using this.
public Complex plus(Complex b) { Complex a = this; return new Complex(a.re + b.re, a.im + b.im); }
Similarity search.
Reference: Gauging Similarity via N-Grams. CompareAll.java performs similarity search. It relies on the data type Document.java that uses hashCode().Q + A
Q. What happens if I try to acccess a private instance variable or method from another file?
A. A compile-time error that says the given instance variable or method has private access in the given class.
Q. Does Java have other access modifiers besides public and private? Yes, although we restrict attention in this book to public and private. Here's a table.
Access modifier Class Package Subclass Any Class private Y N N N [no specifier] Y Y N N protected Y Y Y N public Y Y Y Y
Q. The instance variable x in type Complex is declared with access modifier private. How come in the plus method I can access both this.x and b.x?
A. The granularity of private access is at the class level, not the instance level. Declaring an instance variable as private means that it is not directly accessible from any other class. Methods within the Complex class can access (read or write) the instance variables of any instance in that class. It might be nice to have a more restrictive access modifier, say superprivate, that would make the granularity at the instance level so that only the invoking object can access its instance variables.
Q. How would I add a constructor to the Complex data type that takes r and θ as arguments?
A. It's a problem since you already have one constructor that takes two real arguments. Instead, you should probably create two methods createRect(x, y) and createPolar(r, theta) that create and return new objects. This style will also help remind the client to pass in arguments of the correct form.
Q. Is it possible to defeat Java's type safety features?
A. If there are no bugs in the Java implementation, an untrusted program should not have access to the private data of a trusted program. This guarantee assumes that "the computer faithfully executes its specified instruction set." If a cosmic ray strikes the computer's memory, it can flip a bit, thereby invalidating this axiom. Appel and Govindavajhala showed how to craft a program so that when executed, any single bit error would yield a 70% chance of the program gaining unfettered access to the JVM. They also demonstrated that such errors could be induced by shining a 50 watt spotlight bulb at the memory for one minute. Smartcards are particularly susceptible since the attacker has physical access to the computer.
Also reflection can defeat the type safety features, but this requires that the user permit such actions.
Exercises
- Modify Date to handle leap years. Hint: change 28 to 29, add a helper method isLeapYear(), and add one line to next().
- Add a method daysUntil() to Date that takes a Date b as an argument and returns the number of days between b and the invoking date.
- Create an implementation Date2.java that represents a date a single integer that counts the number of days since January 1, 1970. Compare to Date.java.
- Represent a point in time by using an int to store the number of seconds since January 1, 1970. When will programs that use this representation face a time bomb? How should you proceed when that happens?
- Create a data type GeographicCoordinate that represents a geographic coordinate either in (degrees, minutes, seconds, sign) or in floating point.
- Create a data type Location for dealing with locations on Earth using spherical coordinates (latitude/longitude). Include methods to generate a random location on the surface of the Earth, parse a location "25.344 N, 63.5532 W" and compute the great circle distance between two locations.
- Create a Rectangle ADT that represents a rectangle. Represent a rectangle by two points. Include a constructor, a toString method, a method for computing the area, and a method for drawing using our graphics library.
- Repeat the previous exercise, but this time represent a Rectangle as the lower left endpoint and the width and height.
- Repeat the previous exercise, but this time represent a Rectangle as the center and the width and height.
- Particle.
Create a data type for a 3D particle with position (rx, ry, rz),
mass (m), and velocity (vx, vy, vz). Include a method to return its kinetic
energy = 1/2 m (vx^2 + vy^2 + vz^2).
Alternate representionation: position (rx, ry, rz), mass (m), and momentum (px, py, pz). In this case vx = px / mass, vy = py / mass, vz = pz / mass. kinetic energy = 1/2 (px^2 + py^2 + pz^2) / mass.
- Generating random numbers. Different methods to generate a random number from the standard Gaussian distribution. Here, encapsulation enables us to replace one version with another that is more accurate or efficient. Trigonometric method is simple, but may be slow due to calling several transcendental functions. More importantly, it suffers from numerical stability problems when x1 is close to 0. Better method is alternate form of Box-Muller method. reference. Both methods require two values from a uniform distribution and produce two values from the Gaussian distribution with mean 0 and standard deviation 1. Can save work by remembering the second value for the next call. (This is how it is implemented in java.util.Random.) Their implementation is the polar method of Box-Muller, saving the second random number for a subsequent call. (See Knuth, ACP, Section 3.4.1 Algorithm C.)
- LAX aiport shutdown. On September 14, 2004 Los Angeles airport was shut down due to software breakdown of a radio system used by air traffic controllers to communicate with pilots. The program used a Windows API function call GetTickCount() which returns the number of milliseconds since the system was last rebooted. The value is returned as a 32 bit integer, so after approximately 49.7 days it "wraps around." The software developers were aware of the bug, and instituted a policy that a technician would reboot the machine every month so that it would never exceed 31 days of uptime. Oops. LA Times blamed the technician, but the developers are more to blame for shoddy design.
Creative Exercises
- International bank account (depositDollars, depositEuros, withdrawDollars, withdrawEuros). How to represent money (cents, BigDecimal, precision, etc.)
- Turtle graphics. Turtle location starting at origin (forward, rotate, draw) Use polar and Cartesian coordinates.
- Polar representation of points.
Point.java and
PointPolar.java implement
the following point interface using rectangular and polar coordinates,
respectively.
Point() Point(double, double) double x() double y() double r() double theta() double distance(Point) public String toString()
- Spherical coordinates.
Represent a point in 3D space using Cartesian coordinates (x, y, z)
or spherical coordinates (r, θ φ). To convert from one to
the other, use
- Colors. Can represent in RGB, CMYK, or HSV formats. Natural to have different implementations of same interface.
- Time. Implement an ADT that represents the time of day, say with seconds, minutes, and hours. Representation 1: three fields for hours, minutes, and seconds. Representation 2: number of seconds since 12 midnight. Include compareTo, plus.
- ZIP codes. Implement an ADT that represents a USPS ZIP code. Support both the original 5 digit format and the newer (but optional) ZIP+4 format.
- VIN number. Read in a VIN number and report back all relevant information.
- Vector fields. Write a data type for force vectors in two dimensions. You should be able to add vectors using the usual rules from physics using x sin(theta) and x cos (theta)....
- Complex numbers. In Section 3.2, we considered program Complex.java, which is an ADT for complex numbers. Internally, it stores a complex number c = x + iy as its real and imaginary parts (x, y). This is called the rectangular representation. Write a new implementation that represents a complex number as c = r (cos θ + i sin θ). This is known as the polar representation, and we refer to r as the modulus and θ as the angle. This representation is especially useful if we are multiplying or dividing complex numbers. Program Complex.java implements the same complex number data type as Complex.java, but stores the polar representation (r, θ) instead of the the rectangular representation (x, y). Since it has exactly the same public methods and public constructor as the version with rectangular coordinates, we can use it interchangeably with the other version of Complex.java. The main difference to the client is in different performance properties. For example, implementing an exponentiation method becomes much easier with the polar representation, especially if the power is a negative integer or if the power itself is complex! Other operations become more cumbersome to implement and expensive to compute, e.g., addition and subtraction. There is one minor caveat that makes these two implementations not completely interchangeable: floating point precision may influence the two implementations in different ways. For example, the rectangular implementation represents 3 + 4i exactly, whereas the polar implementation must store an approximation to the angle θ since it is an irrational number. Such difference should not effect a well-behaved client, but we should be aware of any places where an encapsulated data type leaks information.
- Statistics.
Encapsulation enables us to easily replace one implementation with
another. This is desirable
when we anticipate creating an improved implementation with better
performance or accuracy properties.
We consider a simple, but representative example below.
Our goal is to store statistics for a set of real values, say the sample
mean x bar and the sample variance s2.
We want our data type to support the following methods: add, size, mean, stddev, variance. OnePass.java stores n, the sum, and the sum of squares. TwoPass.java stores n, the sum, and all of the data values in an array. The one pass algorithm is faster and uses substantially less space, but is susceptible to roundoff error. StableOnePass.java is a well-engineered alternative.
It is numerically stable and does not require an array to store the elements.
public void add(double x) { n++; s = s + 1.0 * (n-1) / n * (x - m) * (x - m); m = m + (x - m) / n; } public double mean() { return m; } public double variance() { return s / (n - 1); } - Encapsulation.
Why does the following break encapsulation, even though
all instance variables are declared private.
Answer: The reason is that the class Date is mutable. The method setDate(seconds) changes the value of the invoking date to the number of milliseconds since January 1, 1970, 00:00:00 GMT. This has the unfortunate consequence that when the function d = getDate() returns the date, the client program can invoke d.setDate() and change the date in an Appointment object type, perhaps setting it to an illegal value for a member of Appointment. Must not let references to mutable objects escape since caller can then modify its state. One solution is to create a defensive copy of the Date before returning it using new Date(date.getTime()); also need to do a defensive copy when storing it via this.date = new Date(date.getTime()). Many programmers regard the mutability of Date as a design flaw. (GregorianCalendar is a more modern Java library for storing dates; but it is mutable too.)public class Appointment { private Date date; private String customer; public Appointment(Date date) { // check that date is in some legal range this.date = date; } public Date getDate() { return date; } - Sparse vector. Create a data type for sparse vectors. Represent a sparse vector by an array of indices (of nonzeros) and a parallel array of the corresponding nonzero values. Assume the indices are in ascending order. Implement the dot product operation.
- Genome.
Implement a data type to store the genome of an organism. Biologists
often abstract away the genome to a sequence of nucleotides (A, C, G, or T).
The data type should support the method addNucleotide,
nucleotideAt(int i), and doSomeComputation.
Perhaps change to addCodon.
Advantages of encapsulation: can check that only legal
nucleotides are added, can change to more time or memory efficient
implementation without affecting client.
- StringGenome.java has one instance variable of type String. It implements addNucleotide with string concatenation. Each method call takes time proportional to the size of the current genome. Not practical spacewise either for large genomes since nucleotide is stored as a 16-bit char.
- Genome.java implements a genome as an array of characters. The size of the array is doubled when the array fills up. The method addNucleotide is now constant time. Space consumption is still 16 bits per nucleotide.
- CompactGenome.java implements a genome as boolean array. We need to use two bits per nucleotide since there are 4 different nucleotides. As in the previous implementation, we use a dynamic array with repeated doubling. Now, each nucleotide consumes 2 bits of storage (instead of 16).
- Given a Genome Include a method to return the reverse-complemented genome. Include a method for testing for equality.
- Add methods to Date to check which season (Spring, Summer, Fall, Winter) or astrological sign (Pisces, Libra, ...) a given date lies. Be careful about events that span December to January.
- Copy constructor.
Only needed if data type is mutable. Otherwise, assignment statement
works as desired.
public Counter(Counter x) { count = x.count; } Counter counter1 = new Counter(); counter1.hit(); counter1.hit(); counter1.hit(); Counter counter2 = new Counter(counter1); counter2.hit(); counter2.hit(); StdOut.println(counter1.get() + " " + counter2.get()); // 3 5
Engineering.
Engineering involves the design, implementation, and analysis of a useful solution to a practical problem. The design is limited by resource constraints including time, money, usability, flexibility for future modification, maintainability, safety, esthetics, marketing, and ethics. The analysis models how well the design will perform and whether it will be the project constraints. Prototyping, testing, simulation are all components of a proper analysis. Designing quality software is similar in many ways to designing bridges. "A good scientist is a person with original ideas. A good engineer is a person who makes a design that works with as few original ideas as possible." -- Freeman Dyson. When programming, use the principle of least surprise.Software engineering.
Programming is a form of engineering: apply computer science to the benefit of humankind. Unlike other forms of engineering (which have been around for hundreds of years), software engineering is very young (~30 years) and no widespread consensus on best practice and engineering standards. Nevertheless, some key ideas have emerged (modular programming, data abstraction, encapsulation). OOP is one of the main tools for building complex programs. We will also consider other software engineering aspects that complement OOP including specification, design, coding, and testing. a computer program to solve a problem. Testing, unit tests (testing smallest compilable element in isolation), integration tests (testing several interacting modules), functional tests tests, regression testing... Test characteristics: functional (does it do what it should do, and not do what it should not), performance, usability, security, etc.) Test detail: unit test (one xxx), integration (several interacting modules), system (the whole thing). Test access: black box (treat module as a black box), white box (use internal structure to determine test cases). Each test should return true or false. Specification is often the biggest source of errors. Compile-time, run-time, and logic errors. Compiler error messages can be cryptic, but usually can be resolved after a few minutes. On the other hand logic errors are the bane of programming. Can take hours to days to discover the error, even if it was just a simple typo. We should be thankful of all the errors that the compiler discovers, and it is a blessing to use a language and compiler that facilitate discovering errors at compile time.In practice, companies hire employees to solely perform testing. Ratio of testers to programmers can range from 10:1 to 1:10.
Fred Brooks hypothesizes that the number of lines of code a programmer writes per day is about the same, regardless of which language they use (Java, C, machine language). One of main goals of using a higher level programming language (e.g., Java) and OOP is so that we can write 10 lines of Java code that replaces 1,000 lines of machine language code. Another goal is to write programs that are easier to debug than machine language programs. OOP helps address both goals.
Software life-cycle. Analysis and specification, design, construction, testing, operation, maintenance. Waterfall model: do all of the tasks in sequence, reaching decisive milestones along the way. (Inflexible, especially if design is wrong.) Prototype model: build quick prototype, evaluate, and redesign if needed. (harder to manage and throw away previous designs)
Testing. Design good test cases (especially at boundaries), test coverage (use computer to generate test inputs), unit tests, regression testing. Program should always either complete successfully or output a useful error message. Bugs in computer-aided design (CAD) program nuclear reactor control system, guided missile, spacecraft environmental control can be disastrous. Validation = checking that the software does what the customer specified (producing the right product). Verification = checking that the software has been written according to design spec (producing the product right).
Often, it's convenient to write the tests before the code. If you can't write the tests, you might need to change the interface or specs. If you still can't, you need to spend more time thinking before you begin to write code.
Bugs.
CMU study of 30,000 programs: 5-6 defects per 1,000 lines of code in
production software.
New programs: 100 defects per 1,000 lines of code.
$60 million bug (misused break statement in C program) in
ATT long
distance system - 114 switching nodes crashed on January 15, 1990
collection
of software bugs.
