3.1   Using Data Types

This section is under construction.

Programming in Java is largely based on doing so with data types known as reference types that are designed to support object-oriented programming. The primitive data types (such as boolean, int, and double) that you have been using are supplemented in Java by extensive libraries of reference types that are tailored for a large variety of applications. As examples, you will learn in this section how to use several reference types, for string processing and image processing.

Basic definitions.

A data type is a set of values and a set of operations defined on those values. For example, the values of the primitive data type int are integers between −231 and 231 − 1; the operations of int are the basic arithmetic and logical operations that we have been using since Chapter 1.


You have already been using a data type that is not primitive—the String data type. The values of String are sequences of characters. We specify the behavior of a data type in application programming interface (API). The following table summarizes the instance methods from Java’s String data type that we use most often. Several of the methods use integers to refer to a character’s index within a string; as with arrays, these indices start at 0.

String API
The first entry, with the same name as the class and no return type, defines a special method known as a constructor. The other entries define instance methods that can take arguments and return values.
using a reference data type
Now, we consider various string-processing examples.


Color is a sensation in the eye from electromagnetic radiation. Since people want to view and manipulate color images on their computers, color is a widely used abstraction in computer graphics and Java provides a Color data type. Java has thousands of data types in its libraries, so we need to explicitly list which Java libraries we are using in our program to avoid naming conflicts. Specifically, we include the statement
import java.awt.Color;
at the beginning of any program that uses Color. Color values

To represent color values, Color uses the RGB system where a color is defined by three integers (each between 0 and 255) that represent the intensity of the red, green, and blue (respectively) components of the color. Other color values are obtained by mixing the red, blue and green components. That is, the data type values of Color are three 8-bit integers. With this convention, Java is using 24 bits to represent each color and can represent 256^3 = 2^24 ~ 16.7 million possible colors. Scientists estimate that the human eye can distinguish only about 10 million distinct colors.

Color has a constructor that takes three integer arguments, so that you can write, for example

Color red    = new Color(255,   0,   0);  
Color white  = new Color(255, 255, 255);  
Color sienna = new Color(160,  82,  45); 
to create objects whose values represent the colors red, white, and sienna, respectively. We have been using colors in StdDraw since Section 1.5, but have been limited to a set of predefined colors such as StdDraw.BLACK, StdDraw.RED, StdDraw.BLUE, and StdDraw.GRAY. Now we have millions of colors from which to choose. Spectrum.java is a demonstration that shows you all of the colors, using StdDraw.

The java.awt.Color API contains several constructors and over twenty methods; we briefly summarize the ones that we will use:

Color API
Luminance.java is a static method library that we can use to convert to grayscale and test whether two colors are compatible.

Image processing.

You are familiar with the concept of a photograph. The history of photography is a history of technological development. During the last century, photography was based on chemical processes, but its future is now based in computation.

Digital images. We have been using standard draw to plot geometric objects (points, lines, circles, squares) in a window on the computer screen. The basic abstraction for computer displays is the same one that is used for digital photographs and is very simple: A digital image is a rectangular grid of pixels (picture elements), where the color of each pixel is individually defined. Digital images are sometimes referred to as raster or bitmapped images.

Our class Picture.java is a data type for digital images. The set of values is a two-dimensional array of Color values, and the operations are what you might expect: create an image (either a blank one with a given width and height or one initialized from a picture file), set the value of a pixel to a given color, return the color of a given pixel, return the width or the height, show the image in a window on your computer screen and save the image to a file, as specified in the following API summary:

Picture API
These methods, coupled with Java's Color data type, open the door to image processing for you. By convention, (0, 0) is the upper leftmost pixel, so the image is laid out in the customary manner for arrays (by contrast, the convention for StdDraw is to have the point (0,0) at the lower left corner, so that drawings are oriented in customary manner for Cartesian coordinates).


In Section XYZ we learned how to read numerical and text input from the terminal using StdIn.java. However, this supported only one input stream (standard input). Sometimes our programs need to read data from several input sources (standard input, files, web sites). Program In.java is a convenient class to do exactly this.


Screen scraping.

Now we illustrate a nice combination of using the built-in String library with our In library. The goal is to query a web page, extract some information, and report back the results. This process in known as screen scraping. Program StockQuote.java the symbol of New York Stock Exchange stock and prints out its current trading price. To report the stock price of Google (NYSE symbol = goog), it reads the Web page http://finance.yahoo.com/q?s=goog". Then, it identifies the relevant information using indexOf() and substring().

The program depends on the web page format of Yahoo. If Yahoo changes their web page format, we would need to change our program. Nevertheless, this is likely more convenient than maintaining the data ourselves.


We are also interested in writing output to files or the network instead of just standard output. Program Out.java provides a mechanism for writing data to various output streams. Using Out, writing to a file is almost as easy as writing to standard output.


Program Cat.java takes a sequence of strings as command line inputs (names of text files), and concatenates them, printing the results to standard output.

Program Split.java This program uses multiple output streams to split a CSV file into separate files, one for each comma-delimited field.

Primitive types vs. reference types.

Java has two different categories of data types: primitive types (also known as value types) and reference types. We're already familiar with many of the eight primitive types in Java: boolean, char, byte, short, int, long, float, and double. Value types store the integral, floating pointing or boolean value, e.g., 17, 3.14, or true. The primitive types and the associated operations are typically implemented directly in hardware, so they are especially efficient. When we declare a variable of a primitive type, the system allocates enough memory to store a value of that type (e.g., 4 bytes for an int and 8 bytes for a double). We can initialize or modify the value using the assignment operator. However, when we pass a variable to a function, the system passes a copy of the integral, floating point or boolean value itself. This is called pass-by-value. One consequence is that if we pass an integer variable a to a function, the function cannot change the value that is stored in a since we have only passed a copy of the value stored in a.

Reference types have very different characteristics. A reference stores the memory address of an object. It captures the difference between a thing and its name.

Thing Name
Web page www.princeton.edu
Email inbox nobody@princeton.edu
Bank account 45-234-23310076
US citizen 166-34-9114
Word of TOY memory 1C
Byte of computer memory FFBEFB24
Cell phone (609) 876-5309
House 35 Olden Street

We say that the reference points to the object and often draw an arrow from the reference to what it points to. A reference of a given type always points to an object of the correct type or to the special value null which indicates that the reference points to nothing. Each reference points to one object, but two or more references can point to the same object. We initialize a reference by using an assignment statement. We can either set it equal to another reference (of the appropriate type) or we can also use the keyword new to make it point to a newly created object. Java manipulates objects by reference, but passes them to methods and functions by value. This means that if we pass a variable p of type Picture to a function, the function can change the state of the object referenced by p, e.g., change the colors of some of the pixels. It cannot, however, change what object p references.

Analogy. Object = house. Reference = paper with street address of house written in pencil. Each piece of paper can have at most one address. We can give piece of paper to a house painter and tell them to paint the house red. Can have multiple pieces of paper with the same address. When the house is painted, both pieces of paper have the street address of the same red house. It's possible to erase what's on the paper, and write down a new street address. But if you change what's written on your piece paper, it doesn't change what's written on my piece of paper.

Cash (value) vs. credit card (pointer to value).

In Program xyz, it is important to compare the strings s1 s2 with s1.equals(s2) instead of (s1 == s2). The former is a built-in method that tests whether the two strings are composed of exactly the same sequence of characters. The latter checks whether the two objects reference the same location in memory.

The distinction between primitive and reference types is a tradeoff between efficiency and elegance. There is substantial memory overhead (around 16 bytes) associated with creating each new object. OOP purists would argue that a language should not have any primitive types, only objects and reference types.

Arrays are objects.

Arrays are treated as objects in Java, except that there is special syntax for indexing into an array using square brackets. Otherwise, an array is just another example of a reference types. When we pass an array to a function, the system passes a copy of the reference, not a copy of the array. This means that the function is free to modify the contents of the array. If the array is huge, there is substantial savings in passing a reference to the array. Give example of pass-by-value....

Automatic memory management.

One of the most significant features of the Java programming language is its ability to automatically manage memory. Memory management is straightforward with primitive types: allocate a fixed chunk of memory (e.g., 4 bytes for an int) when we declare a variable, and release it when the variable goes out of scope. It is important to free the memory whenever possible since your computer only has a finite amount of memory, and it may eventually consume it all. Managing memory for reference types is substantially more challenging. Each time we create an object with the keyword new, the system finds a free chunk of memory of the right size, and reserves it for the object. When the object is no longer accessible (e.g., the last reference to it goes out of scope), we want the system to free up the memory and recycle it for it for use the next time you create an object with new. In many languages (including C and C++) the programmer is responsible for marking those objects that it no longer needs. This process is tedious and notoriously error-prone. If the programmer is not diligent, the system may slowly leak memory and eventually run out. Many modern languages (including Java) include a garbage collector to transfer the burden of memory management from the programmer to the system. The garbage collector periodically identifies chunks of memory that are not in use, and notifies the system to reclaim them. If a chunk of memory has no references to it, then it can be safely garbage-collected since the programmer would have no way of accessing it.

Particle system physics engine.

Jeff Traer-Bernstein has a particle system physics engine.

Q + A

Q. What's the difference between =, ==, and equals?

A. Variant of same question of 2.2 and 2.3.

Q. Why doesn't (s == "") work reliably for checking whether s is the empty string?

A. It compares whether the two strings refer to the same memory location. The two strings can refer to different memory locations which each represent the string "".

Q. How can I check whether a string s is the empty string?

A. Use "".equals(s) or (s != null && s.length() == 0).

Q. Is there a difference between the empty string and null?

A. Yes. The empty string is a string consisting of 0 characters. You can invoke all of the usual string methods, e.g., length. On the other hand, null is not a string object. You will get a NullPointerExceptionError if you try to invoke any method on a variable storing null.

Q. What's the substring trap?

A. The String method call s.substring(i, j) returns the substring of s starting at index i and ending at j-1 (not at j as you might suspect).

Q. Can I apply several string operations at once?

A. Yes, the statement s.trim().toLowerCase().equals("saturday") works as expected (e.g., returns true if s represents the string " Saturday "). The reason it works is because (i) each of the methods returns its result as a string and (ii) methods in Java are invoked from left to right.

Q. What special capabilities do strings have over other objects?

A. String concatenation with + and assignment using quoted sequences of characters.

Q. How can I change the value of a string object?

A. You can't since strings are immutable in Java. However, you can make a string reference point to a different string by reassigning it.

String s = "aaaa";            // s refers to the value "aaaa"
s.replaceAll("a", "b");       // s still refers to the value "aaaa"
String t = s;                 // t refers to the value "aaaa"
s = s.replaceAll("a", "b");   // s now refers to the value "bbbb"
                              // t still refers to the value "aaaa"

Q. Why use the RGB format for representing colors?

A. RGB is commonly used in television screens, computer monitors and digital cameras. The screens (CRT or LCD) are comprised of thousands or millions of tiny red, green, and blue dots. The device can light each pixel up in various degrees of brightness. It is an additive model (the device emits a combination of the colors) as opposed to a subtractive model (where the medium reflects the colors, e.g., with dyes or pigments).

Q. When using a linear filter, each pixel becomes a weighted average of its 8 neighbors. What do I do when the pixel has less than 8 neighbors because it is near the border?

A. You could assume the image is toroidal (periodic boundary conditions) and make the left boundary wrap around to the right boundary.

Q. How can I convert from an uppercase letter to an integer between 0 and 25?

A. Recall that characters are 16-bit integers and that the uppercase letters are store consecutively in ascending order. The expression c - 'A' does the job. You can also use c - '0' to convert from one of the characters '0' through '9' to the corresponding integer.

Q. Where can I download some test files for image processing?

A. USC SIPI contains standard test images (including Peppers and Baboon).

Q. What special capabilities do arrays have over other objects?

A. Indexing into the array using square brackets. Declaring an array involves specifying its type with square brackets. Initializing an array involves using either new with square braces or list its constituent values within curly braces.

Q. How can I pass an array to a function in such a way that the function cannot change the array?

A. You can't since arrays are mutable. However, you can achieve the same effect by building a wrapper data type and passing that instead. Stay tuned.

Q. I've heard that Java has no "pointers." Is this true?

A. It's true that Java doesn't have an explicit pointer type, but you should view Java references as "safe pointers." Java's implementation of references is opaque so you cannot do pointer arithmetic on them or cast them to numeric types. Java also automatically dereferences pointers as needed.

Q. Is it correct to say that Java passes primitive types by value and objects by reference?

A. Not quite. See this explanation. For C++ programmers, it's better to think of a Java reference as a "safe pointer." Java references behave like C++ references, except that assignment and == work like pointers.

Q. When I declare an object, when do I need to use new?

A. Every time you call new, you get a new instance of the data type, with its own instance variables. If you don't need a new instance, but rather just want a new variable of the given type to reference an existing instance, don't use new. Methods can also return new instances by invoking new, e.g., the plus method with the Complex data type.


  1. Write a function that takes as input a string and returns the number of occurrences of the letter e.
  2. Give a one line Java code fragment to replace all periods in a string with commas. Answer: s = s.replace(".", ",").

    Don't use s = s.replaceAll(".", ","). The replaceAll() method uses regular expressions (See Section 7.2) and "." string has a special meaning.

  3. Replace all tabs with four spaces. Answer: s = s.replace("\t", "    ").
  4. Write a program that takes a command line input string s, reads strings from standard input, and prints out the number of times s appears. Hint: use don't forget to use equals instead of == with references.
  5. Write a program that reads in the name of a month (3 letter abbreviation) as a command line parameter and prints the number of days in that month in a non leap year.
    public static void main(String[] args) {
        String[] months = {
                            "Jan", "Feb", "Mar", "Apr", "May", "Jun", 
                            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
        int[] days = {
                        31, 28, 31, 30, 31, 30,
                        31, 31, 30, 31, 30, 31
        String name = args[0];
        for (int i = 0; i < 12; i++)
           if (name.equalsIgnoreCase(months[i]))
              System.out.println(name + " has " + days[i] + " days");

  6. Write a program Squeeze.java that takes as input a string and removes adjacent spaces, leaving at most one space in-a-row.
  7. What does the following code fragment do?
    public static void main(String[] args) {
       String s1 = args[0];
       String s2 = args[1];
       int length1 = s1.length();
       int length2 = s2.length();
       if (length1 > length2) System.out.println(length1);
       else                   System.out.println(length2);

    Which one or more of the following converts all of the strings in the array a to upper case?

    for (int i = 0; i < a.length; i++) {
       String s = a[i];
       s = s.toUpperCase();
    for (int i = 0; i < a.length; i++) {
    for (int i = 0; i < a.length; i++) {
       a[i] = a[i].toUpperCase();
    Answer: only the last one.
  8. Write a function that takes as input a string and returns the string in reverse order.
  9. What does the following recursive function return, given an input string s?
    public static String mystery(String s) {
       int N = s.length();
       if (N <= 1) return s;
       String a = s.substring(0, N/2);
       String b = s.substring(N/2, N);
       return mystery(b) + mystery(a);
  10. Describe the string that the following function returns, given a positive integer N?
    public static String mystery(int N) {
       String s = "";
       while(N > 0) {
           if (N % 2 == 1) s = s + s + "x";
           else            s = s + s;
           N = N / 2;
       return s;
  11. Write a function that takes a string s and an integer n and returns a new string t of length exactly n that consists of s (truncated if its length is greater than n) followed by a sequence of '-' characters (if the length of s is less than n).
  12. Write a function that takes as input a string and returns true if the string is a palindrome, and false otherwise. A palindrome is a string that reads the same forwards or backwards.
  13. Write a function that takes as input a string and returns true if the string is a Watson-Crick complemented palindrome, and false otherwise. A Watson-Crick complemented palindrome is a DNA string that is equal to the complement (A-T, C-G) of its reverse.
  14. Write a function that takes as input a DNA string of A, C, G, and T characters and returns the string in reverse order with all of characters replaced by their complements. For example, if the input is ACGGAT, then return ATCCGT.
  15. What does the following recursive function return, given two strings s and t of the same length?
    public static String mystery(String s, String t) {
       int N = s.length();
       if (N <= 1) return s + t;
       String a = mystery(s.substring(0, N/2), t.substring(0, N/2));
       String b = mystery(s.substring(N/2, N), t.substring(N/2, N));
       return a + b;
  16. Write a program that reads in a string and prints out the first character that appears exactly once in the string. Ex: ABCDBADDAB -> C.
  17. Given a string, create a new string with all the consecutive duplicates removed. Ex: ABBCCCCCBBAB -> ABCBAB.
  18. Write a function that takes two string arguments s and t, and returns the index of the first character in s that appears in ts (or -1 if no character in s appears in t).
  19. Given a string s, determine whether it represents the name of a web page. Assume that all web page names start with http:

    Solution: The easiest way is using the startsWith method in Java's string library, e.g., if (s.startsWith("http:")).

  20. Given a string s that represents the name of a web page, break it up into pieces, where each piece is separated by a period, e.g., http://www.cs.princeton.edu should be broken up into www, cs, princeton, and edu, with the http:// part removed. Use either the split or indexOf methods.
  21. Given a string s that represents the name of a file, write a code fragment to determine its file extension. The file extension is the substring following the last period. For example, the file type of monalisa.jpg is jpg, and the file type of mona.lisa.png is png.

    Library solution: this solution is used in Picture.java to save an image to the file of the appropriate type.

    String extension = s.substring(s.lastIndexOf('.') + 1);

  22. Given a string s that represents the name of a file, write a code fragment to determine its directory portion. This is the prefix that ends with the last / character (the directory delimiter); if there is no such /, then it is the empty string. For example, the directory portion of /Users/wayne/monalisa.jpg is /Users/wayne/.
  23. Given a string s that represents the name of a file, write a code fragment to determine its base name (filename minus any directories). For /Users/wayne/monalisa.jpg, it is monalisa.jpg.
  24. What does the following code fragment print out?
    String string1 = "hello";
    String string2 = string1;
    string1 = "world";
  25. Write a program that reads in text from standard input and prints it back out, removing any lines that consisting of only whitespace.
  26. Write a program that reads in text from standard input and prints it back out, replacing all single quotation marks with double quotation marks.
  27. Write a program Paste.java that takes an arbitrary number of command line inputs and concatenates the corresponding lines of each file, and writes the results to standard output. (Typically each line in given file has the same length.) Counterpart of the program Cat.java.
  28. What does the program LatinSquare.java print when N = 5?
    for (int i = 0; i < N; i++) {
       for (int j = 0; j < N; j++) {
          char c = alphabet.charAt((i + j) % N);
          System.out.print(c + " ");

    A Latin square of order N is an N-by-N array consisting of N different symbols, such that each symbol appears exactly once in each row and column. Latin squares are useful in statistical design and cryptography.

  29. What does the following code fragment print?
    String s = "Hello World";
    s.substring(6, 11);

    Answer: Hello World. The methods toUpperCase and substring return the resulting strings, but the program ignores these so s is never changed. To get it to print WORLD, use s = s.toUpperCase() and s = s.substring(6, 11).

  30. What happens when you execute the following code fragment?
    String s = null;
    int length = s.length();

    Answer: you get a NullPointerException since s is null and you are attempting to dereference it.

  31. What are the values of x and y after the two assignment statements below?
    int x = '-'-'-';
    int y = '/'/'/';
  32. Suppose that a and b are each integer arrays consisting of 100 million integers. What does the follow code do. How long does it take?
    int[] t = a;
    a = b;
    b = t;
    Answer It swaps them, but it does so without copying millions of elements.
  33. What does the following statement do where c if of type char?
    System.out.println((c >= 'a' && c <= 'z') ||
                       (c >= 'A' && c <= 'Z'));
    Answer: prints true if c is an uppercase or lowercase letter, and false otherwise.
  34. Write an expression that tests whether or not a character represents one of the digits '0' through '9' without using any library functions.
    boolean isDigit = ('0' <= c && c <= '9');

  35. Write a program FlipY.java that reads in an image and flips it vertically.
  36. Write a program WidthChecker.java that takes a command line parameter N, reads text from standard input, and prints to standard output all lines that are longer than N characters (including spaces).
  37. Write a program Hex2Decimal.java that converts from a hexadecimal string (using A-F for the digits 11-15) to decimal.

    Answer: the following solution uses several string library methods and Horner's method.

    public static int hex2decimal(String s) {
       String digits = "0123456789ABCDEF";  
       s = s.toUpperCase();
       int val = 0;
       for (int i = 0; i < s.length(); i++) {
          char c = s.charAt(i);
          int d = digits.indexOf(c);
          val = 16*val + d;
       return val;

    Alternate solution: Integer.parseInt(String s, int radix). More robust, and works with negative integers.

Creative Exercises

  1. Picture dimensions. Write a program Dimension.java that take the name of an image file as a command line input and prints out its dimension (width-by-height).
  2. Bounding box. Write a program BoundingBox.java that reads in an image file and output the smallest bounding box (rectangle parallel to the x and y axes) that contains all of the non-white pixels. Useful for automatic cropping.
  3. Anti-aliasing. Anti-aliasing is a method of removing artifacts from representing a smooth curve with a discrete number of pixels. A very crude way of doing this (which also blurs the image) is to convert an N-by-N grid of pixels into an (N-1)-by-(N-1) by making each pixel be the average of four cells in the original image as below. Write a program AntiAlias that reads in an integer N, then an N-by-N array of integers, and prints out the antialiased version. Reference.

  4. Thresholding. Write a program Threshold.java that reads in a grayscale version of a black-and-white picture, creates and plots a histogram of 256 grayscale intensities, and determines the threshold value for which pixels are black, and which are white.
  5. Mirror image. Read in a W-by-H picture and produce a 2W-by-H picture which concatenates the original W-by-H picture with the mirror image of the W-by-H picture. Repeat by mirror around the y-axis. Or create a W-by-H picture, but mirror around the center, deleting half the picture.
  6. Linear filters. A box filter or mean filter replaces the color of pixel (x, y) by the average of its 9 neighboring pixels (including itself). The matrix [1 1 1; 1 1 1; 1 1 1] / 9 is called the convolution kernel. The kernel is the set of pixels to be averaged together. Program MeanFilter.java implements a mean filter using the Picture data type.
  7. Blur filter. Use low-pass 3-by-3 uniform filter [1/13 1/13 1/13; 1/13 5/13 1/13; 1/13, 1/13, 1/13].
  8. Emboss filter. Use prewitt masks [-1 0 1; -1 1 1; -1 0 1] (east) or [1 0 -1; 2 0 -2; 1 0 -1], [-1 -1 0; -1 1 1; 0 1 1] (south-east),
  9. Sharpen filter. Psychophysical experiments suggest that a photograph with crisper edges is more aesthetically pleasing than exact photographic reproduction. Use a high-pass 3-by-3 filter. Light pixels near dark pixels are made lighter; dark pixels near light pixels are made darker. Laplace kernel. Attempts to capture region where second derivative is zero. [-1 -1 -1; -1 8 -1; -1 -1 -1]
  10. Oil painting filter. Set pixel (i, j) to the color of the most frequent value among pixels with Manhattan distance W of (i, j) in the original image.
  11. Reverse string. Write a recursive function to reverse a string. Do not use any loops. Hint: use the String method substring.
    static String reverse(String s) {
        int N = s.length();
        if (N == 0) return "";
        else return reverse(s.substring(1, N)) + s.charAt(0);
  12. Frequency analysis of English text. Write a program LetterFrequency.java that reads in text from standard input (e.g., Moby Dick) and calculate the fraction of times each of the 26 lowercase letters appears. Ignore uppercase letters, punctuation, whitespace, etc. in your analysis. Use CharStdIn.java from Section 2.4 to read process the text file.
  13. Complemented DNA string. Write a program to read in a DNA string (A, C, T, G) and print out its complement (substitute A for T, T for A, C for G, and G for C). Hint: use replace several times, but be careful.
  14. Print longest word. Read a list of words from standard input, and print out the longest word. Use the length method.
  15. Print longest word(s). Repeat the previous exercise, but print out all of the longest words if there is a tie, say up to a maximum of 10 words. Use an array of strings to store the current longest words.
  16. Test if two files are equal. Write a program that takes the name of two text files as command line inputs and checks if their contents are identical.
  17. Parsing command line options. Unix command line programs typically support flags which configure the behavior of a program to produce different output, e.g., "wc -c". Write a program that takes any number of flags from the command line and runs whichever options the user specifies. To check options, use something like if (s.equals("-v")).
  18. Capitalization. Write a program Capitalizer.java that reads in text strings from standard input and modifies each one so that the first letter in each word is uppercase and all other letters are lowercase.
  19. Reverse domain. Write a program to read in a domain name as a command line input and print the reverse domain. For example, the reverse domain of cs.princeton.edu is edu.princeton.cs. This is useful for web log analysis.
  20. Railfence transposition cipher. Write a program RailFenceEncoder.java that reads in text from standard input and prints out the characters in the odd positions, followed by the even positions. For example, if the original message is "Attack at Dawn", then you should print out "Atc tDwtaka an". This is a crude form of cryptography.
  21. Railfence transposition cipher. Write a program RailFenceDecoder.java that reads in a message encoded using the railfence transposition cipher and prints out the original message by reversing the encryption process.
  22. Scytale cipher. The scytale cipher is one of the first cryptographic devices used for military purposes. (See The Code Book, p. 8 for a nice picture.) It was used by the Spartans in the fifth century BCE. To scramble the text, you print out every kth character starting at the beginning, then every kth character starting at the second character, and so forth. Write a pair of programs ScytaleEncoder.java and ScytaleDecoder.java that implement this encryption scheme.
  23. Kama-sutra cipher. The Kama-sutra, written in the fourth century BCE by Vatsyayana, outlines 64 arts that women should study. Number 45 outlines a method to help women conceal their secret affairs. Each letter in the alphabet is paired up with another one, as in the table below:
    A B C E F G H K L M N P R
    Q D Z U J I X Y W S O V T

    Then a message is encoded by replacing each letter with its pair. For example, the message "MEET AT ELEVEN" is encoded as "SUUR QR UWUPUO". This is one of the earliest known substitution ciphers. Write a program KamaSutra.java that scrambles a message using this scheme. Observe that you can unscramble a message by applying the same scheme again.

  24. Password checker. Write a program that reads in a string from the command line and checks whether it is a "good" password. Here, assume "good" means that it (i) is at least 8 characters long, (ii) contains at least one digit 0-9, (iii) contains at least one upper case letter, (iv) contains at least one lower case letter, and (v) contains at least one non-alphanumeric character.
  25. Subsequence. Given two strings s and t, write a program Subsequence.java that determines whether s is a subsequence of t. That is, the letters of s should appear in the same order in t, but not necessarily contiguously. For example accag is a subsequence of taagcccaaccgg.
  26. Bible codes. Some religious zealots believe that the Torah contains hidden phrases that appear by reading every kth letter, and that such pattern can be used to find the Ark of the Covenant, cure cancer, and predict the future. Results not based on scientific method and results have been debunked by mathematicians and attributed to illicit data manipulation. Using the same methodology one can find statistically similar patterns in a Hebrew translation of War and Peace.
  27. Word chain checker. Write a program that reads in a list of words from the command line and prints true if they form a word chain and false otherwise. In a word chain, adjacent words must differ in exactly one letter, e.g., HEAL, HEAD, DEAD, DEED, DEER, BEER.
  28. Haiku detector. Write a program that reads in text from standard input and checks whether it forms a haiku. A haiku consists of three lines containing the correct number of syllables (5, 7, and 5, respectively). For the purpose of this problem, define a syllable to be any contiguous sequence of consecutive vowels (a, e, i, o, u, or y). According to this rule, haiku has two syllables and purpose has three syllables. Of course, the second example is wrong since the e in purpose is silent.
  29. ISBN numbers. Write a program to check whether an ISBN number is valid. Recall check digit. An ISBN number can also have hyphens inserted at arbitrary places.
  30. Longest common prefix. Write a function that takes two input string s and t, and returns the longest common prefix of both strings. For example, if s = ACCTGAACTCCCCCC and t = ACCTAGGACCCCCC, then the longest common prefix is ACCT. Be careful if s and t start with different letters, or if one is a prefix of the other.
  31. Complemented palindrome detector. In DNA sequence analysis, a complemented palindrome is a string equal to its reverse complement. Adenine (A) and Thymine (T) are complements, as are Cytosine (C) and Guanine (G). For example, ACGGT is a complement palindrome. Such sequences act as transcription-binding sites and are associated with gene amplification and genetic instability. Given a text input of N characters, find the longest complemented palindrome that is a substring of the text. For example, if the text is GACACGGTTTTA then the longest complemented palindrome is ACGGT. Hint: consider each letter as the center of a possible palindrome of odd length, then consider each pair of letters as the center of a possible palindrome of even length.
  32. DNA validation. Write a function that takes as input a string and returns true if it consists entirely of A, C, G, and T's, and false otherwise.
  33. Highest density C+G region. Given a DNA string s of A, C, T, G and a parameter L, find a substring of s that contains the highest ratio of C + G characters among all substrings that have at least L characters.
  34. DNA to RNA. Write a function that takes a DNA string (A, C, G, T) and returns the corresponding RNA string (A, C, G, U).
  35. DNA complement. Write a function that takes as input a DNA string (A, C, G, T) and returns the complementary base pairs (T, G, C, A). DNA is typically found in a double helix structure. The two complementary DNA strands are joined in a spiral structure.
  36. Circular shifts. Application: computational biology. A string s is a circular shift of a string t if its characters can be circularly shifted to the right by some number of positions, e.g., actgacg is a circular shift of tgacgac, and vice versa. Write a program that checks whether one string s is a circular shift of another t. Hint: it's a one liner with indexOf and string concatenation.
  37. Substring of a circular shifts. Write a function that takes two strings s and t, and returns true if s is a substring of a circular string t, and false otherwise. For example gactt is a substring of the circular string tgacgact.
  38. DNA to Protein. A protein is a large molecule (polymer) consisting of a sequence of amino acids (monomers). Some examples of proteins are: hemoglobin, hormones, antibodies, and ferritin. There are 20 different amino acids that occur in nature. Each amino acid is specified by three DNA base pairs (A, C, G, or T). Write a program to read in a protein (specified by its base pairs) and converts it into a sequence of amino acids. Use the following table. For example, the amino acid Isoleucine (I) is encode by ATA, ATC, or ATT.

    Rosetta stone of life.

    TTT Phe    TCT Ser    TAT Tyr    TGT Cys
    TTC Phe    TCC Ser    TAC Tyr    TGC Cys
    TTA Leu    TCA Ser    TAA ter    TGA ter
    TTG Leu    TCG Ser    TAG ter    TGG Trp
    CTT Leu    CCT Pro    CAT His    CGT Arg
    CTC Leu    CCC Pro    CAC His    CGC Arg
    CTA Leu    CCA Pro    CAA Gln    CGA Arg
    CTG Leu    CCG Pro    CAG Gln    CGG Arg
    ATT Ile    ACT Thr    AAT Asn    AGT Ser
    ATC Ile    ACC Thr    AAC Asn    AGC Ser
    ATA Ile    ACA Thr    AAA Lys    AGA Arg
    ATG Met    ACG Thr    AAG Lys    AGG Arg
    GTT Val    GCT Ala    GAT Asp    GGT Gly
    GTC Val    GCC Ala    GAC Asp    GGC Gly
    GTA Val    GCA Ala    GAA Glu    GGA Gly
    GTG Val    GCG Ala    GAG Glu    GGG Gly

    Amino acid Abbrev Abbrev   Amino acid Abbrev Abbrev
    Alanine ala A Lleucine leu L
    Arginine arg R Lysine lys K
    Asparagine asn N Methionine met M
    Aspartic Acid asp D Phenylalanine phe F
    Cysteine cys C Proline pro P
    Glutamic Acid glu E Serine ser S
    Glutamine gln Q Threonine thr T
    Glycine gly G Tryptophan trp W
    Histidine his H Tyrosine tyr Y
    Isoleucine ile I Valine val V

  39. Counter. Write a program that reads in a decimal string from the command line (e.g., 56789) and starts counting from that number (e.g., 56790, 56791, 56792). Do not assume that the input is a 32 or 64 bit integer, but rather an arbitrary precision integer. Implement the integer using a String (not an array).
  40. Arbitrary precision integer arithmetic. Write a program that takes two decimal strings as inputs, and prints out their sum. Use a string to represent the integer.
  41. Boggle. The game of Boggle is played on a 4-by-4 grid of characters. There are 16 dice, each with 6 letters on the them. Create a 4-by-4 grid, where each die appears in one of the cells at random, and each die displays one of the 6 characters at random.
  42. Generating cryptograms. A cryptogram is obtained by scrambling English text by replacing each letter with another letter. Write a program to generate a random permutation of the 26 letters and use this to map letters. Give example: Don't scramble punctuation or whitespace.
  43. Scrabble. Write a program to determine the longest legal Scrabble word that can be played? To be legal, the word must be in The Official Tournament and Club Wordlist (TWL98), which consists of all 168,083 words between 2 and 15 letters in TWL98. The number of tiles representing each letter are given in the table below. In addition, there are two blanks which can be used to represent any letter.
    a b c d  e f g h i j k l m n o p q r s t u v w x y z -
    9 2 2 4 12 2 3 2 9 1 1 4 2 6 8 2 1 6 4 6 4 2 2 1 2 1 2

  44. Soundex. The soundex algorithm is a method of encoding last names based on the way it sounds rather than the way it is spelled. Names that sound the same (e.g., SMITH and SMYTH) would have the same soundex encoding. The soundex algorithm was originally invented to simplify census taking. It is also used by genealogists to cope with names with alternate spellings and by airline receptionists to avoid embarrassment when later trying to pronounce a customer's name.

    Write a program Soundex.java that reads in two lowercase strings as parameters, computes their soundex, and determines if they are equivalent. The algorithm works as follows:

    1. Keep the first letter of the string, but remove all vowels and the letters 'h', 'w', and 'y'.
    2. Assign digits to the remaining letter using the following rules:
      1:  B, F, P, V
      2:  C, G, J, K, Q, S, X, Z
      3:  D, T
      4:  L
      5:  M, N
      6:  R
    3. If two or more consecutive digits are the same, delete all of the duplicates.
    4. Convert the string to four characters: the first character is the first letter of the original string, the remaining three characters are the first three digits in the string. Pad the string with trailing 0's if there are not enough digits; truncate it if there are too many digits.
  45. Longest word. Given a dictionary of words and a starting word s, find the longest word that can be formed, starting at s, and inserting one letter at a time such that each intermediate word is also in the dictionary. For example, if the starting word is cal, then the following is a sequence of valid words coal, coral, choral, chorale. Reference.
  46. Phone words. Write a program PhoneWords.java that takes a 7 digit string of digits as a command line input, reads in a list of words from standard input (e.g., the dictionary), and prints out all 7-letter words (or 3-letter words followed by 4-letter words) in the dictionary that can be formed using the standard phone rules, e.g., 266-7883 corresponds to compute.
    0:  No corresponding letters
    1:  No corresponding letters
    2:  A B C
    3:  D E F
    4:  G H I
    5:  J K L
    6:  M N O
    7:  P Q R S
    8:  T U V
    9:  W X Y Z
  47. Rot13. Rot13 is a very simple encryption scheme used on some Internet newsgroups to conceal potentially offensive postings. It works by cyclically shifting each lowercase or uppercase letter 13 positions. So, the letter 'a' is replaced by 'n' and the letter 'n' is replaced by 'a'. For example, the string "Encryption" is encoded as "Rapelcgvba." Write a program ROT13.java that reads in a String as a command line parameter and encodes it using Rot13.
  48. Longest Rot13 word. Write a program that reads in a dictionary of words into an array and determines the longest pair of words such that each is the Rot13 of the other, e.g., bumpily and unfiber.
  49. Thue-Morse weave. Recall the Thue-Morse sequence from Exercises in Section 2.3. Write a program ThueMorse.java that reads in a command line input N and plots the N-by-N Thue-Morse weave in turtle graphics. Plot cell (i, j) black if the ith and jth bits in the Thue-Morse string are different. Below are the Thue-Morse patterns for N = 4, 8, and 16.

    4-by-4 Thue-Morse pattern 8-by-8 Thue-Morse pattern 16-by-16 Thue-Morse pattern

    Because of the mesmerizing non-regularity, for large N, your eyes may have a hard time staying focused.

  50. Repetition words. Write a program Repetition.java to read in a list of dictionary words and print out all words for which each letter appears exactly twice, e.g., intestines, antiperspirantes, appeases, arraigning, hotshots, arraigning, teammate, and so forth.
  51. Text twist. Write a program TextTwist.java that reads in a word from the command line and a dictionary of words from standard input, and prints out all words of at least four letters that can be formed by rearranging a subset of the letters in the input word. This forms the core of the game Text Twist. Hint: create a profile of the input word by counting the number of times each of the 26 letters appears. Then, for each dictionary word, create a similar profile and check if each letter appears at least as many times in the input word as in the dictionary word.
  52. Word frequencies. Write a program (or several programs and use piping) that reads in a text file and prints out a list of the words in decreasing order of frequency. Consider breaking it up into 5 pieces and use piping: read in text and print the words one per line in lowercase, sort to bring identical words together, remove duplicates and print count, sort by count.
  53. VIN numbers. A VIN number is a 17-character string that uniquely identifies a motor vehicle. It also encodes the manufacturer and attributes of the vehicle. To guard against accidentally entering an incorrect VIN number, the VIN number incorporates a check digit (the 9th character). Each letter and number is assigned a value between 0 and 9. The check digit is chosen so to be the weighted sum of the values mod 11, using the symbol X if the remainder is 10.
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    1 2 3 4 5 6 7 8 - 1 2 3 4 5 - 7 - 9 2 3 4 5 6 7 8 9
    1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10  11  12  13  14  15  16  17
     8   7   6   5   4   3   2  10   0   9   8   7   6   5   4   3   2

    For example the check digit of the partial VIN number 1FA-CP45E-?-LF192944 is X because the weighted sum is 373 and 373 mod 11 is 10.

     1   F   A   C   P   4   5   E   X   L   F   1   9   2   9   4   4
     1   6   1   3   7   4   5   5   -   3   6   1   9   2   9   4   4
     8   7   6   5   4   3   2  10   -   9   8   7   6   5   4   3   2
     8  42   6  15  28  12  10  50   -  27  48   7  54  10  36  12   8

    Write a program VIN.java that takes a command line string and determines whether or not it is a valid VIN number. Allow the input to be entered with upper or lower case, and allow dashes to be inserted. Do thorough error checking, e.g., that the string is the right length, that no illegal characters are used (I, O, Q), etc.

  54. Music CDs. Screen-scrape MusicBrainz to identify information about music CDs.
  55. Pig Latin. Pig Latin is a fun secret language for young children. To convert a word to Pig Latin:
    • If it begins with a vowel, append "hay" to the end. At the beginning of a word, treat y as a vowel unless it is followed by a vowel.
    • If it begins with a sequence of consonants, move the consonants to the end, then append "ay". Treat a u following a q as a consonant.

    For example, "input" becomes "input-hay", "standard" becomes "andard-stay", "quit" becomes "it-quay". Write a program PigLatinCoder.java that reads in a sequence of words from standard input and prints them to standard output in Pig Latin. Write a program PigLatinDecoder.java that reads in a sequence of words encoded in Pig Latin from standard input and prints the original words out in.

  56. Rotating drum problem. Applications to pseudo-random number generators, computational biology, coding theory. Consider a rotating drum (draw picture of circle divided into 16 segments, each of one of two types - 0 and 1). We want that any sequence of 4 consecutive segments to uniquely identify the quadrant of the drum. That is, every 4 consecutive segments should represent one of the 16 binary numbers from 0000 to 1111. Is this possible? A de Bruijn sequence of order n is a shortest (circular) string such that every sequence of n bits appears as a substring at least once. For example, 0000111101100101 is a de Bruijn sequence of order 4, and all 2^4 possible 4-bit sequence (0000, 0001, ..., 1111) occur exactly once. Write a program DeBruijn.java that reads in a command line parameter n and prints out an order n de Bruijn sequence. Algorithm: start with n 0's. Append a 1 if the n-tuple that would be formed has not already appeared in the sequence; append a 0 otherwise. Hint: use the methods String.indexOf and String.substring.
  57. Ehrenfecucht-Mycielski sequence. The Ehrenfecucht-Mycielski sequence in a binary sequence that starts with "010". Given the first n bits b0, b1, ..., bn-1, bn is determined by finding the longest suffix bj, bj+1, ..., bn-1 that occurs previously in the sequence (if it occurs multiple times, take the last such occurrence). Then, bn is the opposite of the bit that followed the match. 0100110101110001000011110110010100100111010001100000101101111100. Use substring and lastIndexOf.
  58. Luminance and chrominance. Decompose a picture into its monochrome luminance Y = .299 r + .587 g + .114 b , chrominance I = .596 r -.274 g -.322 b, and chrominance Q = .211 r - .523 g + .312 b. Plot all 3 images.
  59. Flip horizontally. Write a program FlipX.java that takes a command line argument which is the name of a JPG or PNG file, displays it in a window, flips the image horizontally, and displays the resulting image in the window. We illustrate using standard computer graphics test images - baboon.jpg and peppers.jpg.

    peppers      flip peppers
  60. Color separation. Write a program ColorSeparation.java that takes the name of an image file as a command line input, and creates three images, one that contains only the red components, one for green, and one for blue.

    baboon red      baboon green      baboon blue
  61. Saturation. saturation.
  62. Brighten. Write a program Brighter.java that takes a command line argument which is the name of a JPG or PNG file, displays it in a window, and display a second version which is a brighter copy. Use the Color method brighter(), which return a brighter version of the invoking color.

    baboon      brighter baboon
  63. Rotate. Write a program Rotation.java that rotates the image 30 degrees counterclockwise. reference. We check if the pre-image is out-of-bounds.
    theta = pi / 6 radians
    ii = -(i - i0)*cos(theta) + (j - j0)*sin(theta) + i0
    jj =  (i - i0)*sin(theta) + (j - j0)*cos(theta) + j0
    Here (i0, j0) is the center of the image and theta is the angle of rotation (counterclockwise). Important idea = sampling.

    baboon      rotated baboon
  64. Swirl. Write a program Swirl.java that applies a swirl filter. Similar to rotation filter, except that angle changes as a function of distance to the center.
    r = sqrt((i - i0)^2 + (j - j0)^2)
    theta = r * pi / 256
    ii = -(i - i0)*cos(theta) + (j - j0)*sin(theta) + i0
    jj =  (i - i0)*sin(theta) + (j - j0)*cos(theta) + j0

    baboon      baboon swirl
  65. Wave. Write a program Wave.java that produces a wave effect.
    xx = x
    yy = y + 20*sin(2*pi*x/64)

    wavy baboon      wavy baboon
  66. Glass filter. Write a program Glass.java that takes a command line argument which is the name of a JPG or PNG file, displays it in a window, applies a "glass filter", and displays the resulting image in the window. This effect is achieved by setting pixel (i, j) to be the color of a random neighboring pixel (within Manhattan distance 5). This has the effect of blurring the images as well.

    baboon      baboon through glass
  67. Slide show. Write a program to display a sequence of images in a slide show (one every two seconds), using a fade effect (or some other transition effect) between pictures.
  68. Tile. Program Tile.java takes a command line argument which is the name of a JPG or PNG file, two command line integers M and N, and creates an M-by-N tiling of the picture. Below is a 2-by-3 tiling of the baboon picture.

    2-by-3 baboon

  69. Edge detection. Goal: form mathematical model of some feature of the image. To accomplish this, we want to detect edges or lines. An edge is a area of a picture with a strong contrast in intensity from one pixel to the next. Edge detection is a fundamental problem in image processing and computer vision. The Sobel method is a popular edge detection technique. We assume that the image is grayscale. (If not, we can convert by taking the average of the red, green, and blue intensities.) For each pixel (i, j) we calculate the edge strength by computing two 3-by-3 convolution masks. This involves taking the grayscale values of the nine pixels in the 3-by-3 neighborhood centered on (i, j), multiplying them by the corresponding weight in the 3-by-3 mask, and summing up the products.
    -1 0 +1      +1 +2 +1
    -2 0 +2       0  0  0
    -1 0 +1      -1 -2 -1
    This produces two values Gx and Gy. In the output picture, we color the pixel (i, j) according to the grayscale value 255 - Sqrt(Gx*Gx + Gy*Gy). There are various ways to handle the boundary. For simplicity, we ignore this special case and color the boundary pixels black. Program EdgeDetector.java takes the name of an image as a command line input and applies the Sobel edge detection algorithm to that image.

    baboon      Sobel baboon

Web exercises

  1. Painter's and printer's color triangles. Create the following two images. The primary hues of the painter's triangle are red, green, and blue; the primary hues of the printer's triangle are magenta, cyan, and yellow.
                   Painter's triangle                Printer's triangle
  2. Entropy. The Shannon entropy measures the information content of an input string and plays a cornerstone role in information theory and data compression. It was proposed by Claude Shannon in 1948, borrowing upon the concept in statistical thermodynamics. Assuming each character i appears with probability pi, the entropy is defined to be H = - sum pi log2 pi, where the contribution is 0 if pi = 0. Compute entropy of DNA sequence.
    1. Write a program to read in a ASCII text string from standard input, count the number of times each ASCII character occurs, and compute the entropy, assuming each character appears with the given probabilities.
    2. Repeat part (a) but use Unicode.
  3. wget. Write a program Wget.java that takes the name of a URL as a command-line argument and saves the referenced file using the same filename.
  4. Capitalize. Write a program Capitalize.java that reads in text from standard input and capitalizes each word (make first letter uppercase and make the remaining letters lowercase).
  5. Shannon's entropy experiment. Recreate Shannon's experiment on the entropy of the English language by listing a number of letters in a sentence and prompting the user for the next symbol. Shannon concluded that there is approximately 1.1 bits of info per letter in the alphabet.
  6. Scrambled text. Some cognitive psychologists believe that people recognize words based on their shape.
    to a rscheearch at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae we do not raed ervey lteter by itslef but the wrod as a wlohe.
    Write a program that reads in text from standard input and prints the text back out, but shuffles the internal letters in each word. Write and use a function scramble() that takes as input a string and returns another string with the internal letters in random order. Use the shuffling algorithm in Shuffle.java for the shuffling part.
  7. Date format conversion. Write a program to read in a data of the form 2003-05-25 and convert it to 5/25/03.
  8. interesting English words
  9. Two-stroke apparent motion. Create the optical illusion of two-stroke apparent motion or four-stroke
  10. De Valois' checkerboard. Create the optical illusion of De Valois' checkerboard or one of the other optical illusions from the Shapiro Perception Lab.

More stuff.

Here's a FAQ for manipulating pixels in Java. Here's a list of more possible transformations for monochrome images [pdf].

Miasma animation using MemoryImageSource.