8.2 Compilers, Interpreters, and Emulators


This section under major construction.

Compilers and interpreters.

A compiler is a program that takes as input a program written in some high-level programming language and outputs machine language code for some machine architecture. The machine language code can subsequently be executed any number of times, using different input data each time. As an example, the Unix program g++ transforms a C++ source file into a machine executable file a.out, which can be run natively on Sparc microprocessors. As a second example, the Java compiler javac transforms a .java source file into a .class file that is written in Java bytecode, which is the machine language for an imaginary machine known as the Java Virtual Machine.
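To make the idea of one-time translation concrete, here is a minimal sketch of a toy compiler. It assumes a made-up source language (fully parenthesized expressions over + and *, with tokens separated by spaces) and a made-up stack-machine instruction set (PUSH, LOAD, ADD, MUL). Neither is part of Java or of the Java bytecode; both are invented for illustration only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy compiler: translates a fully parenthesized expression into
// instructions for a hypothetical stack machine (not real Java bytecode).
class ToyCompiler {

    // Returns the instruction list for the given source expression.
    // Assumes well-formed input with tokens separated by whitespace.
    static List<String> compile(String source) {
        List<String> code = new ArrayList<>();
        Deque<String> operators = new ArrayDeque<>();
        for (String token : source.trim().split("\\s+")) {
            switch (token) {
                case "(":
                    break;                      // nothing to emit
                case "+":
                case "*":
                    operators.push(token);      // emit at the matching ")"
                    break;
                case ")":
                    code.add(operators.pop().equals("+") ? "ADD" : "MUL");
                    break;
                default:
                    // All digits: a constant. Anything else: a variable name.
                    if (token.matches("\\d+")) code.add("PUSH " + token);
                    else                       code.add("LOAD " + token);
            }
        }
        return code;
    }

    public static void main(String[] args) {
        for (String instruction : compile("( ( x + 3 ) * 4 )")) {
            System.out.println(instruction);
        }
    }
}
```

The translation happens once. The resulting instruction list does not mention any particular value of x, so it could be executed repeatedly with different input data.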

An interpreter is a program that takes as input a source program, along with data for the program, and translates and executes the source program instruction by instruction. For example, the Java interpreter java translates a .class file into code that can be executed natively on the underlying machine. As a second example, the program VirtualPC interprets programs written for the Intel Pentium architecture (IBM-PC clone) for the PowerPC architecture (Macintosh). This enables Macintosh users to run Windows programs on their computers.
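The instruction-by-instruction style can be sketched with a toy interpreter. The instruction set below (PUSH, LOAD, ADD, MUL) is hypothetical and far smaller than the real Java bytecode; the sketch also assumes the program has a single integer input, which LOAD pushes onto the stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy interpreter for a hypothetical stack machine (not real Java bytecode).
class ToyInterpreter {

    // Executes the program one instruction at a time on the input value x
    // and returns the value left on top of the stack.
    static int run(String[] program, int x) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instruction : program) {
            String[] parts = instruction.trim().split("\\s+");
            switch (parts[0]) {
                case "PUSH":                    // push a constant
                    stack.push(Integer.parseInt(parts[1]));
                    break;
                case "LOAD":                    // push the program's input
                    stack.push(x);
                    break;
                case "ADD":
                    stack.push(stack.pop() + stack.pop());
                    break;
                case "MUL":
                    stack.push(stack.pop() * stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException("unknown instruction: " + instruction);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // Computes (x + 3) * 4.
        String[] program = { "LOAD x", "PUSH 3", "ADD", "PUSH 4", "MUL" };
        System.out.println(run(program, 2));    // prints 20
        System.out.println(run(program, 10));   // prints 52
    }
}
```

The interpreter takes both the program and its data as input, and the same program array can be run on any machine that has an implementation of run().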

Why does Java typically interpret instead of compile? The main advantage of compilation is that you end up with raw machine language code that can be efficiently executed on your machine. However, it can only be executed on one type of machine architecture (Intel Pentium, PowerPC). A primary advantage of compiling to an intermediate language like Java bytecode and then interpreting is that you can achieve platform independence: you can interpret the same .class file on different types of machine architectures. However, interpreting the bytecode is typically slower than executing pre-compiled machine language code. A second advantage of using the Java bytecode is that it acts as a buffer between your computer and the program. This enables you to download an untrusted program from the Internet and execute it on your machine with some assurances. Since you are running the Java interpreter (and not raw machine language code), you are protected by a layer of security that guards against malicious programs. It is the combination of Java and the Java bytecode that yields a platform-independent and secure environment, while still embracing a full set of modern programming abstractions.

The Java bytecode and the java interpreter are not inherently specific to the Java programming language. For example, you can use Jython to compile from the Python programming language into Java bytecode, and then use java to interpret it. There are similar ML, Lisp, and Fortran compilers that compile into Java bytecode. You could also use the Unix program gcj to compile directly from a .java source file into a machine executable file a.out, which can be run natively on any Sparc microprocessor. Additionally, you can design hardware whose machine language is the Java bytecode. Sun Microsystems has done exactly this, making the Java Virtual Machine not so virtual.

Why not use a real machine language instead of the Java bytecode? The Java bytecode is much simpler than a typical high-level programming language. It is much easier to write a Java bytecode interpreter for a new type of computer than it is to write a full Java compiler.

Creative Exercises

  1. Boolean expression evaluation. Write a program to evaluate Boolean expressions made of bits, unary operators (~), binary operators (&, ^, |), and parentheses using the same precedence rules as Java. One solution uses two stacks, one to store the bits and one to store the operators.
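
One possible two-stack approach is sketched below. It is not necessarily the solution the exercise has in mind. The class and method names are invented here, and the sketch assumes well-formed input in which each bit is the single character 0 or 1. Precedence follows Java: ~ binds tightest, then &, then ^, then |.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a two-stack evaluator for Boolean expressions over bits.
class BooleanEvaluator {

    // Higher number binds tighter; '(' gets 0 so it is never popped
    // by an incoming binary operator.
    private static int precedence(char op) {
        switch (op) {
            case '~': return 4;
            case '&': return 3;
            case '^': return 2;
            case '|': return 1;
            default:  return 0;
        }
    }

    // Pops the operand(s) for op and pushes the result.
    private static void apply(char op, Deque<Integer> bits) {
        if (op == '~') {
            bits.push(1 - bits.pop());
            return;
        }
        int right = bits.pop();
        int left = bits.pop();
        switch (op) {
            case '&': bits.push(left & right); break;
            case '^': bits.push(left ^ right); break;
            case '|': bits.push(left | right); break;
            default:  throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    // Evaluates the expression and returns 0 or 1.
    static int evaluate(String expression) {
        Deque<Integer> bits = new ArrayDeque<>();
        Deque<Character> ops = new ArrayDeque<>();
        for (char c : expression.toCharArray()) {
            if (Character.isWhitespace(c)) continue;
            if (c == '0' || c == '1') {
                bits.push(c - '0');
            } else if (c == '(' || c == '~') {
                ops.push(c);            // prefix operators wait for their operand
            } else if (c == ')') {
                while (ops.peek() != '(') apply(ops.pop(), bits);
                ops.pop();              // discard the matching '('
            } else if (c == '&' || c == '^' || c == '|') {
                // Apply pending operators that bind at least as tightly.
                while (!ops.isEmpty() && precedence(ops.peek()) >= precedence(c)) {
                    apply(ops.pop(), bits);
                }
                ops.push(c);
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        while (!ops.isEmpty()) apply(ops.pop(), bits);
        return bits.pop();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("~(1&0)|0^1"));   // prints 1
        System.out.println(evaluate("1|0&0"));        // prints 1
    }
}
```

Comparing with >= makes the binary operators left-associative, while pushing ~ without popping anything lets repeated negations such as ~~1 nest from right to left.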