Programming Assignment 5 (Due on March 13)

In this assignment you will write the final pass of our Decaf compiler - it traverses the syntax tree and emits Jasmin code.

Files

For this assignment, you will be downloading a compressed "tar" (tape-archive) file which contains all the files you will need. Keep all the files in a single directory.

pa5.tar.gz

Once you've downloaded the compressed tar file, execute the following commands:

% gunzip pa5.tar.gz
% tar xf pa5.tar

This will create a directory called pa5 and will put in it all the files you need. You can then remove the pa5.tar file.

The Problem

This is the final stage of our Decaf compiler. In the last assignment you added a check() method to each kind of Decaf node. In this one you will add a code() method. That method should print out Jasmin code that performs whatever is appropriate for that kind of node. For example, an arithmetic expression pushes its left and right sub-expressions onto the runtime stack, and then performs the appropriate operation via an opcode like iadd or imul. An assignment statement pushes its right-hand side onto the stack, then stores it into some local variable via the opcode istore.

The code() methods will often be pretty short, recursively calling the code() methods of their own members, and then adding a bit more code of their own. This is very similar to how things were done in the checking phase. What matters is that you understand exactly what sort of Jasmin instructions you need to use to express the semantics you have in mind for a given Decaf instruction. The next sections will give you some tips on what to do.
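
To make the recursive pattern concrete, here is a minimal sketch of how an arithmetic node might emit its code. The class names (ExprSketch, IntLitSketch, BinarySketch) are made up for this illustration and are not the node classes in the tar file. The sketch also appends to a StringBuilder rather than printing to System.out, purely so the result is easy to inspect.

```java
// Hypothetical, simplified node classes for illustration only.
abstract class ExprSketch {
    // Appends the Jasmin instructions for this expression to out.
    abstract void code(StringBuilder out);
}

class IntLitSketch extends ExprSketch {
    private final int value;
    IntLitSketch(int value) { this.value = value; }
    @Override void code(StringBuilder out) {
        out.append("  ldc ").append(value).append('\n');
    }
}

class BinarySketch extends ExprSketch {
    private final ExprSketch left;
    private final ExprSketch right;
    private final String opcode;   // for example "iadd" or "imul"
    BinarySketch(ExprSketch left, String opcode, ExprSketch right) {
        this.left = left;
        this.opcode = opcode;
        this.right = right;
    }
    @Override void code(StringBuilder out) {
        left.code(out);    // leaves the left value on the stack
        right.code(out);   // leaves the right value on top of it
        out.append("  ").append(opcode).append('\n');
    }
}

class ArithmeticDemo {
    // Emits code for the expression (2 + 3) * 4.
    static String emit() {
        ExprSketch sum = new BinarySketch(new IntLitSketch(2), "iadd", new IntLitSketch(3));
        ExprSketch product = new BinarySketch(sum, "imul", new IntLitSketch(4));
        StringBuilder out = new StringBuilder();
        product.code(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(emit());
    }
}
```

Notice that BinarySketch.code() is only three lines long: two recursive calls and one opcode of its own. Most of your code() methods should come out looking about like that.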

Getting Started

Download the tar file and un-tar it as described above. Now execute the following instructions:

% javac *.java
% java Main Empty.nocaf

You should get an output like this:

.source Empty.nocaf
.class Empty
.super java/lang/Object

.method static nothing()I
  .limit stack 16  ; (Hack!)
  .limit locals 0
  iconst_0
  ireturn
.end method

That's the Jasmin source for the Empty.nocaf program. Now we will convert it into a real Java classfile. Execute the following instructions:

% java Main Empty.nocaf > Empty.jasmin
% setenv CLASSPATH ".:/fs/cs-cls/cs160/lib/jasmin"
% java jasmin.Main Empty.jasmin

The Jasmin assembler converts the JVM assembly code in Empty.jasmin to JVM bytecode and generates a class file, so you now have a Java classfile named Empty.class. How do you use it? Since Decaf doesn't support the main() method, we can't see the results of executing compiled Decaf code unless we call it from regular Java code. Open the file Caller.java and uncomment the one line of code there. Now execute the following instructions:

% javac Caller.java
% java Caller

This will call the nothing() method of the Empty class and print the result. This is how you will call methods of Decaf classes you write.

What Next?

How does this all happen? The code in Main.java has just two new lines:

Scanner   S  = new Scanner(args[0]);
Lexer     L  = new Lexer(S);
Parser    P  = new Parser(L);
ClassDecl CD = P.parseClassDecl();      
CD.check();
      
Coder.filename = args[0];  // This is new.
CD.code();                 // This is new.

So, as with the checking phase, you have to follow the action down through the code() method of ClassDecl and see what it does. Aside from some administrative stuff, it calls the code() methods of various other things. Continuing this way you will eventually look at each node class and fill in its code() method.

The JVM + Jasmin virtual machine is a very straightforward low-level environment. It shouldn't be too hard to figure out what code to emit for most sorts of statements and declarations. The tricky cases will be equality expressions, if statements and while statements. Some basic things to keep in mind when writing your coding methods:

Each node class already has a code() method defined, but many of them need finishing. In every method body are comments telling you what you need to do there.
The list-storing nodes (FormalParams, MethodDecls, etc.) have already implemented the code() method to just pass on the call to each of the elements in their list. You don't need to add anything.
The JVM uses a stack-based model for evaluating expressions. The section JVM Opcodes in this document gives a summary of all the JVM instructions you will need to finish your compiler.
Local variables and jump labels are referred to by numbers. The class Coder (described below) keeps track of these and provides some accessors for you.
There is a classfile-to-Jasmin decompiler under the /fs/cs-cls/cs160/lib directory called D-Java. To get some clues as to how something might best be expressed in Jasmin, you can write a Java file, compile it with javac, and then decompile it to Jasmin and take a look.
Some of the descendants of Expression have a new field called decl in them. This field is set by the checking code to refer back to the node in the syntax tree where the type of the expression was declared. You'll need this info when emitting code. See the section Interfaces below for more about this.
This is a bit reminiscent of how we needed to retrieve the declaration of a method when we were checking a call expression. We did that by using the symbol table. But after the checking phase the table is empty. So while we go through the checking, we attach to expressions this link back to where they were declared.
The class Statement defines a method called numLocals(), which returns the value zero. This method is supposed to return the number of local variables declared in a statement. This has meaning for a local variable declaration itself, and for a block. You will need to override the method for those classes to return the correct number.
The .limit stack directive is hardwired in MethodDecl.code() to use the value 16. This is definitely a hack for the sample code. Another limit you need to set is the local variable space. Your final code generator should analyze the code for each method, predict how much stack and local variable space is needed, and set up proper limits.
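
As a rough sketch of the numLocals() idea, the classes below count declarations in a nested block and derive a .limit locals value from the count. The class names and the limitLocals() helper are hypothetical. The sketch assumes that every declaration gets its own slot (slots are never reused) and that methods are static, so the parameters occupy the first slots and there is no "this".

```java
// Hypothetical stand-ins for the statement classes; names are made up.
abstract class StmtSketch {
    // Default: a statement declares no local variables.
    int numLocals() { return 0; }
}

class LocalVarDeclSketch extends StmtSketch {
    // A declaration introduces exactly one variable.
    @Override int numLocals() { return 1; }
}

class OtherStmtSketch extends StmtSketch {
    // Stands for any statement that declares nothing.
}

class BlockSketch extends StmtSketch {
    private final StmtSketch[] body;
    BlockSketch(StmtSketch... body) { this.body = body; }
    // A block declares as many variables as its statements do in total.
    @Override int numLocals() {
        int total = 0;
        for (StmtSketch s : body) {
            total += s.numLocals();
        }
        return total;
    }
}

class LimitsSketch {
    // Assumes a static method: parameters take slots 0..numParams-1.
    static String limitLocals(int numParams, StmtSketch body) {
        return "  .limit locals " + (numParams + body.numLocals());
    }

    public static void main(String[] args) {
        StmtSketch inner = new BlockSketch(new LocalVarDeclSketch(), new LocalVarDeclSketch());
        StmtSketch outer = new BlockSketch(new LocalVarDeclSketch(), new OtherStmtSketch(), inner);
        System.out.println(limitLocals(2, outer));
    }
}
```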

Boolean type

The JVM has no separate boolean type on its operand stack, so we will encode the boolean values true and false as integers: 1 for true and 0 for false. If the return type of a method is boolean, it is fine for the code you generate to return 0 when the result is false and 1 when the result is true.
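
A boolean literal can then be emitted with one of the one-byte constant opcodes. The helper below is only a sketch; the class name BoolLitSketch is made up, and it returns the instruction as a string rather than printing it.

```java
// Hypothetical helper: emits the Jasmin code for a boolean literal.
class BoolLitSketch {
    static String code(boolean value) {
        // true is encoded as the int 1, false as the int 0
        return value ? "  iconst_1\n" : "  iconst_0\n";
    }

    public static void main(String[] args) {
        System.out.print(code(true));
        System.out.print(code(false));
    }
}
```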

Helper classes

There is just one helper class provided here; following the pattern in the previous assignment it's called Coder. This class keeps a few global values that are inconvenient to store anywhere else. In particular it keeps track of the local variable number and the label number.

The JVM can support up to 65,535 local variables in a given method, and it refers to them by number. From your perspective, this is like having 65,535 different registers. How do we connect a variable with its number? Each variable is declared exactly once, so we attach the variable number to its declaration. This is done in the code() method for the declaration, which asks the Coder what the next unused variable number is, and uses that. A lengthier discussion of how this local variable number is stored with the declaration is given below in the section Interfaces.

In Jasmin you can use any string to denote a jump label. The class Coder provides a method that makes it easier to create jump labels that are unique, by returning the next unused jump number. You can create a label by just concatenating that number to the string "Label" (or whatever other string you want).
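
The bookkeeping described above might look something like the sketch below. This is not the Coder class from the tar file: the name CoderSketch and the methods newLocal(), newLabel() and resetLocals() are invented for illustration, and the counters are instance fields only to keep the example self-contained. Check the real Coder for the accessors it actually provides.

```java
// Hypothetical counter class; the real Coder's method names may differ.
class CoderSketch {
    private int nextLocal = 0;
    private int nextLabel = 0;

    // Returns the next unused local variable number.
    int newLocal() { return nextLocal++; }

    // Returns a label that has not been handed out before.
    String newLabel() { return "Label" + nextLabel++; }

    // Local numbering starts over at the beginning of each method.
    void resetLocals() { nextLocal = 0; }

    public static void main(String[] args) {
        CoderSketch coder = new CoderSketch();
        System.out.println(coder.newLocal());   // slot for the first declaration
        System.out.println(coder.newLocal());   // slot for the second declaration
        System.out.println(coder.newLabel());
        System.out.println(coder.newLabel());
    }
}
```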

JVM Opcodes

These are the JVM opcodes you will need to use to write your Decaf compiler. Longer descriptions of these can be found at http://www.mrl.nyu.edu/meyer/jvmref/.

Operations

iadd - Pop top two elements, push sum.
isub - Difference.
imul - Product.
idiv - Integer quotient.
irem - Remainder (mod).
iand - Bitwise and.
ior - Bitwise or.

But ... no ieq or ine! You'll have to use if_icmpne and if_icmpeq (listed under Control flow below) to get the same effect.
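
One way to get the effect of the missing ieq is to compare, jump, and push 1 or 0 depending on which path was taken. The sketch below builds that instruction sequence for an == expression. The class EqualitySketch and its parameters are hypothetical; the operand code is passed in as ready-made strings, and the two label numbers are assumed to come from some source of unused numbers.

```java
// Hypothetical helper that builds the code for "left == right".
class EqualitySketch {
    static String code(String leftCode, String rightCode, int falseNum, int endNum) {
        String falseLabel = "Label" + falseNum;
        String endLabel = "Label" + endNum;
        return leftCode                              // push the left operand
             + rightCode                             // push the right operand
             + "  if_icmpne " + falseLabel + "\n"    // pops both, jumps if they differ
             + "  iconst_1\n"                        // equal: result is true
             + "  goto " + endLabel + "\n"
             + falseLabel + ":\n"
             + "  iconst_0\n"                        // different: result is false
             + endLabel + ":\n";
    }

    public static void main(String[] args) {
        System.out.print(code("  iload 0\n", "  iload 1\n", 0, 1));
    }
}
```

Either way exactly one int is left on the stack when control reaches the end label, which is what the surrounding expression or statement expects. For != you could swap the two constants or use if_icmpeq instead.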

Stack manipulation

ldc # - Push single-word constant.
iload # - Push local variable #.
istore # - Pop top element, store in local #.

Each of the previous three instructions takes two bytes. When the operand is a small number, the following special-case opcodes do the same job in only one byte.

iconst_# - Push 0, 1, 2, 3, 4, or 5.
iload_# - Push local variable 0,1,2,3.
istore_# - Pop top element, store in local 0,1,2,3.
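
If you want to take advantage of the short forms, a small helper can choose between them. The class and method names below are made up for this sketch. It only handles the ranges listed above and falls back to the general two-byte form otherwise, which is always valid.

```java
// Hypothetical helper that picks the one-byte opcode when one exists.
class OpcodeForms {
    static String load(int local) {
        return (local >= 0 && local <= 3) ? "iload_" + local : "iload " + local;
    }

    static String store(int local) {
        return (local >= 0 && local <= 3) ? "istore_" + local : "istore " + local;
    }

    static String push(int value) {
        return (value >= 0 && value <= 5) ? "iconst_" + value : "ldc " + value;
    }

    public static void main(String[] args) {
        System.out.println(load(2));
        System.out.println(load(7));
        System.out.println(store(0));
        System.out.println(push(5));
        System.out.println(push(6));
    }
}
```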

Control flow

goto <label> - Jump to <label>.
ifeq <label> - Pop top element, jump to <label> if it == 0.
ifne <label> - Pop top element, jump to <label> if it != 0.
if_icmpne <label> - Pop top two elements, jump to <label> if they are !=.
if_icmpeq <label> - Pop top two elements, jump to <label> if they are ==.
invokestatic <method> - Call static method.
ireturn - Return, top element is integer.
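
The jump opcodes above are enough to build if and while statements. The sketch below shows one possible layout for each, relying on the 0/1 encoding of booleans so that ifeq means "jump if the condition is false". The class ControlFlowSketch is hypothetical; the condition and body code are passed in as ready-made strings, and the label names are assumed to be unused elsewhere in the method.

```java
// Hypothetical helpers that lay out if/else and while in Jasmin.
class ControlFlowSketch {
    static String ifElse(String condCode, String thenCode, String elseCode,
                         String elseLabel, String endLabel) {
        return condCode                          // leaves 0 or 1 on the stack
             + "  ifeq " + elseLabel + "\n"      // false: skip the then-part
             + thenCode
             + "  goto " + endLabel + "\n"       // done: skip the else-part
             + elseLabel + ":\n"
             + elseCode
             + endLabel + ":\n";
    }

    static String whileLoop(String condCode, String bodyCode,
                            String topLabel, String endLabel) {
        return topLabel + ":\n"
             + condCode                          // re-evaluated on every iteration
             + "  ifeq " + endLabel + "\n"       // false: leave the loop
             + bodyCode
             + "  goto " + topLabel + "\n"       // back to the test
             + endLabel + ":\n";
    }

    public static void main(String[] args) {
        // Counts local 0 down to zero, treating any nonzero value as true.
        System.out.print(whileLoop("  iload 0\n",
                "  iload 0\n  iconst_1\n  isub\n  istore 0\n",
                "Label0", "Label1"));
    }
}
```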

Interfaces

An interesting design problem arises, due to the similarity of formal parameter declarations and local variable declarations. In fact, they're practically identical - each of them declares the type and name of a variable that is local to a method's scope. But they have different roles in the formal syntax of the language, and the inheritance hierarchy of their Node types reflects this:

Node
  Listable
    FormalParam
    ... etc. ...
    Statement
      LocalVarDecl
      ... etc. ...

It's clear that FormalParam is a kind of Listable, and that LocalVarDecl is a kind of Statement, so we don't want to give that up. But...

Every local variable in a method has its own unique number. The logical place to store this number is in the declaration, since there's exactly one of those, and possibly many uses of the variable in expressions. OK, so we add a field called localNum to both the classes LocalVarDecl and FormalParam. Up to now we've had to use these two classes in different contexts, so the fact that they had identical structures that had to be written in parallel instead of shared was just an inconvenience. But as we'll see below, things become considerably more annoying now. Interfaces offer us a way out.

Next we add to the expression node Id a reference that points back to its declaration, and set that reference during the checking pass. Now, how do we declare that? At this stage, the best we could do is this:

class Id extends Expression
  {
  public String spelling;
  public Listable decl;     // A LocalVarDecl or FormalParam.
  ...
  }

So what's the problem? Well, when we get to the code() method, we need to do something like this:

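public void code()
  {
  // The slot number lives in the declaration, so we must find out
  // which of the two declaration classes we are looking at.
  System.out.print("  iload ");
  if (decl instanceof LocalVarDecl)
    System.out.println(((LocalVarDecl)decl).localNum);
  else if (decl instanceof FormalParam)
    System.out.println(((FormalParam)decl).localNum);
  else
    throw new IllegalStateException("unexpected declaration type");
  }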

Well, this sort of thing is what OO is supposed to help us avoid, isn't it? For various good reasons, Java lets a class inherit (extend) from only one parent class. Inheritance means you get interfaces and implementations from your parent. Not just the family name, but the money too. But a class can also take on just an interface, with no implementation, by implementing a Java interface, and it can do that for any number of interfaces. What we do here is this:

interface LocalDecl { int localNum(); }

class FormalParam extends Listable implements LocalDecl
  {
  private int localNum;
  public  int localNum() { return localNum; }
  ...
  }

class LocalVarDecl extends Statement implements LocalDecl
  {
  // similar
  }

So we've created a sort of new superclass, just for FormalParam and LocalVarDecl, which specifies that they have to provide a certain interface, namely a method called localNum(). What they actually do with this method is their business (which is to say, your business when you write the code). In fact they do the obvious thing.

Now we can be more expressive in declaring to the Java compiler what sort of field this is that we're adding to Id:

class Id extends Expression
  {
  public String spelling;
  public LocalDecl decl;     // Has a localNum() method.
  ...
  }

And this in turn lets us be more succinct in the Id.code() method, while still letting the Java compiler be sure it's type-safe:

System.out.println("  iload " + decl.localNum());
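
Putting the pieces of this section together, the sketch below compiles and runs on its own. The constructors, the empty Node, Listable and Statement stubs, and the choice to have Id.code() return a string instead of printing are assumptions made for this illustration; the classes in the tar file have more to them.

```java
// Self-contained sketch of the LocalDecl design; stubs are hypothetical.
interface LocalDecl { int localNum(); }

abstract class Node { }
abstract class Listable extends Node { }
abstract class Statement extends Listable { }

class FormalParam extends Listable implements LocalDecl {
    private final int localNum;
    FormalParam(int localNum) { this.localNum = localNum; }
    public int localNum() { return localNum; }
}

class LocalVarDecl extends Statement implements LocalDecl {
    private final int localNum;
    LocalVarDecl(int localNum) { this.localNum = localNum; }
    public int localNum() { return localNum; }
}

class Id {
    public String spelling;
    public LocalDecl decl;     // Has a localNum() method.

    Id(String spelling, LocalDecl decl) {
        this.spelling = spelling;
        this.decl = decl;
    }

    // No instanceof tests and no casts: either kind of declaration will do.
    String code() {
        return "  iload " + decl.localNum();
    }
}

class InterfaceDemo {
    public static void main(String[] args) {
        System.out.println(new Id("x", new FormalParam(0)).code());
        System.out.println(new Id("y", new LocalVarDecl(3)).code());
    }
}
```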