Programming Assignment 3 (Due on Feb 18)

In this assignment you will build a parser - a program that reads a text file and constructs a syntax tree. You will write a top-down parser from scratch and use a program that writes a bottom-up parser for you based on a context-free grammar you provide.

Files

As with the previous assignment, there is a directory hierarchy to the files; please preserve this hierarchy when you download and work with the files.

Problem 1

The directory p1/ contains the framework for a top-down (recursive descent) parser for a Nocaf program. Your task is to extend this program to parse all of the Decaf grammar. The two grammars are given below in a notation called EBNF (Extended Backus-Naur Form). The brackets [ ] are used for optional constructs - that is [ X ] means ( X | <empty> ) in standard notation. Parentheses are used for grouping, and the * is the usual Kleene star.

Nocaf grammar

ClassDecl -> class Id "{" MethodDecl* "}" <EOF>

MethodDecl -> static Type Id "(" FormalParams ")" Block

Type -> int

FormalParams -> <empty>

Block -> "{" Statement* "}"

Statement -> <empty>

Decaf grammar

High-level program structures

ClassDecl -> class Id "{" MethodDecl* "}" <EOF>

MethodDecl -> static Type Id "(" FormalParams ")" Block

Type -> int  
      | boolean
(No void methods because we have no mechanism for side-effects.)
FormalParams -> [FormalParam ("," FormalParam)*]

FormalParam -> Type Id

Statements

Block -> "{" Statement* "}"

Statement -> Block
           | LocalVarDecl  // LL(2) to distinguish from AssignStmt
           | AssignStmt    // if non-reserved type names are allowed.
           | IfStmt
           | WhileStmt
           | ReturnStmt
           
LocalVarDecl -> Type Id ";"

AssignStmt -> Id "=" Expression ";"

IfStmt -> if "(" Expression ")" Statement [else Statement]
(Your parser must disambiguate the dangling else!)
WhileStmt -> while "(" Expression ")" Statement

ReturnStmt -> return Expression ";"

Expressions

Expression -> ConditionalAndExpr ("||" ConditionalAndExpr)*

ConditionalAndExpr -> EqualityExpr ("&&" EqualityExpr)*

EqualityExpr -> AdditiveExpr (("=="|"!=") AdditiveExpr)*

AdditiveExpr -> MultiplicativeExpr (("+"|"-") MultiplicativeExpr)*

MultiplicativeExpr -> PrimaryExpr (("*"|"/"|"%") PrimaryExpr)*
(Note no UnaryExpr -- we don't have unary minus.)
PrimaryExpr -> Num  // Integer literal
             | Bool    // Boolean literal
             | Id
             | CallExpr   // LL(2) to distinguish from plain Identifier.
             | "(" Expression ")"

CallExpr -> Id "(" ActualParams ")"

ActualParams -> [Expression ("," Expression)*]

Getting Started

Download the files and issue the following commands at the Unix prompt:

You should get the following output:

The program has digested the input Nocaf file, built a syntax tree, and printed out its structure. Most of the architecture you will need to finish the parser is already present in the Nocaf parser in Parser.java. You will have to understand the organization of the parser and reuse some of its ideas. What follows is a general overview of your starting-point code.

Lexer.java

This file contains a fully-working lexer for Decaf - that is, it's a correct solution to the previous homework. You won't have to change the code in this class. It does have some added features:

Nodes.java

This file contains a full set of classes to describe nodes in a Decaf syntax tree. These are similar to the ones you used in Problem 1 of the first assignment. You also won't have to change the code in this class. Some added features are:

Main.java

This is where it all starts. The main() method creates a Scanner and Lexer, as in the previous assignment. Then it creates a Parser, and gets from it a node that represents the text it parses. You might want to uncomment some of the commented-out code in this file during debugging.

Parser.java

This is where all the action is. At the bottom of the file you see the method parseClassDecl() that gets called first.

The code for the parser uses dependency-ordering, in which (as far as possible) code at a given position in the file depends only on code that precedes it. This means the high-level routines occur at the bottom, in the reverse order from the way the structures are presented in the grammar. If this drives you crazy, go ahead and reorder the grammar or the code.

From the Decaf syntax you see that a class declaration consists of the class keyword, a name, a left brace, a sequence of method declarations, a right brace, and an end-of-file. The code in the method parseClassDecl() reflects that structure quite closely: In fact, it's quite easy to translate from a grammar to a parser - in the next problem we will automate this.

The code provided also has an example of how to parse sequences and put them into a list node. You have to add the code to parse statements and expressions in Decaf. The expressions may be the most difficult.

There are also lines, commented out in main(), that create and use a Recognizer instead of a Parser. This is just there to help you proceed in manageable steps. The recognizer code doesn't build a syntax tree so the flow of its control is easier to follow. It's not a requirement, but you might find it useful to first write the recognizer for Decaf, and then add to it the tree-building code.


Problem 2

The next problem is to create a parser for the same language, but using the automated tool JavaCUP and the LR grammar provided below. The source code (in Java) and documentation for this tool is available at http://www.cs.princeton.edu/~appel/modern/java/CUP/. The CS-160 home directory has a pre-compiled version of this tool.

Decaf LALR grammar

High-level program structures

ClassDecl ::= class Id "{" MethodDecls "}" 

MethodDecls ::= MethodDecls MethodDecl
             |  /* empty */

MethodDecl ::= static Type Id "(" FormalParams ")" Block

FormalParams ::= ProperFormalParams
              | /* empty */ 

ProperFormalParams ::= ProperFormalParams "," FormalParam
                    |  FormalParam
   
FormalParam ::= Type Id
      
Type ::= int
      |  boolean

Statements

Block ::= "{" Statements "}"

Statements ::= Statements Statement
            |  /* empty */

Statement ::= Block
           |  LocalVarDecl
           |  AssignStmt
           |  IfStmt
           |  WhileStmt
           |  ReturnStmt

LocalVarDecl ::= Type Id ";"

AssignStmt ::= Id "=" Expression ";"

IfStmt ::= if "(" Expression ")" Statement
        |  if "(" Expression ")" Statement else Statement
(Your parser must disambiguate the dangling else!)
WhileStmt ::= while "(" Expression ")" Statement

ReturnStmt ::= return Expression ";"                 

Expressions

Expression ::= ConditionalAndExpr
            |  Expression "||" ConditionalAndExpr

ConditionalAndExpr ::= EqualityExpr
                    |  ConditionalAndExpr "&&" EqualityExpr

EqualityExpr ::= AdditiveExpr
              |  EqualityExpr "==" AdditiveExpr
              |  EqualityExpr "!=" AdditiveExpr

AdditiveExpr ::= MultiplicativeExpr
              |  AdditiveExpr "+" MultiplicativeExpr
              |  AdditiveExpr "-" MultiplicativeExpr

MultiplicativeExpr ::= PrimaryExpr
                    |  MultiplicativeExpr "*" PrimaryExpr
                    |  MultiplicativeExpr "/" PrimaryExpr
                    |  MultiplicativeExpr "%" PrimaryExpr

PrimaryExpr ::= Num
             |  Bool
             |  Id
             |  CallExpr
             |  "(" Expression ")"

CallExpr ::= Id "(" ActualParams ")"

ActualParams ::= ProperActualParams
              | /* empty */ 

ProperActualParams ::= ProperActualParams "," Expression
                    |  Expression

Getting Started

Download the files and issue the following commands at the Unix prompt:

You should get the same syntax tree output you did at the beginning of Problem 1. The order you do these things in is important. Here's what's being done:

The documentation for JavaCUP is pretty good; here's a brief guide to what you will find already set up.

P2.cup

This file starts with a few directives containing Java code that is to be incorporated verbatim into the generated parser. We tell it to add a private member of type Lexer to its parser, and to call that lexer's next() method when it wants a new token. The parser itself doesn't want to know about the lexer's token type; it wants an instance of its own type Symbol. You have to do some poking around in the JavaCUP source to find out what this is, but it's a simple structure with integer fields for type and range, and a generic object field which we use for the spellings:

So we create a new instance of this class and pass into its constructor the fields from our token. You won't have to change any of this code.

Next you tell the parser-generator what kinds of terminal symbols you want. These names are put into the file Sym.java as constants, and your lexer will use them. You also won't have to change these - they already specify the full Decaf lexicon.

Next comes the list of non-terminals, that is, left-hand sides of productions from the Decaf grammar. You will certainly have to add to this list. Notice that listing them here is just to make the JavaCUP's own parser simpler - it could instead make an extra pass to gather this information.

And finally, the grammar itself, put in proper LR CFG form. With each production are associated some variables and some Java code. For example, look at the highest-level production

We describe the syntax of the class declaration, and give names to some of its members. Those names represent results of things already parsed. For example, elsewhere in the grammar there is a production rule for MethodDecls which assigns a MethodDecls node to the variable RESULT. The variable mds gets that value when we get to the production shown here. The variable name represents the String containing the name of the class; it's a String because of the directive earlier in the file. The compiler created by JavaCUP knows this, and you can use the variable name directly. JavaCUP does not know that mds is an instance of MethodDecls so you have to cast it before using it.

The value we assign to the RESULT value (predefined by JavaCUP) gets assigned to the non-terminal that appears on the left-hand side of the production (in this case, ClassDecl). It will be available in other production rules in which ClassDecl appears.

The grammar provided gives examples of how to parse simple structures, and how to create lists. You have to extend it to parse all of Decaf. You will be adding more productions to P2.cup and recompiling using the following steps:

The syntax error messages provided by JavaCUP don't have line numbers, only character numbers counting from the start. If you don't use a text editor that can locate text by its character number, it could be very inconvenient to track down the offending text. A utility program called findit is provided for you. Here is an example of using it to find what the problem is at character 273 of a file called test.decaf:

% findit test.decaf 273
Line 22: static ant f()
                ^