Project 2

Top-Down Parser

The purpose of this project is to learn about top-down and recursive descent parsers. You will build a recursive descent parser for an example expression language describing integer arithmetic.

Specification and Description Changelog

A list of all the changes made to this project description is below:

  • Some of the code is now provided to make the Scanner easier to understand and implement
  • The syntax has changed a little

Files

You can find the provided code files required for the project in p2.rar. Files or parts of code marked with "do NOT modify" must not be modified at all. Otherwise, implement the solution as you see fit.

Description

This project consists of implementing a scanner/lexer and a top-down recursive descent parser for the expression language whose syntax is specified by the context-free grammar (CFG) in syntax.md. That file is also in p2.rar; open it with a Markdown reader.

In the grammar, italicized symbols are non-terminals and bold symbols are terminals (lexemes).

Whitespace and EOL (end of line) are ignored in the language, but may not appear inside a single lexeme. For example, 5 + 2 mod 3; 5 - 4 and 2+3+4 and 2 + '\n' 4 are valid, but 2 3 + 4 and 2 '\n' 3 + 4 are not.

The language semantics are given by the following rules:

  • A valid input consists of a sequence of expressions.
  • Expressions consist of normal arithmetic expressions as well as parenthesized expressions, and literal integers.
  • The operators are: (1) addition, notated by the + lexeme; (2) subtraction, notated by the - lexeme; (3) multiplication, notated by the * lexeme; (4) division, notated by the / lexeme; (5) the ternary operator, notated by the ? and : lexemes.
  • Operator precedence follows standard mathematical precedence, as in C/C++: parenthesized expressions bind tightest, multiplication and division come next at the same level, addition and subtraction are lower at the same level, and the ternary operator is lowest.
  • All operators are left-associative, as the binary arithmetic operators are in C/C++.
  • Integer literals are notated by the Integer terminal.

To complete the project, you need to complete the implementation of a scanner and a top-down parser for the above language. A skeleton of each is provided in the calculator.cpp file and the appropriate headers and helpers, and you will need to fill in any parts marked with a // WRITEME comment.

Project Steps

The task is broken into four steps:

  1. Modify the grammar to be unambiguous, LL(1), and handle precedence.
  2. Implement the tokens used by nextToken and eatToken, then implement the scanner functions nextToken and eatToken.
  3. Implement all necessary recursive-descent functions for the parser.
  4. Pass values between recursive-descent functions to implement the evaluation of input programs.

Step 1: Modify Grammar

To allow a recursive descent parser to be written for the grammar, it must be modified to be unambiguous and LL(1). You should also handle precedence levels of the various binary operators in the language. To do this, you will need to use the left factoring and left recursion elimination techniques that will be discussed in lecture.

Once you have completed the modifications, make sure that the new grammar accepts the same language as the original ambiguous grammar. Also make sure that every grammar rule choice can be determined by a single lookahead token (necessary for non-backtracking recursive descent). You can do this by checking that, for every non-terminal, the FIRST set of each production with that left-hand side is disjoint from the FIRST sets of the other productions with the same left-hand side.
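The FIRST-set check described above can be done by hand, but a small helper makes the idea concrete. The following is only a sketch: the names FirstSet and pairwiseDisjoint are made up for illustration, and the FIRST sets themselves are assumed to have been worked out on paper. Note that it does not cover productions that derive the empty string, for which the FOLLOW set also has to be considered.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper type: the FIRST set of one alternative of a
// non-terminal, written down by hand as a set of terminal names.
using FirstSet = std::set<std::string>;

// Returns true when no terminal appears in the FIRST set of two different
// alternatives, i.e. one lookahead token is enough to pick the production.
bool pairwiseDisjoint(const std::vector<FirstSet>& alternatives) {
    std::set<std::string> seen;
    for (const FirstSet& first : alternatives) {
        for (const std::string& terminal : first) {
            // insert() reports false in .second if the terminal was
            // already contributed by an earlier alternative.
            if (!seen.insert(terminal).second) return false;
        }
    }
    return true;
}
```

For example, two alternatives with FIRST sets {Integer, (} and {(} both start with an opening parenthesis, so the check fails and the productions need left factoring.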

Step 2: Implement Scanner

There are two functions that you must implement in the scanner class: nextToken and eatToken. The nextToken function returns the next token in the input stream, but does not consume the token. The eatToken function takes in a token that the parser expects, verifies that it is the same as the next token available, and then consumes it, removing it from the input stream.

A few things to keep in mind while implementing these functions: multiple consecutive calls to nextToken without an intervening call to eatToken should all return the same token and should not advance the input stream. Reading too far ahead in the stream can be dangerous, depending on how you handle line numbers. Line numbers should always reflect the current token being processed and should never advance further until that token is consumed. The simplest way to ensure correct handling of line numbers is to not scan ahead at all in the input/token stream. The scanner functions should make appropriate use of the scanError, outOfBoundsError and mismatchError functions.
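The peek/consume split described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: the class SketchScanner, the Tok enum and the line() accessor are not the provided skeleton, it reads from a string rather than an input stream, and plain exceptions stand in for scanError and mismatchError. It also assumes that skipping whitespace while peeking is acceptable, so that the line number is that of the token being looked at.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical token kinds; the provided headers define their own set.
enum class Tok { Plus, Minus, Times, Divide, Number, End };

// Hypothetical stand-in for the provided scanner class.
class SketchScanner {
public:
    explicit SketchScanner(std::string text) : text_(std::move(text)) {}

    // Peek at the next token. Leading whitespace is skipped (and newlines
    // are counted), but the token itself stays in place, so repeated calls
    // return the same answer.
    Tok nextToken() {
        skipBlanks();
        if (pos_ >= text_.size()) return Tok::End;
        const char c = text_[pos_];
        if (std::isdigit(static_cast<unsigned char>(c))) return Tok::Number;
        switch (c) {
            case '+': return Tok::Plus;
            case '-': return Tok::Minus;
            case '*': return Tok::Times;
            case '/': return Tok::Divide;
            default:  throw std::runtime_error("scan error");  // stand-in for scanError
        }
    }

    // Check that the next token is the expected one, then consume it.
    void eatToken(Tok expected) {
        if (nextToken() != expected)
            throw std::runtime_error("mismatch error");  // stand-in for mismatchError
        if (expected == Tok::End) return;  // nothing left to consume
        if (expected == Tok::Number) {
            while (pos_ < text_.size() &&
                   std::isdigit(static_cast<unsigned char>(text_[pos_]))) ++pos_;
        } else {
            ++pos_;  // every operator in this sketch is one character long
        }
    }

    int line() const { return line_; }

private:
    void skipBlanks() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_]))) {
            if (text_[pos_] == '\n') ++line_;
            ++pos_;
        }
    }

    std::string text_;
    std::size_t pos_ = 0;
    int line_ = 1;
};
```

Because nextToken never moves past the token it reports, calling it twice in a row is harmless, and the line counter only changes when whitespace in front of the current token is skipped.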

Remember to type make to build your solution. The executable will be called calculator and can be run with the -s flag to test the token stream coming from the scanner.

Step 3: Implement Recursive-Descent Parser

To implement the parser, you will need to create all necessary recursive-descent functions for non-terminal symbols in the grammar. A Start function has been provided, and should be used as the entry point into your grammar. When writing the parser, you should keep in mind that non-terminal symbols on the right-hand side of a production correspond to function calls (the recursive part of recursive-descent) and terminal symbols correspond to eatToken calls into the scanner.

Inside recursive-descent functions with multiple choices for the production to apply, you may want to use a switch-case statement. An example of this can be found in the provided Start function. If you find that you cannot determine which production to apply with a switch on the next token, your grammar may not be fully LL(1) and may require further modification.
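To illustrate the mapping from productions to functions, here is a sketch of a recognizer for a small made-up grammar fragment. It is not the project grammar and not the provided skeleton: the Tok enum, SketchParser and accepts are hypothetical names, the tokens come from a vector instead of a scanner, and plain exceptions stand in for the provided error functions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical token kinds for the fragment below.
enum class Tok { Plus, Minus, LParen, RParen, Num, End };

class SketchParser {
public:
    explicit SketchParser(std::vector<Tok> toks) : toks_(std::move(toks)) {}

    // Entry point: one expression followed by end of input.
    void parse() { expr(); eat(Tok::End); }

private:
    Tok next() const { return pos_ < toks_.size() ? toks_[pos_] : Tok::End; }

    void eat(Tok expected) {
        if (next() != expected) throw std::runtime_error("mismatch");
        if (pos_ < toks_.size()) ++pos_;
    }

    // Expr -> Term ExprTail
    void expr() { term(); exprTail(); }

    // ExprTail -> '+' Term ExprTail | '-' Term ExprTail | epsilon
    void exprTail() {
        switch (next()) {
            case Tok::Plus:  eat(Tok::Plus);  term(); exprTail(); break;
            case Tok::Minus: eat(Tok::Minus); term(); exprTail(); break;
            default: break;  // epsilon: leave the token for the caller to check
        }
    }

    // Term -> Num | '(' Expr ')'
    void term() {
        switch (next()) {
            case Tok::Num:    eat(Tok::Num); break;
            case Tok::LParen: eat(Tok::LParen); expr(); eat(Tok::RParen); break;
            default: throw std::runtime_error("parse error");
        }
    }

    std::vector<Tok> toks_;
    std::size_t pos_ = 0;
};

// Convenience wrapper: true if the token sequence is accepted.
bool accepts(std::vector<Tok> toks) {
    try {
        SketchParser(std::move(toks)).parse();
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Each non-terminal on a right-hand side becomes a call, each terminal becomes an eat, and each switch chooses a production by looking only at the next token.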

Step 4: Implement Evaluation of Programs

The last step is to augment the parser so that it will evaluate expressions correctly and print the expression values in the order they are listed in the input, one expression value per line (i.e., separate printed expression values with a newline). You must also evaluate expressions with the correct precedence and associativity of operators, as well as throw an outOfBoundsError for numbers greater than signed INT_MAX, or a divideByZeroError for division by zero.
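One way to detect a literal greater than INT_MAX without overflowing an int along the way is to test before each multiplication. The sketch below assumes the lexeme is already available as a string of decimal digits; parseLiteral is a made-up name, and std::out_of_range stands in for the provided outOfBoundsError, whose signature is not shown here.

```cpp
#include <climits>
#include <stdexcept>
#include <string>

// Convert a run of decimal digits to int, rejecting values above INT_MAX.
// Assumes `digits` contains only the characters '0' to '9'.
int parseLiteral(const std::string& digits) {
    int value = 0;
    for (char c : digits) {
        const int d = c - '0';
        // value * 10 + d fits in an int exactly when
        // value <= (INT_MAX - d) / 10, so test that before computing it.
        if (value > (INT_MAX - d) / 10)
            throw std::out_of_range("out of bounds");  // stand-in for outOfBoundsError
        value = value * 10 + d;
    }
    return value;
}
```

The comparison is done on values that are known to fit, so the check itself never overflows.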

Be careful to handle left associativity correctly. As the grammar will be LL(1), it will not have the correct associativity for left associative operators in the parse tree. You may need to handle this more creatively.
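One common way to recover left associativity is to pass the value computed so far down into the "tail" function as an argument. The sketch below shows this for a made-up fragment with only numbers, + and -; Token, SketchEvaluator and evalTokens are hypothetical names, the tokens come from a vector, and overflow checking is left out to keep it short.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical token: kind is 'n' for a number, '+' or '-' for an operator.
struct Token { char kind; int value; };

class SketchEvaluator {
public:
    explicit SketchEvaluator(std::vector<Token> toks) : toks_(std::move(toks)) {}

    int run() { int v = expr(); eat('$'); return v; }

private:
    // '$' marks end of input once all tokens are consumed.
    char next() const { return pos_ < toks_.size() ? toks_[pos_].kind : '$'; }

    // Consume the expected token and return its value (0 at end of input).
    int eat(char kind) {
        if (next() != kind) throw std::runtime_error("mismatch");
        if (pos_ >= toks_.size()) return 0;
        return toks_[pos_++].value;
    }

    // Expr -> Num ExprTail
    int expr() { int left = eat('n'); return exprTail(left); }

    // ExprTail -> '+' Num ExprTail | '-' Num ExprTail | epsilon
    // The running value is passed down, so each operator is applied to
    // everything on its left before the recursion continues.
    int exprTail(int left) {
        switch (next()) {
            case '+': { eat('+'); int right = eat('n'); return exprTail(left + right); }
            case '-': { eat('-'); int right = eat('n'); return exprTail(left - right); }
            default:  return left;  // epsilon
        }
    }

    std::vector<Token> toks_;
    std::size_t pos_ = 0;
};

// Convenience wrapper for trying the sketch.
int evalTokens(std::vector<Token> toks) {
    return SketchEvaluator(std::move(toks)).run();
}
```

With the tokens for 8 - 3 - 2, the calls are exprTail(8), exprTail(5) and exprTail(3), giving (8 - 3) - 2 = 3 rather than 8 - (3 - 2) = 7. A loop that updates a running value achieves the same effect.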

If you complete this part, your parser should evaluate and print expressions, or throw divideByZeroError, only when the -e flag is passed to the program. (An outOfBoundsError may still occur outside evaluate mode, because it is raised whenever an input integer is too large.) You can check for the flag by testing the evaluate member of the Parser; you do not have to write the command-line flag checking, as it is included in the provided code. Failure to check this flag will likely result in a lower or zero score for the project, including previous steps. The idea is that the evaluation part should only be "enabled" when the evaluate member of the Parser is set to true.
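A sketch of how the flag might gate both printing and the divide-by-zero check is shown below. Only the evaluate member comes from the description above; FlagSketch, divide, finish and runDivision are made-up names, output goes to a stream passed in by the caller, and std::domain_error stands in for the provided divideByZeroError.

```cpp
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical reduction of the parser to the parts that depend on the flag.
struct FlagSketch {
    bool evaluate = false;  // in the provided code this is set from -e

    // Divide only in evaluate mode; in parse-only mode the result is unused,
    // so a zero divisor must not raise an error.
    int divide(int lhs, int rhs) const {
        if (!evaluate) return 0;
        if (rhs == 0) throw std::domain_error("divide by zero");  // stand-in for divideByZeroError
        return lhs / rhs;
    }

    // Print one finished expression value per line, but only in evaluate mode.
    void finish(int value, std::ostream& out) const {
        if (evaluate) out << value << '\n';
    }
};

// Helper for trying the sketch: returns whatever would have been printed.
std::string runDivision(bool evaluate, int lhs, int rhs) {
    FlagSketch p;
    p.evaluate = evaluate;
    std::ostringstream out;
    p.finish(p.divide(lhs, rhs), out);
    return out.str();
}
```

With evaluate set to false, the same input is still parsed, but nothing is printed and a zero divisor goes unreported.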

Requirements

To obtain full credit for this project, your solution will need to:

  • Parse all valid programs in the language.
  • Reject all invalid programs, and print an error to STDERR.
  • Not cause any segmentation faults or other errors.
  • Use a grammar with correct precedence of operators.
  • Print correct values of expressions (with -e flag).
  • Use correct associativity and precedence of operations when evaluating.
  • Throw out-of-bounds errors for numbers greater than signed INT_MAX (with -e flag).

It is highly recommended to verify your solution output for the provided test cases against the expected output. More information about how to do this can be found in the Grading section.

Make sure that you compile, run, and test your program on the CSIL server, especially if you write your program on your own machine.

Deliverables

You should submit the files calculator.hpp and calculator.cpp with a complete implementation to GradeScope.

Some parts of calculator.hpp and calculator.cpp are provided, but feel free to modify them. In fact, you can write your own calculator.hpp and calculator.cpp without our code, as long as your version completes the task and passes the tests. However, do not modify any other file.

You also must include a README file which includes your name, perm number, email address, and any issues with your solution and explanations.

You do not need to submit the test cases, makefile, or the unmodified source files. Feel free to write and share your complex test cases with your friends! To do so, we supplied you with a simple DSL for testing in the test/test.rb file in which you can define different cases and see how these cases should behave, and how they actually do behave. If you meet any difficulties along the way, feel free to ask for our help.

Grading

Your grade will be based on the proportion of test cases for which your program produces correct output. We have provided some test cases (as well as the expected output for all test cases). During grading we will run more, and stricter, test cases, so you should write extra test cases yourselves to make sure that you handle all cases.

To run your solution against the test cases, you can type make test. Doing this will output the test cases that have passed or failed, along with any other useful information. Green means the test passed.