CS160: Project 1 - A simple calculator (10% of project score)

Project Goals

The goals of this project are:

to learn how to write a simple parser
to develop a simple calculator

Administrative Information

This is an individual project.

The project is due on Tuesday, October 24, 2023, 23:59:59 PST. There will be no deadline extensions.

Project Introduction

In this assignment, we are developing a simple calculator. The goal of this project is to demystify all of the front-end stuff that a compiler does. Everything you need is right in calc.cpp; one file does it all. The other files are just there to help you develop and test your calculator. We have not used any complex tools to do things automatically; everything is in very vanilla C++, and the code should be fully commented and pretty readable. You _need_ to know what everything in calc.cpp does before you can start writing code. There should be a "WRITEME" at every place in the code that needs your assistance. However, the version that we provide to start will compile and even run (although it does not actually scan a file, and the grammar it parses is trivial). The grammar we want our simple calculator to recognize is as follows:

      List    -> List Expr T_period
              | Expr T_period 

      Expr    -> Expr T_times Expr
              | Expr T_plus Expr
              | Expr T_minus Expr
              | T_minus Expr
              | T_num
              | T_openparen Expr T_closeparen
              | T_bar Expr T_bar

While the grammar above is ambiguous (in the technical sense) in many ways, it will recognize any program written in our calc language (in other words, it can tell whether a program is syntactically correct or not). You will need to change the grammar so that it recognizes the same language but is no longer ambiguous. Furthermore, it should be LL(1) so we can parse it with recursive descent. Of course, your parser must also correctly handle associativity and precedence. Associativity in particular can be a little tricky.

Tour of the Code

The tarball with a skeleton of calc.cpp (and some additional files) can be downloaded here. You will see that the code in calc.cpp is divided into four major parts and three classes. The first part contains some enums and helper functions to aid in dealing with tokens, non-terminals, and printing out all that stuff. Once you figure out the grammar that you are going to use, it should be pretty straightforward to add the other non-terminals into the appropriate enum and helper functions.

The second chunk of code is the scanner (which is the C++ class scanner_t). The scanner should handle reading the input from stdin and identifying the appropriate tokens. The interface listed should be supported because that is how the recursive descent parser will actually be getting tokens. It does not have to be a big state machine or a regular expression; it just has to be something that is coded by you and that works. Make sure you test your scanner well before you move on. The last thing you want to be doing is trying to track down a weird scanner problem in your parser. In addition to finding the tokens, you should also keep track of newlines so that you can locate any syntax errors your parser reports (it does this by calling get_line(), which should return the number of the input line that is currently being scanned). The code that is in there to start is just stub code so that everything compiles and you actually get some visible output (the scanner just returns either a "+" or "eof" at random). For the basic part of the assignment, you do not need to handle any attributes (such as the actual value of a number token).
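To make the scanner's job concrete, here is a minimal sketch of a hand-written scanner that recognizes the tokens of the grammar and counts lines. It is not the interface from calc.cpp: the class name mini_scanner, the method next(), the helper scan_all(), and the T_eof and T_error tokens are assumptions made for illustration, and it reads from any std::istream instead of stdin so that it is easy to test. Only get_line() is taken from the description above.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Token names follow the grammar; T_eof and T_error are assumptions.
enum token_type { T_eof, T_num, T_plus, T_minus, T_times, T_period,
                  T_bar, T_openparen, T_closeparen, T_error };

// Hypothetical stand-in for scanner_t; the real interface may differ.
class mini_scanner {
public:
    explicit mini_scanner(std::istream& in) : in_(in), line_(1) {}

    // Consume and return the next token, skipping whitespace.
    token_type next() {
        int c = in_.get();
        while (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            if (c == '\n') ++line_;          // count newlines for error reports
            c = in_.get();
        }
        if (c == std::char_traits<char>::eof()) return T_eof;
        if (std::isdigit(c)) {
            // Swallow the remaining digits of the number.
            while (std::isdigit(in_.peek())) in_.get();
            return T_num;
        }
        switch (c) {
            case '+': return T_plus;
            case '-': return T_minus;
            case '*': return T_times;
            case '.': return T_period;
            case '|': return T_bar;
            case '(': return T_openparen;
            case ')': return T_closeparen;
            default:  return T_error;        // character outside the language
        }
    }

    // Line number (starting at 1) of the input currently being scanned.
    int get_line() const { return line_; }

private:
    std::istream& in_;
    int line_;
};

// Testing helper: scan a whole string into a list of tokens.
std::vector<token_type> scan_all(const std::string& text) {
    std::istringstream in(text);
    mini_scanner s(in);
    std::vector<token_type> out;
    for (;;) {
        token_type t = s.next();
        out.push_back(t);
        if (t == T_eof || t == T_error) break;
    }
    return out;
}
```

A scanner built this way can be exercised without any parser at all, for example by calling scan_all("12 + |3|.") and printing each token it returns.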

The third chunk of code is the class that draws a parse tree (called parsetree_t). The class parsetree_t need not and should not be modified. All it does is print out the parse tree as you discover it. It prints the tree out in a format readable by the program "dot" (from the graphviz package), which can turn it to a PDF file (the makefile shows how to do that). The output is really a bunch of nodes and edges. When you start processing a non-terminal, you push it on a stack (this will draw an edge from that newly pushed node to the current node on the top of the stack, which is its parent). When you finish, just pop it. The parse tree that you generate should be super helpful for debugging purposes.
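The push/pop idea can be sketched in a few lines. The class below is a hypothetical stand-in (mini_tree, finish(), and demo_tree() are made-up names, and the exact text that the real parsetree_t prints may well differ); it only illustrates how a stack of node ids is enough to emit the nodes and edges that dot expects.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for parsetree_t; the provided class is the one to use.
class mini_tree {
public:
    explicit mini_tree(std::ostream& out) : out_(out), next_id_(0) {
        out_ << "digraph G {\n";
    }

    // Start a non-terminal: declare a node and link it to its parent,
    // which is whatever node is currently on top of the stack.
    void push(const std::string& label) {
        int id = next_id_++;
        out_ << "  n" << id << " [label=\"" << label << "\"];\n";
        if (!stack_.empty())
            out_ << "  n" << stack_.back() << " -> n" << id << ";\n";
        stack_.push_back(id);
    }

    // Finish the non-terminal currently on top of the stack.
    void pop() {
        assert(!stack_.empty());
        stack_.pop_back();
    }

    // Close the graph once parsing is done.
    void finish() { out_ << "}\n"; }

private:
    std::ostream& out_;
    int next_id_;
    std::vector<int> stack_;
};

// Build the tree for a List with a single Expr child and return the dot text.
std::string demo_tree() {
    std::ostringstream out;
    mini_tree t(out);
    t.push("List");
    t.push("Expr");
    t.pop();
    t.pop();
    t.finish();
    return out.str();
}
```

In this sketch, demo_tree() yields a graph with two nodes and one edge from the List node to the Expr node.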

The fourth chunk is the parser itself (class parser_t). There are already some helper methods provided for you, but you need to figure out how to structure your grammar such that it can be written as a recursive descent parser (more on that later). The code that is there now will parse the grammar "List -> '+' List | EOF". If you run parse(), it will call List(), which recursively calls itself. As List() is executed, it calls the scanner to get new tokens, and it calls parsetree_t to actually print out the parse tree.
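As a rough illustration of what one method per non-terminal looks like, here is a self-contained miniature of a recursive descent parser for the starter grammar "List -> '+' List | EOF". It is a sketch under simplifying assumptions, not the code in calc.cpp: the names plus_parser and accepts are hypothetical, the input is a plain string in which each '+' is one token, and the end of the string stands in for EOF.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical miniature of the starter parser (not the real parser_t).
class plus_parser {
public:
    explicit plus_parser(const std::string& input)
        : input_(input), pos_(0), nodes_(0) {}

    // Parse the whole input and return the number of List nodes visited.
    int parse() {
        List();
        return nodes_;
    }

private:
    // List -> '+' List | EOF
    void List() {
        ++nodes_;
        if (pos_ == input_.size()) return;         // EOF alternative
        if (input_[pos_] == '+') {                  // '+' List alternative
            ++pos_;                                 // consume the '+'
            List();
            return;
        }
        throw std::runtime_error("syntax error");   // no alternative matches
    }

    std::string input_;
    std::size_t pos_;
    int nodes_;
};

// True if the input is a sentence of the starter grammar.
bool accepts(const std::string& input) {
    try {
        plus_parser(input).parse();
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

The one-token lookahead in List() is what makes this LL(1): the next token alone decides which alternative to take.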

Steps to Solve the Challenge

Get the scanner working:
Implement the scanner class and pass in some of the test inputs (call make test_parse). You will need more test inputs; the ones we provide are just examples. When testing the scanner, we suggest instantiating and calling the scanner directly from main, so that there is no parsing involved. Then, check that the tokens returned are correct by printing them out (call token_to_string on them and printf the result).
Modify the grammar to handle precedence correctly:
The grammar presented above is ambiguous and requires modification. While you need to modify the grammar by hand, we have included a second set of files to help you test your modified grammars. calc_def.l is a lexer and calc_def.y is a parser, written for flex and bison respectively (you cannot use them in your final code, but you can use them to help you write and understand your grammar). If you look at calc_def.y, you will see the ambiguous grammar. If you call make, it will compile it to calc_def, which you can then run on input files! You will see which expressions parse and which cause syntax errors. It works even though it is ambiguous (the "shift/reduce warnings" are warning you that the grammar is ambiguous). You can modify that grammar and then test that it still recognizes the exact same set of programs (and finds a syntax error on the same set too). If your new grammar is unambiguous, you should see no shift/reduce warnings (or any other types of warnings). However, just because your grammar produces no shift/reduce warnings does not mean it correctly handles precedence.
We use the standard precedence for operators (same as for C): Multiplication has a higher precedence than addition and subtraction, addition and subtraction have the same precedence, and all three operators are left associative.
The bar characters are used like parentheses, but they compute the absolute (non-negative) value of the expression between the "opening" and the corresponding "closing" bar. That is, |2| = 2, and |-2| = 2.
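Since these rules match those of C, a few worked values can be written directly as C++ expressions. The functions below are purely illustrative (their names are made up); each one spells out, with explicit parentheses, how the calc input in its comment should be grouped, plus one example of the grouping that must be avoided.

```cpp
#include <cstdlib>

// calc input: 8 - 3 - 2.   Left associative, so it groups as (8 - 3) - 2.
int left_assoc_minus() { return (8 - 3) - 2; }

// The same input grouped the wrong way (right associative): 8 - (3 - 2).
int wrong_right_assoc() { return 8 - (3 - 2); }

// calc input: 2 + 3 * 4.   Multiplication binds tighter than addition.
int times_before_plus() { return 2 + (3 * 4); }

// calc input: |2 - 5| * 2.   Bars group like parentheses and take the
// absolute value of what is between them.
int bars_as_abs() { return std::abs(2 - 5) * 2; }
```

The first two functions return 3 and 7 respectively, which is why a grammar that parses every valid program can still get associativity wrong.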
Modify the grammar to be LL(1):
Again, you should test your grammar with bison (the .y file) to make sure you did not break anything in the process (you want to start Step 4 with the correct grammar instead of finding problems there).
Get the parser written:
Now that you have a grammar ready to go, start writing the parser (by adding new methods to parser_t). This step should actually be very easy if you did the previous three steps correctly. Do not skip this part! The whole point of this assignment is to get you familiar with scanning AND parsing. Solving the calculator problem is just an exercise.
Make sure that you check for errors:
You will need to return an error as soon as possible. That means that if epsilon can be derived from the non-terminal you are working on, you need to check the following token to make sure that it is allowed to appear after the non-terminal that you are currently examining. To ensure that our automated grading system correctly handles your submission, please follow these required guidelines for your program's output:
- When you detect a scanner or a parser error, print to stderr the line where the error is detected. Your line counter must start at 1. Also, the program must exit with an exit code that is 1 for a scanner error and 2 for a parser error.
- If there is no error and your parser can correctly process the input, you must exit with an exit code of 0. The generated dot file should be printed to stdout.
- If you have implemented the full calculator (that is, you evaluate expressions -- see below), the results for all calculations (one for each "Expr") should be printed to stderr, one result per line.
- Nothing else may be printed to stderr or stdout.
Getting to this point gives you full credit on this assignment.
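One way to keep the exit codes and the stderr output in a single place is sketched below. The enum and the function report() are hypothetical names, and printing the bare line number is only an assumption about the message format, since the guidelines above say what to report but not how to word it.

```cpp
#include <ostream>
#include <sstream>

// Exit codes required by the output guidelines.
enum result_kind { result_ok = 0, result_scan_error = 1, result_parse_error = 2 };

// For an error, write the line number (counted from 1) to `err`.
// Returns the exit code that the program should terminate with.
int report(std::ostream& err, result_kind kind, int line) {
    if (kind != result_ok)
        err << line << "\n";      // exact message format is an assumption
    return static_cast<int>(kind);
}
```

In a real program, main could pass std::cerr and the scanner's current line to such a helper and return its result, so that no other code path writes to stderr.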
Make it work:
If you want that +5% extra credit, and if you have Steps 1 through 5 done and rock solid, Step 6 is to finish the calculator (so that it really does calculations). The calculator should simply print out the signed integer that the expression evaluates to (you can assume that the value stays within the range of a 32-bit signed integer). Note that handling associativity is the trickiest part here, since your LL(1) grammar likely does not produce the correct associativity. Thus, the parser must do some intelligent things to compensate for that. If you don't do this part at all, we won't be insulted in the least; it is more related to the later material, and we won't cover it for this project. However, we are reserving this extra credit for those students who want to figure it out for themselves and build something that is actually functional.
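To see what "compensating" can mean, consider a deliberately reduced language: unsigned integers separated by '-' and nothing else. The sketch below is one common technique, not necessarily the intended solution: the value computed so far is passed down into the right-recursive rule, so the subtraction happens left to right even though the grammar nests to the right. The names minus_eval and eval_minus are hypothetical, and the sketch does no whitespace or error handling.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical evaluator for a reduced language: numbers separated by '-'.
// LL(1) grammar:   E -> num Tail      Tail -> '-' num Tail | epsilon
class minus_eval {
public:
    explicit minus_eval(const std::string& s) : s_(s), pos_(0) {}

    // E -> num Tail
    int E() {
        int first = num();
        return Tail(first);
    }

private:
    // Read one unsigned integer literal.
    int num() {
        int v = 0;
        while (pos_ < s_.size() &&
               std::isdigit(static_cast<unsigned char>(s_[pos_])))
            v = v * 10 + (s_[pos_++] - '0');
        return v;
    }

    // `left` is the value of everything parsed so far.
    int Tail(int left) {
        if (pos_ < s_.size() && s_[pos_] == '-') {
            ++pos_;                       // consume '-'
            int right = num();
            return Tail(left - right);    // fold in the operand, then continue
        }
        return left;                      // epsilon: nothing more to subtract
    }

    std::string s_;
    std::size_t pos_;
};

// Evaluate a string such as "8-3-2".
int eval_minus(const std::string& s) { return minus_eval(s).E(); }
```

With this structure, eval_minus("8-3-2") computes (8 - 3) - 2 rather than 8 - (3 - 2).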

Deliverables

We are using Gradescope (and its auto-grader feature) to grade this assignment and your submissions.

As a first step, make sure that you received the invitation email and can properly log in.
Once you are done with your scanner/parser, go to the first assignment and submit your code. For this, just submit your "calc.cpp" file.
We do not show you the test cases and the expected output, but you should get some feedback about the types of tests that your submission passes and where it fails.
You can make a new submission once every hour. Make sure you thoroughly test your program locally, and don't (ab)use the auto-grader as a test harness.
Once you have the scanner and parser working, consider adding the calculator functionality and go for the extra credit!

Created by Christopher Kruegel (© 2008, using Apache Cocoon).