Project Goals
     
    
      The goals of this project are:
    
    
      
- to learn how to write a simple parser
- to develop a simple calculator
Administrative Information
    
    
      This is an individual project. 
    
    
      The project is due on Tuesday, October 24, 2023, 23:59:59 PST. There will be no deadline extensions.
    
  
  
Project Introduction
    
    
      In this assignment, we are developing a simple calculator. The
      goal of this project is to demystify the front-end work
      that a compiler does. Everything you need is
      in calc.cpp; that one file contains it all. The other files are
      only there to help you develop and test your calculator. We have not
      used any complex tools to do things automatically: everything is
      in plain C++, and the code should be fully commented and
      readable. You _need_ to know what everything
      in calc.cpp does before you can start writing
      code. There should be a "WRITEME" every place in the code that
      needs your attention. The version that we provide
      to start will compile and even run (although it does not
      actually scan a file, and the grammar it parses is trivial). The
      grammar we want our simple calculator to recognize is as
      follows:
    
    
      List    -> List Expr T_period
              | Expr T_period 
      Expr    -> Expr T_times Expr
              | Expr T_plus Expr
              | Expr T_minus Expr
              | T_minus Expr
              | T_num
              | T_openparen Expr T_closeparen
              | T_bar Expr T_bar
    
    
      While the grammar above is ambiguous (in the technical sense) in
      many ways, it will recognize any program written in our
      calc language (in other words, it can tell whether a program
      is syntactically correct or not). You will need to change the
      grammar so that it recognizes the same language but is no
      longer ambiguous. Furthermore, it should be LL(1) so that we can
      parse it with recursive descent. Of course, your parser must also
      correctly handle associativity and precedence. Associativity
      in particular can be a little tricky.
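
      To make "associativity and precedence" concrete, the short sketch below evaluates the two possible groupings of a chained subtraction, assuming the standard C conventions listed in the steps further down. The helper names group_left and group_right are hypothetical and are not part of calc.cpp.

```cpp
// Hypothetical helpers (not part of calc.cpp): they only spell out the two
// possible groupings of "a - b - c".
constexpr int group_left(int a, int b, int c)  { return (a - b) - c; }
constexpr int group_right(int a, int b, int c) { return a - (b - c); }

// Left associativity (the required behavior) versus the right-leaning
// grouping that a right-recursive grammar tends to suggest.
static_assert(group_left(8, 3, 2) == 3, "left grouping: (8 - 3) - 2");
static_assert(group_right(8, 3, 2) == 7, "right grouping: 8 - (3 - 2)");

// Precedence: multiplication binds tighter than addition.
static_assert(2 + 3 * 4 == 14, "read as 2 + (3 * 4), not (2 + 3) * 4");
```

      The two groupings give different results, so a parser that recognizes the right language can still produce the wrong parse tree or the wrong value.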
    
  
  
Tour of the Code
    
    
      The tarball with a skeleton of calc.cpp (and some
      additional files) can be
      downloaded here. You
      will see that the code in calc.cpp is divided into four
      major parts and three classes. The first part contains some
      enums and helper functions to aid in dealing with tokens,
      non-terminals, and printing out all that stuff. Once you figure
      out the grammar that you are going to use, it should be pretty
      straightforward to add the other non-terminals into the
      appropriate enum and helper functions.
    
    
      The second chunk of code is the scanner (which is the C++ class
      scanner_t). The scanner should handle reading
      the input from stdin and identifying the appropriate tokens. The
      interface listed should be supported because that is how the
      recursive descent parser will actually be getting tokens. It does
      not have to be a big state machine or a regular expression
      ... just something that is coded by you and that works. Make
      sure you test your scanner well before you move on. The last
      thing you want to be doing is trying to track down a weird
      scanner problem in your parser. In addition to finding the
      tokens, you should also keep track of newlines so that you can
      find any syntax errors your parser says it finds (it does this
      by calling get_line(), which should return the
      number of the line of input that is now being scanned). The code
      that is in there to start is just stub code so that everything
      compiles and you actually get some visible output (the scanner
      is just returning either a "+" or "eof" randomly). For the basic
      part of the assignment, you do not need to handle any attributes
      (such as the actual value of a number token).
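
      As a rough illustration of the kind of hand-written scanner described above, here is a minimal sketch. It is not the interface of scanner_t: the names Tok and sketch_scanner are hypothetical, the mapping from characters to tokens ('.', '*', '|', and so on) is an assumption, and it reads from any std::istream rather than stdin so that it can be exercised in isolation.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>

// Hypothetical token set mirroring the grammar's terminals; the enum in
// calc.cpp may use different names and values.
enum class Tok { Num, Plus, Minus, Times, Period, OpenParen, CloseParen, Bar, Eof, Error };

// Hypothetical scanner: skips whitespace, counts lines starting at 1.
class sketch_scanner {
public:
    explicit sketch_scanner(std::istream& in) : in_(in) {}

    // Return the next token from the stream.
    Tok next() {
        const int eof = std::istream::traits_type::eof();
        int c = in_.get();
        while (c != eof && std::isspace(c)) {
            if (c == '\n') ++line_;          // track newlines for error reports
            c = in_.get();
        }
        if (c == eof) return Tok::Eof;
        if (std::isdigit(c)) {
            // Consume the rest of the digit run; the value itself is ignored.
            while (std::isdigit(in_.peek())) in_.get();
            return Tok::Num;
        }
        switch (c) {                         // assumed character-to-token mapping
            case '+': return Tok::Plus;
            case '-': return Tok::Minus;
            case '*': return Tok::Times;
            case '.': return Tok::Period;
            case '(': return Tok::OpenParen;
            case ')': return Tok::CloseParen;
            case '|': return Tok::Bar;
            default:  return Tok::Error;     // unknown character: scanner error
        }
    }

    // Line of input currently being scanned.
    int get_line() const { return line_; }

private:
    std::istream& in_;
    int line_ = 1;
};

// Small usage check on a two-line input.
inline void sketch_scanner_demo() {
    std::istringstream in("12 + 3 .\n|4| .");
    sketch_scanner s(in);
    assert(s.next() == Tok::Num);
    assert(s.next() == Tok::Plus);
    assert(s.get_line() == 1);
}
```

      Reading from a stream parameter is only a convenience for testing; the real scanner is expected to read from stdin.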
    
    
      The third chunk of code is the class that draws a parse tree
      (called parsetree_t). The class parsetree_t
      need not and should not be modified. All it does is print out
      the parse tree as you discover it. It prints the tree out in a
      format readable by the program "dot" (from the graphviz
      package), which can turn it into a PDF file (the makefile shows
      how to do that). The output is really a bunch of nodes and
      edges. When you start processing a non-terminal, you push it on
      a stack (this will draw an edge from that newly pushed node to
      the current node on the top of the stack, which is its
      parent). When you finish, just pop it. The parse tree that you
      generate should be super helpful for debugging purposes.
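
      The push/pop idea can be sketched as follows. This is not the code of parsetree_t, which should be used unchanged; the class name sketch_tree, the node naming scheme, and the exact dot output are assumptions made only to show how a stack yields parent-to-child edges.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for parsetree_t; the real class may name nodes and
// format its output differently.
class sketch_tree {
public:
    explicit sketch_tree(std::ostream& out) : out_(out) { out_ << "digraph G {\n"; }

    // Start a symbol: declare a node, link it to its parent (the current top
    // of the stack, if any), and make it the new top.
    void push(const std::string& label) {
        const int id = next_id_++;
        out_ << "  n" << id << " [label=\"" << label << "\"];\n";
        if (!stack_.empty()) out_ << "  n" << stack_.back() << " -> n" << id << ";\n";
        stack_.push_back(id);
    }

    // Finish the current symbol; its parent becomes the top again.
    void pop() {
        assert(!stack_.empty());
        stack_.pop_back();
    }

    // Close the graph once parsing is done.
    void finish() { out_ << "}\n"; }

private:
    std::ostream& out_;
    std::vector<int> stack_;
    int next_id_ = 0;
};
```

      Pushing List, then Expr, popping, then pushing T_period produces two edges that both start at the List node, because List is on top of the stack each time a child is pushed.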
    
    
      The fourth chunk is the parser itself
      (class parser_t). There are already some helper
      methods provided for you, but you need to figure out how to
      structure your grammar such that it can be written as a
      recursive descent parser (more on that later). The code that is
      there now will parse the grammar "List -> '+' List | EOF". If
      you run
      parse(), it will call List(),
      which recursively calls itself. As List() is
      executed, it calls the scanner to get new tokens, and it
      calls parsetree_t to actually print out the
      parse tree.
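
      The shape of such a parser can be seen in a stripped-down recognizer for the starter grammar "List -> '+' List | EOF". The sketch below is an illustration under simplifying assumptions, not the provided code: plus_list_parser is a hypothetical name, the input is a string instead of a scanner, and no parse tree is drawn.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical recognizer for  List -> '+' List | EOF.
class plus_list_parser {
public:
    explicit plus_list_parser(const std::string& input) : input_(input) {}

    // Returns true if the whole input matches the grammar.
    bool parse() { return list(); }

    // Number of times list() was entered, to make the recursion visible.
    int calls() const { return calls_; }

private:
    // One method per non-terminal; one symbol of lookahead picks the production.
    bool list() {
        ++calls_;
        if (pos_ == input_.size()) return true;  // List -> EOF
        if (input_[pos_] == '+') {               // List -> '+' List
            ++pos_;                              // consume the '+'
            return list();
        }
        return false;                            // no production matches
    }

    std::string input_;
    std::size_t pos_ = 0;
    int calls_ = 0;
};
```

      For the input "+++" the method is entered four times: once per '+' and once more to match the end of input.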
    
  
  
Steps to Solve the Challenge
    
    
      
- Get the scanner working:
 Implement the scanner class and pass in some of the test inputs
	(call make test_parse). You will need more test
	inputs; the ones we provide are just examples. When
	testing the scanner, we suggest instantiating and calling the scanner
	directly from main, so that there is no parsing
	involved. Then, check that the tokens returned are correct by
	printing them out (call token_to_string on them
	and printf the results).
 
 
- Modify the grammar to handle precedence correctly:
 The
        grammar presented above is ambiguous and requires
        modification. While you need to modify the grammar by hand, we
        have included a second set of files to help
        you test your modified
        grammars. calc_def.l is a lexer
        and calc_def.y is a parser written for flex
        and bison respectively (you cannot use them in your
        final code, but you can use them to help you write and
        understand your grammar). If you look at
        calc_def.y, you will see the ambiguous
        grammar. If you call make, it will compile it
        to calc_def, which you can then run on input files!
        You will see which expressions parse and which cause syntax
        errors. It works even though it is ambiguous (the
        "shift/reduce warnings" are warning you that the grammar is
        ambiguous). You can modify that grammar and then test that it
        still recognizes the exact same set of programs (and reports
        syntax errors on the same set too). If your new grammar is
        unambiguous, you should see no shift/reduce
        warnings (or any other types of warnings). However, the
        absence of shift/reduce warnings does
        not mean that your grammar correctly handles precedence.
 
 
-  
	We use the standard precedence for operators (same as for C):
	Multiplication has a higher precedence than addition and
	subtraction, addition and subtraction have the same
	precedence, and all three operators are left associative.
	
 
 
- 
	The bar characters are used like parentheses, but they compute
	the absolute (positive) value of the expression between the
	"opening" and the corresponding "closing" bar. That is, |2| =
	2, and |-2| = 2.
	
 
 
- Modify the grammar to be LL(1):
 Again, you should test your grammar with bison (the .y file)
        to make sure you did not break anything in the process (you
        want to start Step 4 with the correct grammar instead of
        finding problems there).
 
 
- Get the parser written:
 Now that you have a grammar ready to go, start writing the
        parser (by adding new methods to parser_t). This step
        should actually be very easy if you did the previous three
        steps correctly. Do not skip this part! The whole point of
        this assignment is to get you familiar with scanning AND
        parsing. Solving the calculator problem is just an
        exercise.
 
 
- Make sure that you check for errors:
 You will need to
      return an error as soon as possible. That means that if epsilon
      can be derived from the non-terminal you are working on, you
      need to check the following token to make sure that it is
      allowed to appear after the non-terminal that you are currently
      examining. To ensure that our automated grading system correctly
      handles your submission, please follow these
      required guidelines for your program's output:
	
- 
	  When you detect a scanner or a parser error, print to
	  stderr the line where the error is detected. Your
	  line counter must start at 1. Also, the program must exit
	  with an exit code that is 1 for a scanner error and 2 for a
	  parser error.
	
- 
	  If there is no error and your parser can correctly process
	  the input, you must exit with an exit code of 0. The
	  generated dot file should be printed to stdout.
	
- 
	  If you have implemented the full calculator (that is, you evaluate
	  expressions -- see below), the results for all calculations
	  (one for each "Expr") should be printed to stderr,
	  one result per line.
	
- 
	  Nothing else may be printed to stderr or
	  stdout.
	
 Getting to this point gives you full credit on
      this assignment.
 
 
- Make it work:
 If you want the 5% extra credit, and if you have Steps 1 through
        5 done and rock solid, Step 6 is to finish the calculator (so
        that it really does calculations).  The calculator should
        simply print out the signed integer that the expression
        evaluates to (you can assume that the integer stays within
        the range of values that can be represented in a 32-bit
        integer). Note that handling associativity is the
        trickiest part here, since your LL(1) grammar likely does not
        produce the correct associativity. Thus, the parser must do
        some extra work to compensate for that.  If you skip
        this part, we won't be insulted in the least: it is
        more closely related to later material and will not be covered for
        this project. However, we are reserving this extra credit for
        those students who want to figure it out for themselves and
        build something that is actually functional.
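
      The exit-code rules from the error-checking step can be sketched as a small helper. The names error_kind and report_error are hypothetical, and printing only the bare line number is an assumption, since the exact message format is not specified above.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Exit codes from the output guidelines: 1 for a scanner error, 2 for a
// parser error (0 is used when there is no error).
enum class error_kind { scanner = 1, parser = 2 };

// Hypothetical helper: write the line number of the error to the given
// stream (stderr in the real program) and return the matching exit code.
inline int report_error(std::ostream& err, error_kind kind, int line) {
    assert(line >= 1);       // the line counter starts at 1
    err << line << '\n';     // assumed format: the line number only
    return static_cast<int>(kind);
}
```

      In a real program the call might look like std::exit(report_error(std::cerr, error_kind::parser, line)), with line taken from the scanner's get_line(); passing the stream as a parameter only makes the helper easy to check.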
 
 
Deliverables
    
    
      We are using Gradescope (and its
      auto-grader feature) to grade this assignment and your submissions.
    
    
      
- As a first step, make sure that you received the invitation email and can properly log in.
- Once you are done with your scanner/parser, go to the first assignment and submit your code. For this, just submit your "calc.cpp" file.
- We do not show you the test cases and the expected output, but you should get some feedback about the types of tests that your submission passes and where it fails.
- You can make a new submission once every hour. Make sure you thoroughly test your program locally, and don't (ab)use the auto-grader as a test harness.
- Once you have the scanner and parser working, consider adding the calculator functionality and go for the extra credit!