CS 160 - Project 5


Symbol Table & Type Checking

In this project you will construct a symbol table and perform type checking on input programs using the AST. The symbol table will store all necessary information about the symbols present (classes, methods, members, and variables), including the information needed to type check and generate code. You will then build on top of the symbol table generation to perform type checking on the input programs. You may complete this project in a pair with another student from the course. Working in a pair is recommended but not mandatory.

Specification and Description Changelog

A list of all the changes made to this project description is below:

No changes so far.

Files

You can find the provided code files required for the project in p5.zip.

All files from Project 4 are again present in this project. You will need to copy your completed lexer.l and parser.y files into the code folder. You can copy these files whole rather than selectively copying your rules, as the provided lexer.l and parser.y files have not changed.

There are several new files in the code folder. The files typecheck.cpp and typecheck.hpp define the Symbol Table as well as a TypeCheck visitor, which will visit the AST, construct the table, and perform type checking. You will need to write the implementation of the TypeCheck visitor inside typecheck.cpp.

Description

Project 5 has two steps to complete:

Write code for the TypeCheck visitor to build the symbol table.
Write code for the TypeCheck visitor to type check the program.

Step 1: Build the Symbol Table

Before we can perform type checking or generate code for programs, we need to build a symbol table. The symbol table will store all necessary information about symbols in the language. Symbols are classes, members, methods, and variables. The symbol table will have a hierarchical structure with three levels. This structure can be observed in the code at the top of typecheck.hpp, where the symbol table data types are defined.

The highest level is a map from a class name to information about the class. In the code this is called a ClassTable, and it maps from an std::string to a ClassInfo. A ClassInfo consists of a string storing the super class name (or an empty string if there is no super class), a table of the class members (of type VariableTable, described below), a table of the class methods (of type MethodTable, described below), and the total size of the class members. Each time your visitor processes a class in an input program, it will need to create a corresponding ClassInfo and insert it into the main ClassTable. There is only one ClassTable per program, and it will need to be created at the very beginning of the visitation.

The middle level is a map from a method name to information about the method. In the code this is called a MethodTable, and it maps from an std::string to a MethodInfo. A MethodInfo consists of a compound return type (compound types are described below), a table of local variables (also a VariableTable, like class members), and the total size of the local variables. Each time your visitor processes a method, it will need to create a corresponding MethodInfo, which will be inserted into the method table of the class containing the method. To accomplish this, you will want to save a reference to the current MethodTable that methods should be inserted into, which comes from the current class's ClassInfo. The TypeCheck visitor has a member called currentMethodTable which can help you do this.

The final level of the symbol table is a map from a variable/member name to information about that variable/member. In the code this is called a VariableTable, and it maps from an std::string to a VariableInfo. A VariableInfo contains the variable's compound type, the variable's offset in the variable space (local variable space, parameter space, or member space), and the variable's size. Each time your visitor processes any kind of variable or member declaration, which takes place in both DeclarationNodes and ParameterNodes, it will need to insert one or more variables into the appropriate variable table. Another member of the TypeCheck visitor, called currentVariableTable, is provided to assist with this. Keep in mind that class members and local variables get inserted into different variable tables. You may find it easier to point currentVariableTable at the member table at some points in the visitation and at the local variable table at others. In particular, creating a new ClassInfo or MethodInfo for a new class or method is a natural time to update currentVariableTable.
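A minimal sketch of the three-level hierarchy and of how the two cursor pointers might be moved during a visit. The structs below are simplified stand-ins, not the real definitions from typecheck.hpp (field names, the absence of a parameters list, and value rather than pointer tables are all assumptions), and `SymbolBuilder` with its `begin`/`end` methods is a hypothetical helper standing in for the relevant visitor functions.

```cpp
#include <map>
#include <string>

// Simplified stand-ins for the symbol table types (assumed field names).
enum BaseType { bt_integer, bt_boolean, bt_none, bt_object };

struct CompoundType {
  BaseType baseType = bt_none;
  std::string objectClassName;  // used only when baseType == bt_object
};

struct VariableInfo {
  CompoundType type;
  int offset = 0;
  int size = 4;
};
using VariableTable = std::map<std::string, VariableInfo>;

struct MethodInfo {
  CompoundType returnType;
  VariableTable variables;  // parameters and locals of this method
  int localsSize = 0;
};
using MethodTable = std::map<std::string, MethodInfo>;

struct ClassInfo {
  std::string superClassName;  // "" when there is no super class
  VariableTable members;
  MethodTable methods;
  int membersSize = 0;
};
using ClassTable = std::map<std::string, ClassInfo>;

// Hypothetical helper mimicking the visitor's bookkeeping members.
struct SymbolBuilder {
  ClassTable classTable;                          // one per program
  MethodTable* currentMethodTable = nullptr;      // where new methods go
  VariableTable* currentVariableTable = nullptr;  // where new variables go

  std::string currentClassName, currentMethodName;
  ClassInfo pendingClass;    // filled in while a class body is visited
  MethodInfo pendingMethod;  // filled in while a method body is visited

  void beginClass(const std::string& name, const std::string& super) {
    currentClassName = name;
    pendingClass = ClassInfo{};
    pendingClass.superClassName = super;
    currentMethodTable = &pendingClass.methods;
    currentVariableTable = &pendingClass.members;  // members go here
  }

  void beginMethod(const std::string& name, const CompoundType& ret) {
    currentMethodName = name;
    pendingMethod = MethodInfo{};
    pendingMethod.returnType = ret;
    currentVariableTable = &pendingMethod.variables;  // params/locals go here
  }

  void declare(const std::string& name, const VariableInfo& info) {
    (*currentVariableTable)[name] = info;
  }

  // Insert the finished method, then point back at the member table.
  void endMethod() {
    (*currentMethodTable)[currentMethodName] = pendingMethod;
    currentVariableTable = &pendingClass.members;
  }

  void endClass() { classTable[currentClassName] = pendingClass; }
};
```

One consequence of inserting a method only when it is finished is that it is not yet visible while its own body is being checked, which is one way the "only methods declared above" and no-recursion rules can fall out naturally.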

When each variable is inserted into the variable or member table, it needs to have an offset. This offset will be used when accessing the memory for the variable or member in the assembly code. Class members, function parameters, and local variables are all stored in different places in memory and use different offsets. For members in a given class, the first member will be at offset 0 from the object pointer, the next member at offset 4, the third at offset 8, and so on. For parameters, the first will be at offset 12 from the base/frame pointer (%ebp), the second at 16, the third at 20, and so on. The parameters start at 12 instead of 8 to leave room for the self pointer, which will be implicitly passed as the first argument. For local variables, the first will be at offset -4 from the base/frame pointer, the second at -8, the third at -12, and so on. You will need to keep track of the offset for the next variable so that you can insert each variable with the correct offset. Three members of the TypeCheck visitor are provided to make this easier: currentLocalOffset for local variables, currentParameterOffset for function parameters, and currentMemberOffset for class members. You can use these members to record the offset for the next declared variable of each kind.
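The offset rules above can be sketched as three counters that each hand out the current value and then advance by 4. `OffsetTracker` and its methods are hypothetical names; the three fields mirror the visitor members of the same name, and every variable is assumed to occupy 4 bytes. On 32-bit x86 with the usual `push %ebp; mov %esp, %ebp` prologue, 0(%ebp) holds the saved %ebp, 4(%ebp) the return address, and 8(%ebp) the first argument (here the implicit self pointer), which is why the first declared parameter lands at 12.

```cpp
// Hypothetical offset bookkeeping; assumes 4-byte variables.
struct OffsetTracker {
  int currentMemberOffset = 0;
  int currentParameterOffset = 12;
  int currentLocalOffset = -4;

  // Call when a new class starts: its first member is at 0(object pointer).
  void resetForClass() { currentMemberOffset = 0; }

  // Call when a new method starts. 8(%ebp) holds the implicit self pointer,
  // so the first declared parameter is at 12(%ebp); locals start at -4(%ebp).
  void resetForMethod() {
    currentParameterOffset = 12;
    currentLocalOffset = -4;
  }

  // Each function returns the offset for the variable being declared and
  // advances the counter for the next one.
  int nextMember() {
    int offset = currentMemberOffset;
    currentMemberOffset += 4;
    return offset;
  }
  int nextParameter() {
    int offset = currentParameterOffset;
    currentParameterOffset += 4;
    return offset;
  }
  int nextLocal() {
    int offset = currentLocalOffset;
    currentLocalOffset -= 4;
    return offset;
  }
};
```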

When all variables/members have been inserted into a given class member table or method variable table, the final size of the members or locals needs to be saved. This information is what allows us to allocate the correct amount of space when calling a function or allocating an object of a given class type. This calculation will most likely occur immediately before a ClassInfo or MethodInfo is inserted into the symbol table, since we know the table has finished growing at that point. You may also find the offset members of the TypeCheck visitor mentioned above helpful for this.
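Assuming the counter conventions from the offset discussion (member counter starting at 0 and growing by 4, local counter starting at -4 and shrinking by 4), the final sizes can be read straight off the counters once a class or method is finished. Both helper names below are hypothetical, and the sketch only counts the members declared in the class itself.

```cpp
// Hypothetical helpers: derive final sizes from the offset counters.

// The member counter starts at 0 and grows by 4 per member, so after
// n members it equals 4n, which is exactly the space they occupy.
int membersSizeFrom(int currentMemberOffset) {
  return currentMemberOffset;
}

// The local counter starts at -4 and shrinks by 4 per local, so after
// n locals it equals -4 - 4n; the space used is therefore 4n.
int localsSizeFrom(int currentLocalOffset) {
  return -currentLocalOffset - 4;
}
```

For example, a method with two locals (at -4 and -8) leaves the counter at -12, giving a locals size of 8.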

All variables must also have a type associated with them when they are inserted. Methods will also have an associated type, which corresponds to the return type. In the context of the symbol table, these types are of type CompoundType. A CompoundType stores a BaseType (integer, boolean, none, or object) and can also store a string holding the class name of an object type. You will need to construct the appropriate type for each variable/member before it is inserted into the appropriate variable table. To determine the type of a variable or method, you will need to get information from the TypeNode children of declaration, parameter, and method nodes in the AST. Every ASTNode has a member of type BaseType which you can use to pass base type information around the tree. For example, you may want to set the base type of each TypeNode when it is visited, and then read that information from the parent. Also remember that you can access the identifier of an ObjectTypeNode directly (it is a public member).
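A sketch of the "child records its type, parent reads it back" pattern, under stated assumptions: the node classes are hypothetical stand-ins for the project's AST, `accept()` stands in for visitor dispatch, and the ObjectTypeNode identifier is modeled as a plain string even though the real public member may be an identifier node. The child also fills in `objectClassName` so the parent never needs to inspect the child's concrete class.

```cpp
#include <string>
#include <utility>

enum BaseType { bt_integer, bt_boolean, bt_none, bt_object };

struct CompoundType {
  BaseType baseType;
  std::string objectClassName;  // empty unless baseType == bt_object
};

// Hypothetical AST stand-ins; accept() plays the role of the visitor call.
struct TypeNode {
  BaseType basetype = bt_none;
  std::string objectClassName;
  virtual ~TypeNode() = default;
  virtual void accept() = 0;
};
struct IntegerTypeNode : TypeNode {
  void accept() override { basetype = bt_integer; }
};
struct BooleanTypeNode : TypeNode {
  void accept() override { basetype = bt_boolean; }
};
struct NoneNode : TypeNode {
  void accept() override { basetype = bt_none; }
};
struct ObjectTypeNode : TypeNode {
  std::string identifier;  // the class name (modeled as a string here)
  explicit ObjectTypeNode(std::string id) : identifier(std::move(id)) {}
  void accept() override {
    basetype = bt_object;
    objectClassName = identifier;
  }
};

// What a declaration, parameter, or method visitor might do with its
// TypeNode child: visit it, then build a CompoundType from what it recorded.
CompoundType compoundTypeOf(TypeNode& type) {
  type.accept();
  return CompoundType{type.basetype, type.objectClassName};
}
```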

Remember to set the classTable member of the TypeCheck visitor at the very beginning of visitation. This is the member that will be accessed by the main code and passed into the code generation visitor for its use.

Note also that not all visitor functions need code for building the symbol table. You may find that only a small set of AST nodes contain symbol information, and the rest of the nodes can mostly be ignored for this step.

An important note is that all the members of the TypeCheck visitor described above must be set and maintained by your code. None of the offsets, current tables, or other information is tracked, incremented, or set automatically. All six members (classTable, currentMethodTable, currentVariableTable, currentLocalOffset, currentParameterOffset, and currentMemberOffset) are completely up to you to set and update.

Step 2: Perform Type Checking

Once the symbol table code is complete, you will need to perform type checking. This type checking will be done at the same time as the symbol table is built, using the same TypeCheck visitor. You'll make use of some of the same members and ideas.

The basetype and objectClassName members of AST nodes will need to be used here. You can use these two members to attach information to AST nodes which can then be used in the parent node to perform type checking. For example, you can attach type information to the two child expressions of a PlusNode inside their respective visitor functions, and then you can use that type information in the visitor for PlusNode once the children are visited.
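A minimal sketch of the PlusNode example: both children are visited first so that each annotates its own `basetype`, and the parent then checks them and annotates itself for its own parent. The node classes are hypothetical stand-ins, `accept()` stands in for visitor dispatch, and the `typeError` here is a stand-in that throws a C++ exception (an assumption made only so the sketch is testable; the handout's version reports the error itself).

```cpp
#include <stdexcept>
#include <string>

enum BaseType { bt_integer, bt_boolean, bt_none, bt_object };
enum TypeErrorCode { expression_type_mismatch };  // subset, for the sketch

// Stand-in for the handout's typeError: throws instead of reporting.
struct TypeError : std::runtime_error {
  TypeErrorCode code;
  explicit TypeError(TypeErrorCode c)
      : std::runtime_error("type error"), code(c) {}
};
[[noreturn]] void typeError(TypeErrorCode code) { throw TypeError(code); }

// Hypothetical expression nodes carrying the per-node type annotations.
struct ExpressionNode {
  BaseType basetype = bt_none;
  std::string objectClassName;
  virtual ~ExpressionNode() = default;
  virtual void accept() = 0;
};
struct IntegerLiteralNode : ExpressionNode {
  void accept() override { basetype = bt_integer; }
};
struct BooleanLiteralNode : ExpressionNode {
  void accept() override { basetype = bt_boolean; }
};

struct PlusNode : ExpressionNode {
  ExpressionNode* left;
  ExpressionNode* right;
  PlusNode(ExpressionNode* l, ExpressionNode* r) : left(l), right(r) {}
  void accept() override {
    left->accept();   // children annotate themselves first
    right->accept();
    if (left->basetype != bt_integer || right->basetype != bt_integer)
      typeError(expression_type_mismatch);
    basetype = bt_integer;  // annotation read by this node's parent
  }
};
```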

Additionally, you will need to make use of the partially built symbol table. This will provide information about defined classes, methods, members, and variables/parameters. When type checking, make note of the parameters member of the MethodInfo struct. This should be set when inserting methods and can then be used later to check method calls.
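The following sketch shows how a recorded parameter list might be used to check a call once the argument expressions have been type-annotated. `MethodInfo` is reduced to a return type and a `std::list` of parameter types, which is an assumption about the real struct; `checkCall` is a hypothetical helper; and `typeError` is again a throwing stand-in. Argument and parameter types are compared for exact equality of base type and class name, which is an interpretation: the text only says each argument must have the declared parameter type.

```cpp
#include <list>
#include <stdexcept>
#include <string>
#include <vector>

enum BaseType { bt_integer, bt_boolean, bt_none, bt_object };
enum TypeErrorCode { argument_number_mismatch, argument_type_mismatch };

struct TypeError : std::runtime_error {
  TypeErrorCode code;
  explicit TypeError(TypeErrorCode c)
      : std::runtime_error("type error"), code(c) {}
};
[[noreturn]] void typeError(TypeErrorCode code) { throw TypeError(code); }

struct CompoundType {
  BaseType baseType;
  std::string objectClassName;
  bool operator==(const CompoundType& other) const {
    return baseType == other.baseType &&
           objectClassName == other.objectClassName;
  }
};

// Reduced stand-in: only what call checking needs.
struct MethodInfo {
  CompoundType returnType;
  std::list<CompoundType> parameters;  // declared parameter types, in order
};

// Check already-annotated argument types against the callee's parameters,
// then return the type the MethodCall expression produces.
CompoundType checkCall(const MethodInfo& callee,
                       const std::vector<CompoundType>& args) {
  if (args.size() != callee.parameters.size())
    typeError(argument_number_mismatch);
  auto param = callee.parameters.begin();
  for (const CompoundType& arg : args) {
    if (!(arg == *param)) typeError(argument_type_mismatch);
    ++param;
  }
  return callee.returnType;
}
```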

You will need to check for many type errors, which are listed below. If a type error is detected, you should "throw" the error by calling the typeError function with the appropriate error code. The error codes are defined as an enumeration and are listed along with the error description in the list below.

Type errors for the language:

Undefined Variable - undefined_variable
This applies when a variable is referred to when it has not been defined as a local variable, parameter, or member.
Undefined Method - undefined_method
This applies when a method is called on an object (or on the current object) that does not exist in the class of that object or any super class.
Undefined Class - undefined_class
This applies when a class is referred to as a super class, a type of a variable, or a constructor name that has not been defined above in the program.
Undefined Member - undefined_member
This applies when a member is referred to on an object (as part of an expression, or the destination of an assignment) that does not contain a member of that name.
Not An Object - not_object
This applies when a method is called on or a member is referred to on a variable that is not an object.
Expression Type Mismatch - expression_type_mismatch
This applies when incompatible types are used in an expression. The allowed types for each expression are listed below.
Argument Number Mismatch - argument_number_mismatch
This applies when a method or constructor is called with an incorrect number of arguments (not including self in any way).
Argument Type Mismatch - argument_type_mismatch
This applies when a method or constructor is called with the correct number of arguments, but one or more of the arguments does not have the type declared for the corresponding parameter.
While Predicate Type Mismatch - while_predicate_type_mismatch
This applies when the guard expression of a while loop is not boolean.
Do While Predicate Type Mismatch - do_while_predicate_type_mismatch
This applies when the guard expression of a do-while loop is not boolean.
If-Else Predicate Type Mismatch - if_predicate_type_mismatch
This applies when the guard expression of an if or if-else is not boolean.
Assignment Type Mismatch - assignment_type_mismatch
This applies when the type of the expression in an assignment does not match the type of the destination variable or member.
Return Type Mismatch - return_type_mismatch
This applies when the expression in the return statement for a method does not match its declared return type, or when a return statement is present in a method with a none return type.
Constructor Returns Type - constructor_returns_type
This applies when a class constructor declares a return type other than none. Constructors are methods that have the exact same name (including capitalization) as the class.
No Main Class - no_main_class
This applies when there is no class called Main in the program.
Main Class Has Members - main_class_members_present
This applies when there is a class called Main in the program but it declares members (the main class is not allowed to have members).
No main Method in the Main Class - no_main_method
This applies when there is a class called Main in the program but it does not include a main method.
main Method has Incorrect Signature - main_method_incorrect_signature
This applies when the main method inside the Main class has a return type that is not none or has any declared parameters.
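A sketch of the Main-class and constructor checks, assuming they run against a finished class table (for example, after all classes have been visited). The reduced `ClassInfo` and `MethodInfo` (a member count, a parameter count) are hypothetical simplifications, `typeError` is a throwing stand-in, and the order in which the four Main checks are tried is an assumption; the provided tests and output.txt are the authority on which error should be reported when several apply.

```cpp
#include <map>
#include <stdexcept>
#include <string>

enum BaseType { bt_integer, bt_boolean, bt_none, bt_object };
enum TypeErrorCode {
  constructor_returns_type,
  no_main_class,
  main_class_members_present,
  no_main_method,
  main_method_incorrect_signature
};

struct TypeError : std::runtime_error {
  TypeErrorCode code;
  explicit TypeError(TypeErrorCode c)
      : std::runtime_error("type error"), code(c) {}
};
[[noreturn]] void typeError(TypeErrorCode code) { throw TypeError(code); }

// Hypothetical reduced records.
struct MethodInfo {
  BaseType returnType;
  int parameterCount;
};
struct ClassInfo {
  int memberCount;
  std::map<std::string, MethodInfo> methods;
};
using ClassTable = std::map<std::string, ClassInfo>;

// A constructor is a method named exactly like its class; it must return none.
void checkConstructor(const std::string& className,
                      const std::string& methodName, BaseType returnType) {
  if (methodName == className && returnType != bt_none)
    typeError(constructor_returns_type);
}

void checkMainClass(const ClassTable& classes) {
  auto mainClass = classes.find("Main");
  if (mainClass == classes.end()) typeError(no_main_class);
  if (mainClass->second.memberCount > 0) typeError(main_class_members_present);
  auto mainMethod = mainClass->second.methods.find("main");
  if (mainMethod == mainClass->second.methods.end()) typeError(no_main_method);
  if (mainMethod->second.returnType != bt_none ||
      mainMethod->second.parameterCount != 0)
    typeError(main_method_incorrect_signature);
}
```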

The operand and result types for expressions:

Plus, Minus, Times, Divide, Negation all expect integer operands (two or one) and produce an integer.
Greater and GreaterEqual expect two integer operands and produce a boolean.
Equal expects two operands of the same type, which must be both integer or both boolean, and produces a boolean.
And, Or, Not all expect boolean operands (two or one) and produce a boolean.
Question mark (QM) expects a boolean first operand. The second and third operands must be of the same type: if both are integers, QM produces an integer; if both are booleans, QM produces a boolean.
IntegerLiteral produces an integer.
BooleanLiteral produces a boolean.
Variable and MemberAccess produce the type of the corresponding variable or member.
MethodCall produces the return type of the corresponding method.
New produces an object of the class whose constructor is called. It does not produce none.
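The operand rules above can be condensed into a few result-type functions that either return the produced type or report `expression_type_mismatch`. `BinOp` and the function names are hypothetical, and `typeError` is a throwing stand-in. For QM, this sketch rejects object operands because the rules only list integer and boolean results; that is an interpretation of the text, not a stated rule.

```cpp
#include <stdexcept>

enum BaseType { bt_integer, bt_boolean, bt_none, bt_object };
enum TypeErrorCode { expression_type_mismatch };

struct TypeError : std::runtime_error {
  TypeErrorCode code;
  explicit TypeError(TypeErrorCode c)
      : std::runtime_error("type error"), code(c) {}
};
[[noreturn]] void typeError(TypeErrorCode code) { throw TypeError(code); }

enum class BinOp { Plus, Minus, Times, Divide, Greater, GreaterEqual, Equal, And, Or };

BaseType binaryResult(BinOp op, BaseType l, BaseType r) {
  switch (op) {
    case BinOp::Plus: case BinOp::Minus: case BinOp::Times: case BinOp::Divide:
      if (l != bt_integer || r != bt_integer) typeError(expression_type_mismatch);
      return bt_integer;
    case BinOp::Greater: case BinOp::GreaterEqual:
      if (l != bt_integer || r != bt_integer) typeError(expression_type_mismatch);
      return bt_boolean;
    case BinOp::Equal:  // same type, and that type is integer or boolean
      if (l != r || (l != bt_integer && l != bt_boolean))
        typeError(expression_type_mismatch);
      return bt_boolean;
    case BinOp::And: case BinOp::Or:
      if (l != bt_boolean || r != bt_boolean) typeError(expression_type_mismatch);
      return bt_boolean;
  }
  typeError(expression_type_mismatch);  // not reached for valid BinOp values
}

BaseType negationResult(BaseType t) {
  if (t != bt_integer) typeError(expression_type_mismatch);
  return bt_integer;
}

BaseType notResult(BaseType t) {
  if (t != bt_boolean) typeError(expression_type_mismatch);
  return bt_boolean;
}

// cond ? a : b
BaseType questionMarkResult(BaseType cond, BaseType a, BaseType b) {
  if (cond != bt_boolean || a != b || (a != bt_integer && a != bt_boolean))
    typeError(expression_type_mismatch);
  return a;
}
```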

Other restrictions on the language:

Classes may only inherit from classes declared above them.
Only methods declared above the current method may be called inside it.
Recursion, due to the above restriction, is not allowed.
A variable expression refers to a local variable or parameter of that name; then if that is not found, a member in the current class; then, finally, a member in one of the super classes of the current class.
Members of an object may be in that object's class or any of its super classes.
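The lookup order for variables and members can be sketched as a walk up the super class chain. The reduced table types, `findMember`, `lookupVariable`, and `lookupMemberAccess` are hypothetical, and `typeError` is a throwing stand-in. The walk always terminates because a class can only inherit from a class declared above it, so the chain cannot loop.

```cpp
#include <map>
#include <stdexcept>
#include <string>

enum BaseType { bt_integer, bt_boolean, bt_none, bt_object };
enum TypeErrorCode { undefined_variable, undefined_member, not_object };

struct TypeError : std::runtime_error {
  TypeErrorCode code;
  explicit TypeError(TypeErrorCode c)
      : std::runtime_error("type error"), code(c) {}
};
[[noreturn]] void typeError(TypeErrorCode code) { throw TypeError(code); }

struct CompoundType {
  BaseType baseType;
  std::string objectClassName;
};
struct VariableInfo {
  CompoundType type;
  int offset;
};
using VariableTable = std::map<std::string, VariableInfo>;
struct ClassInfo {
  std::string superClassName;  // "" ends the chain
  VariableTable members;
};
using ClassTable = std::map<std::string, ClassInfo>;

// Search a class, then each super class in turn; nullptr if not found.
const VariableInfo* findMember(const ClassTable& classes, std::string cls,
                               const std::string& name) {
  while (!cls.empty()) {
    auto c = classes.find(cls);
    if (c == classes.end()) return nullptr;
    auto m = c->second.members.find(name);
    if (m != c->second.members.end()) return &m->second;
    cls = c->second.superClassName;
  }
  return nullptr;
}

// Variable expression: locals/parameters first, then members of the
// current class, then members of its super classes.
CompoundType lookupVariable(const ClassTable& classes,
                            const VariableTable& locals,
                            const std::string& currentClass,
                            const std::string& name) {
  auto local = locals.find(name);
  if (local != locals.end()) return local->second.type;
  if (const VariableInfo* m = findMember(classes, currentClass, name))
    return m->type;
  typeError(undefined_variable);
}

// MemberAccess on an expression of type objType.
CompoundType lookupMemberAccess(const ClassTable& classes,
                                const CompoundType& objType,
                                const std::string& member) {
  if (objType.baseType != bt_object) typeError(not_object);
  if (const VariableInfo* m = findMember(classes, objType.objectClassName, member))
    return m->type;
  typeError(undefined_member);
}
```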

Output

The output from the program should be either a complete symbol table, or one type error. Type errors will not have line numbers. The provided tests will NOT be the entirety of what your project is graded on. Each test ending in .good.lang should print a symbol table and no error. Each test ending in .bad.lang should print a single type error. The type errors expected for each test can be seen in the output.txt file.

To aid in solving the project, each bad test has a comment indicating where the first type error (the one that should be thrown) occurs. The comment looks like:

/* TYPE ERROR THIS LINE **/

You can search for this when debugging or verifying your solution.

Requirements

To obtain full credit for this project, your solution will need to:

Satisfy all Project 3 and Project 4 requirements.
Construct a complete AST for all valid input programs.
Construct a complete Symbol Table for all valid input programs.
Output a single type error for poorly-typed programs.
Output the symbol table for well-typed programs.
Execute on any input without segmentation faults or any kind of crashes.

It is highly recommended to verify your solution output for the provided test cases. We have provided the expected output for all tests in a file called output.txt. More information about how to do this can be found in the Grading section.

Make sure that you compile, run, and test your program on the CSIL server, especially if you write your program on your own machine. Your code may behave differently on a different machine or architecture, so it is extremely important to make sure that all of the above requirements are satisfied when building on CSIL.

Deliverables

You should submit the files lexer.l, parser.y, and typecheck.cpp to Gradescope, with complete lexer and parser implementations, complete AST building code, complete symbol table building code, and complete type checking code.

You must also include a README file containing your name and your partner's name (if applicable), perm numbers, email addresses, and any issues with your solution along with explanations.

You should not submit the test cases, makefile, or the unmodified source files.

Grading

Your grade will be based on the proportion of test cases for which your program produces correct output. We have provided some of the test cases and their expected output along with the code files.

All good (ending with .good.lang) test cases should parse successfully with no errors and generate and print a complete symbol table. The first few test cases are designed to be smaller and more human-readable, which should aid in debugging your symbol table generation. All bad (ending with .bad.lang) test cases should generate a single type error.

To run your solution on the test cases, type make run. This will run all test cases in the tests folder and show the output. The expected output for each test is provided in the file output.txt. You can compare your output against the expected output by typing make diff.