CS160: Project 4 - Type Checking and Semantic Analysis (30% of project score)
Project Goals
The goals of this project are:
- to analyze semantic properties of a CSimple program
- to perform type checking
Administrative Information
This is an individual project.
The project is due on Friday, December 1, 2023, 23:59:59 PST (extended).
Project Introduction
For this project, we want you to analyze semantic properties of
programs and to perform type checking. At this point, you have
perfected your scanner and parser and built your abstract syntax
tree (AST). That is, your compiler should be able to scan in an
input file, check for syntax errors, and build an AST data
structure. By looking at the ast2dot.cpp visitor class,
you should understand how to traverse the AST and print out the
nodes of the tree.
Now, it is time to find some more errors and to perform type
checking! There are two major hurdles in this project:
understanding what you need to check for, and
understanding how to use a symbol table. We provide the code for
the symbol table; you just have to understand how to use
it in your analysis.
For this project, you need to write the
typecheck class in typecheck.cpp. This class is a visitor class
that operates similarly to ast2dot.cpp: it traverses your
abstract syntax tree (AST) and provides some useful
functionality. The difference from ast2dot is that
typecheck will not print out nodes. Instead, the class will
perform a number of semantic and type checks, as discussed below.
In typecheck.cpp, you will need to implement
a visit*() function for ALL AST classes as well as the
primitive and symbol table classes. Look closely at
the ast2dot.cpp file for reference. Also, the skeleton
class that we provide will help you get started more quickly.
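To make the accept/visit mechanism concrete, here is a minimal, self-contained sketch of the visitor pattern. All names in it (Node, IntLit, Plus, Visitor, TraceVisitor, trace_one_plus_two) are hypothetical and exist only for illustration; the real AST classes and visit*() signatures are the ones generated from ast.cdef and may differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node and visitor names; the real ones come from ast.cdef.
struct IntLit;
struct Plus;

struct Visitor {
    virtual ~Visitor() = default;
    virtual void visitIntLit(IntLit* p) = 0;
    virtual void visitPlus(Plus* p) = 0;
};

struct Node {
    virtual ~Node() = default;
    virtual void accept(Visitor* v) = 0;
};

struct IntLit : Node {
    int value;
    explicit IntLit(int v) : value(v) {}
    // The node calls back the visit function that matches its own class.
    void accept(Visitor* v) override { v->visitIntLit(this); }
};

struct Plus : Node {
    std::unique_ptr<Node> lhs, rhs;   // owns both children
    Plus(Node* l, Node* r) : lhs(l), rhs(r) {}
    void accept(Visitor* v) override { v->visitPlus(this); }
};

// A visitor that records the order in which nodes are handled.
struct TraceVisitor : Visitor {
    std::vector<std::string> trace;

    void visitIntLit(IntLit* p) override {
        trace.push_back("IntLit(" + std::to_string(p->value) + ")");
    }

    void visitPlus(Plus* p) override {
        // Visit the children first, then handle this node (post-order).
        p->lhs->accept(this);
        p->rhs->accept(this);
        trace.push_back("Plus");
    }
};

// Builds the tree for 1 + 2 and returns the order in which nodes were handled.
std::vector<std::string> trace_one_plus_two() {
    Plus root(new IntLit(1), new IntLit(2));
    TraceVisitor v;
    root.accept(&v);
    return v.trace;
}
```

A type checker would follow the same shape, but instead of recording names it would compute a type for each child and then check the parent against those types.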
Tour of the Code
You will extend the compiler that you wrote for the previous
Project 3. More precisely, you will develop a new class that
will perform type checking. To get you started with the
typecheck.cpp visitor, you should download the archive
with the project files here. This
archive contains the skeleton for the typecheck class
(in the file typecheck.cpp). Note that the archive also
contains the files from the previous project. Some of these
classes have slightly changed, so please use this latest version
(and copy over your grammar and lexer code). For example, we
realized that we needed an explicit class for the NULL pointer
(you can see that we added it to ast.cdef). So, please make sure
that your code takes these small changes into account.
For this project, you need to understand the symbol table in
more detail (which is implemented in symtab.hpp
and symtab.cpp). The symbol table is actually composed
of four
classes: SymName, Symbol, SymScope,
and SymTab. Below, we highlight important aspects of
each class. You will need to go through the entire files to
really understand what is going on here.
-
SymName: SymName is a data structure that
stores two things: the symbol name (actual, literal spelling
of the ID) and a pointer to a symbol object for that name. You
already know this class from Project 3, where you used it to
store the symbol names in the AST.
-
Symbol: Symbol is a data structure that
simply defines the type of each object. If you look through
the AST classes, at no point is a symbol created. Why? The
answer is that you did not have to define (and should not have
defined) the types when building the AST. That is, you should
create symbols when you typecheck, not when you parse. This
implies that you will have to create symbols as
your typecheck visitor traverses the AST. Note that
the actual type is stored in the member m_basetype of
a Symbol object.
-
SymScope: SymScope is a data structure to help you in
checking the scope of each object. You do not actually have to
create a SymScope object in your visitor class. SymScope is
encapsulated by the SymTab class. You do have to decide
when to open or close scopes, though!
-
SymTab: SymTab is the actual symbol
table. Check the interface that this class exposes to see how
it can be used, in particular, the insert*
and lookup functions.
In addition to the symbol table, there is the Attribute
class. It is a struct that stores management information for AST
nodes. In particular, it stores the line where the corresponding
grammar symbol appears in the source file, the scope of the
current symbol, and the type of the subtree. You have to
manipulate the scope (m_scope) and the type
(m_basetype) when checking the program (i.e., when
walking the AST with your typecheck visitor).
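The following sketch illustrates the general idea behind a scoped symbol table: a stack of scopes, insertion into the innermost scope, and lookup from the innermost scope outward. It is not the provided SymTab. The class ScopedTable, the Type enum, and every method signature below are assumptions made for illustration; check symtab.hpp for the actual interface.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the provided SymTab; names and signatures here
// are assumptions for illustration, not the interface in symtab.hpp.
enum class Type { Integer, Boolean, Char, String };

class ScopedTable {
public:
    ScopedTable() { open_scope(); }   // start with the global scope

    void open_scope() { scopes_.emplace_back(); }

    void close_scope() {
        assert(scopes_.size() > 1 && "cannot close the global scope");
        scopes_.pop_back();
    }

    // Returns false if the name already exists in the innermost scope.
    bool insert(const std::string& name, Type t) {
        return scopes_.back().emplace(name, t).second;
    }

    // Searches from the innermost scope outward; returns nullptr if absent.
    const Type* lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return &found->second;
        }
        return nullptr;
    }

private:
    std::vector<std::unordered_map<std::string, Type>> scopes_;
};
```

With this structure, declaring x twice in one scope makes the second insert fail, while declaring x again after open_scope() succeeds and shadows the outer x until close_scope() is called.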
Steps to Solve the Challenge
-
The idea is that your typecheck visitor
calls accept on an AST node. The AST node (object)
calls back one of the visit*(this) functions that the
typechecker implements. The typechecker function then does its
work. At some point, it will have to call accept on the
node's children. You start with the root node of the AST and
then traverse it, performing the necessary checks that are
listed below.
-
You need to perform the following checks:
-
One main function:
Exactly one procedure
Main() must exist, and it must be at file
scope (global). The name is case sensitive. If there are
multiple Main functions, exit with error code 2.
-
Main() has no arguments:
Main() cannot have arguments. If it does, exit with error code 3.
-
Duplicate Procedures:
A procedure ID can be used only once in the same scope. If this property is violated, exit with error code 4.
-
Duplicate Variables:
A variable ID can be used only once in the same scope. If this property is violated, exit with error code 5.
-
Undefined Procedures:
All procedures must be defined in the current or higher scope before they are used (before they can be called). If this property is violated, exit with error code 6.
-
Undefined Variables:
All variables must be defined in the current or higher scope before they are used. If this property is violated, exit with error code 7.
-
Number of argument mismatch:
When a procedure is called, the number of arguments passed in must match the number when the procedure was declared. If this property is violated, exit with error code 8.
-
Argument type error:
When a procedure is called, the types of the arguments passed in must match the types of the arguments in the procedure declaration. The arguments cannot be strings. If this property is violated, exit with error code 9.
-
Return type error:
Return statements must return a value of the same type as declared by the procedure. The return type cannot be of type string. If this property is violated, exit with error code 10.
-
Procedure call assignment type error:
When the result of a
procedure call is assigned to a variable, the return type of the procedure must
match the type of that variable. If this property is violated, exit with error code 11.
-
If statement premise type error:
The premise
of an if statement must be of type Boolean. If this property is violated, exit with error code 12.
-
While loop requirement type error:
The requirement of a while statement must be of type Boolean. If this property is violated, exit with error code 13.
-
String index type error:
Strings (character arrays) can only be indexed by an
expression that evaluates to an integer. If this property is violated, exit with error code 14.
-
No array variable error:
Only string variables can be indexed. If this property is violated, exit with error code 15.
-
Incompatible assignment error:
The types of
the left-hand side and the right-hand side of an
assignment must match. Note that one can only assign
characters to individual string (character array)
elements. The NULL pointer can be used as either
a char pointer or an integer pointer. If this property is
violated, exit with error code 16.
-
Expression type error:
The types of
expressions must match. The rules for expressions are the
following: For arithmetic operations (+,-,*,/), both
operands must be integer, and the resulting type is
integer (see exceptions for pointers below). For logic
operations (&&,||), both operands must be Boolean,
and the resulting type is Boolean. For the following
comparison operations (<,<=,>,>=), the
operands must be integer, and the result is Boolean. For
(in)equality operators (==, !=), the operands can be both
integer, both Boolean, both characters, both char
pointers, or both integer pointers (the NULL pointer can
be used whenever a char or an int pointer is valid). The
absolute value operator (| |) can be applied only to integer
expressions or string variables, and the result is of type
integer. The not operation (!) can only be applied to
Boolean expressions, and the result is Boolean. If this
property is violated, exit with error code 17.
-
Pointer arithmetic:
It is possible to
add/subtract an integer to/from a pointer. No other
arithmetic operations are possible on pointers. If this
property is violated, exit with error code 18.
-
Usage of AddressOf:
The AddressOf operator
(&) can only be applied to integers, chars, and
indexed strings (string[i]). If this property is violated,
exit with error code 19.
-
Usage of Deref:
The deref operator (^) can
only be applied to integer pointers and char pointers. If
this property is violated, exit with error code
20.
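The binary-operator rules above (error codes 17 and 18) can be sketched as one function that maps an operator and two operand types to either a result type or an error code. The Type enum, the Result struct, and check_binary are hypothetical names, not part of the provided skeleton. The sketch also assumes that pointer arithmetic is legal only in the form "pointer + integer" or "pointer - integer" with a typed pointer on the left; confirm that reading against the language specification before relying on it.

```cpp
#include <cassert>
#include <string>

// Hypothetical type tags and error codes; the skeleton's own enums may differ.
enum class Type { Integer, Boolean, Char, String, CharPtr, IntPtr, NullPtr };

constexpr int kOk = 0;
constexpr int kExprTypeError = 17;
constexpr int kPointerArithError = 18;

struct Result {
    int error;   // kOk, or the exit code of the violated check
    Type type;   // result type; meaningful only when error == kOk
};

bool is_pointer(Type t) {
    return t == Type::CharPtr || t == Type::IntPtr || t == Type::NullPtr;
}

// True if both operands may be compared with == or !=.
bool equality_compatible(Type a, Type b) {
    if (is_pointer(a) && is_pointer(b))
        return a == b || a == Type::NullPtr || b == Type::NullPtr;
    return a == b &&
           (a == Type::Integer || a == Type::Boolean || a == Type::Char);
}

Result check_binary(const std::string& op, Type lhs, Type rhs) {
    const bool arith = op == "+" || op == "-" || op == "*" || op == "/";
    if (arith) {
        if (lhs == Type::Integer && rhs == Type::Integer)
            return {kOk, Type::Integer};
        if (is_pointer(lhs) || is_pointer(rhs)) {
            // Assumption: only "pointer + int" and "pointer - int" are legal.
            const bool add_sub = op == "+" || op == "-";
            const bool typed_ptr = lhs == Type::CharPtr || lhs == Type::IntPtr;
            if (add_sub && typed_ptr && rhs == Type::Integer)
                return {kOk, lhs};
            return {kPointerArithError, Type::Integer};
        }
        return {kExprTypeError, Type::Integer};
    }
    if (op == "&&" || op == "||") {
        if (lhs == Type::Boolean && rhs == Type::Boolean)
            return {kOk, Type::Boolean};
        return {kExprTypeError, Type::Boolean};
    }
    if (op == "<" || op == "<=" || op == ">" || op == ">=") {
        if (lhs == Type::Integer && rhs == Type::Integer)
            return {kOk, Type::Boolean};
        return {kExprTypeError, Type::Boolean};
    }
    if (op == "==" || op == "!=") {
        if (equality_compatible(lhs, rhs)) return {kOk, Type::Boolean};
        return {kExprTypeError, Type::Boolean};
    }
    return {kExprTypeError, Type::Integer};   // unknown operator
}
```

In an actual visitor, the operator is implied by the AST node class rather than passed as a string, so each visit*() function would apply only the branch that matches its node.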
-
In your typechecker class, you will use the symbol table
(SymTab* st) to store symbols (variable names,
function names, ...) together with their types. That is,
whenever a variable is declared, you can store the name and
its type in the symbol table. The same can be done for
function names, function arguments, and function return
values. Whenever a symbol is about to be entered into the
symbol table, you probably want to check whether it is already
in there (to detect duplicate variables and functions). When a
procedure is invoked, you can check whether the invocation
conforms to the declaration (correct number of arguments and
return value). Similarly, when variables are used in
expressions, their previously declared types can be retrieved
from the symbol table to check whether they are used in the
correct context (e.g., only integer values can be used as
operands for arithmetic expressions).
-
As an example, consider the code fragment below, which shows
one possible way to implement the check for duplicate
variable declarations:
0: // add symbol table information for all the declarations following
1: void add_decl_symbol(DeclImpl *p)
2: {
3:     list<SymName_ptr>::iterator iter;
4:     char *name; Symbol *s;
5:
6:     for (iter = p->m_symname_list->begin(); iter != p->m_symname_list->end(); ++iter) {
7:         name = strdup((*iter)->spelling());
8:         s = new Symbol();
9:         s->m_basetype = p->m_type->m_attribute.m_basetype;
10:
11:         if (! m_st->insert(name, s))
12:             this->t_error(dup_var_name, p->m_attribute);
13:     }
14: }
15:
16: void visitDeclImpl(DeclImpl * p)
17: {
18:     ...
19:     add_decl_symbol(p);
20: }
The function visitDeclImpl(DeclImpl * p) will be
invoked when your visitor calls accept on a variable
declaration AST node. At some point, this function
calls add_decl_symbol() (Line 19). As you can
see, add_decl_symbol() iterates over the list of
variables that are declared (Line 6). For each variable, it
first extracts its name (Line 7) and creates a Symbol
object (Line 8). Its type is set to the type that this
variable declaration block declares (Line 9). Then, the new
variable, together with its type, is inserted into the symbol
table (Line 11). Note that this operation also checks whether
the variable name is already in the symbol table. If the
symbol is indeed present, the insert call will return
false, and an appropriate error should be raised (Line 12).
Make sure that you understand how variables of
the same name can be legally declared
in different scopes, though.
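The procedure-call checks described above (undefined procedure, argument count, and argument types, error codes 6, 8, and 9) can be sketched in the same spirit. ProcSig, ProcTable, and check_call are hypothetical names; in the real type checker, the signature information would be stored in and retrieved from the provided symbol table rather than a std::map. The sketch also ignores pointer types and the NULL pointer for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical types for illustration; not the skeleton's own definitions.
enum class Type { Integer, Boolean, Char, String };

struct ProcSig {
    Type return_type;
    std::vector<Type> params;   // declared parameter types, in order
};

using ProcTable = std::map<std::string, ProcSig>;

// Returns 0 if the call is well-typed, otherwise the exit code of the
// first violated check: 6 (undefined), 8 (count), or 9 (argument type).
int check_call(const ProcTable& procs,
               const std::string& name,
               const std::vector<Type>& args) {
    auto it = procs.find(name);
    if (it == procs.end()) return 6;

    const std::vector<Type>& params = it->second.params;
    if (args.size() != params.size()) return 8;

    for (std::size_t i = 0; i < args.size(); ++i) {
        // Strings are never valid arguments, even if the types agree.
        if (args[i] == Type::String || args[i] != params[i]) return 9;
    }
    return 0;
}
```

The order of the checks matters when a call violates several properties at once; the order used here (existence, then count, then types) is one plausible choice, not one stated by the assignment.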
What Your Compiler Has to Do!
-
Your compiler must successfully parse any valid input
file.
-
Your compiler must generate the correct AST.
-
Your compiler must check the properties listed above. When a
certain program property is violated, an appropriate error
must be thrown (please use the appropriate error code for each
type error to help us with automated grading). Correct
programs must be accepted.
Deliverables
As with the previous project, we are using Gradescope (and its
auto-grader feature) to grade this assignment and your submissions.
- Once you are done with your scanner/parser, go to the third assignment and submit your code.
- For this project, please submit your "lexer.l", "parser.ypp", and "typecheck.cpp" files. We supply the rest and build your project.
- We do not show you the test cases and the expected output, but you should get some feedback about the types of tests that your submission passes and where it fails.
- You can make a new submission once every hour. Make sure you thoroughly test your program locally, and don't (ab)use the auto-grader as a test harness.