CMPSC 160 Course Description
Department and Course Number: CMPSC 160
Course Title: Translation of Programming Languages
Total Credits: 4
Course Coordinator: Tevfik Bultan
Current Catalog Description
Study of the structure of compilers. Topics include: lexical
analysis; syntax analysis including LL and LR parsers; type checking; run-time
environments; intermediate code generation; and compiler-construction tools.
Prerequisites
CMPSC 64 or ECE 154; CMPSC 130A; and CMPSC 138
Course Goals
(1) To learn the structure of compilers.
(2) To learn basic techniques used in compiler construction
such as lexical analysis,
top-down and bottom-up parsing,
context-sensitive analysis, and intermediate code generation.
(3) To learn basic data structures used in compiler construction
such as abstract syntax trees, symbol tables, three-address code,
and stack machines.
(4) To learn software tools used in compiler construction such
as lexical analyzer generators (lex, flex), and
parser generators (yacc, bison).
(5) To construct a compiler for a small language using
the above techniques and tools.
Prerequisites by Topic
Automata theory and formal languages
Programming in C++
Data structures, algorithms, and complexity
Topics Covered in the Course
- Introduction: [1 lecture] Overview of compilers, phases of a compiler.
- Lexical analysis, scanning: [3 lectures, 2 discussions]
- Role of a scanner
- Tokens, lexemes,
specifications of tokens, regular expressions,
regular definitions,
regular expression extensions.
- Recognizing tokens,
DFAs, NFAs, DFA simulation,
NFA simulation, recognizing
the longest matching prefix,
regular expression-to-NFA conversion,
NFA-to-DFA conversion,
DFA minimization,
time-space tradeoffs in using DFAs and NFAs for scanning.
- Lexical analyzer generators,
lex, flex.
- Syntax analysis, parsing [6 lectures, 3 discussions]
- Role of a parser.
- Context free grammars, derivations, sentential forms,
left-most, right-most derivations,
parse trees. Ambiguity,
ambiguous grammars, precedence, associativity, dangling else problem, eliminating
ambiguity.
Regular languages vs. context free languages,
non context-free languages.
- Top-down vs. bottom-up parsing. Top-down parsing, recursive descent parsing,
predictive parsing,
left-recursion elimination, left factoring.
- Stack-based predictive parsing, parse tables,
table-driven predictive parsing algorithm (LL parsing algorithm),
FIRST sets, FOLLOW sets.
LL(1), LL(k) grammars, constructing LL(1), LL(k) parse tables,
conflicts in LL(1) parse tables,
parsing ambiguous grammars with LL parsers,
building a parse tree while parsing.
- Bottom-up parsing. Handles, shift-reduce parsers, stack-based shift-reduce
parsing, viable prefixes,
table-driven shift-reduce parsing algorithm (LR parsing algorithm), LR(0) items,
closure, goto operations for LR(0) items.
Sets-of-items construction for LR(0) items,
the NFA and the DFA that recognize viable prefixes,
valid LR(0) items for a viable prefix,
constructing LR(0) parse tables, constructing SLR(1) parse tables,
conflicts in LR parse tables,
LR(k) items, LR(1) items, valid LR(1) items for a viable prefix,
closure, goto operations for LR(1) items.
Sets-of-items construction for LR(1) items,
constructing LR(1) parse tables,
LALR parse table construction,
parsing conflicts in LR parsers,
parsing ambiguous grammars with LR parsers.
- Error recovery strategies in parsing,
error recovery in LL and LR parsing.
- Parser generators,
yacc, bison.
- Context-Sensitive Analysis [3 lectures, 2 discussions]
- Syntax-directed definitions, attribute grammars,
synthesized and inherited attributes, dependency graphs, evaluation order,
topological sort, constructing syntax trees, constructing syntax trees for expressions using
syntax-directed definitions.
- S-attributed definitions,
bottom-up evaluation of S-attributed definitions,
L-attributed definitions, depth-first evaluation order, translation schemes,
top-down translation, eliminating left-recursion from a translation scheme, designing
predictive translators, bottom-up evaluation of
inherited attributes.
- Type checking, type systems,
static vs. dynamic type checking,
type expressions, basic types, type constructors, type graphs,
equivalence of type expressions, structural vs. name equivalence,
type conversions.
- Runtime environments [2 lectures, 1 discussion]
- Symbol tables. Procedure abstraction,
activation trees, control stack, scope of a declaration,
runtime storage organization, activation records, static data, control stack,
heap,
storage-allocation strategies, procedure calls, call-sequence, return-sequence,
access to non-local names, lexical (static) scope with (or without) nested
procedures, access links, displays, procedure parameters, dynamic scope,
parameter passing, call-by-value, call-by-reference, copy-restore, call-by-name.
- Intermediate code generation [4 lectures, 2 discussions]
- Intermediate representations,
intermediate code generation for assignment statements,
intermediate code generation for boolean expressions,
numerical and flow-of-control representations,
short-circuiting,
intermediate code generation for case statements, backpatching.
- Allocating storage for variables and procedures, generating code for
addressing array elements.
- Generating code for x86.
- Review [1 lecture]
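To make one of the parsing topics above concrete, here is a minimal C++ sketch of computing FIRST sets by fixed-point iteration. The grammar encoding (`Grammar`, `Production`), the `eps` marker, and the function name `firstSets` are assumptions made for illustration and are not taken from the course materials.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Symbol = std::string;
using Production = std::vector<Symbol>;  // an empty vector stands for an epsilon production
using Grammar = std::map<Symbol, std::vector<Production>>;  // keys are the nonterminals

const Symbol EPS = "eps";  // hypothetical marker for the empty string

// Computes FIRST(A) for every nonterminal A by iterating until nothing changes.
std::map<Symbol, std::set<Symbol>> firstSets(const Grammar& g) {
    assert(g.count(EPS) == 0);  // the marker must not clash with a nonterminal name
    std::map<Symbol, std::set<Symbol>> first;
    for (const auto& rule : g) first[rule.first];  // start from empty sets

    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& rule : g) {
            std::set<Symbol>& target = first[rule.first];
            for (const Production& prod : rule.second) {
                bool allNullable = true;  // every symbol seen so far can derive epsilon
                for (const Symbol& sym : prod) {
                    if (g.count(sym) == 0) {  // terminal: it starts the production
                        changed |= target.insert(sym).second;
                        allNullable = false;
                        break;
                    }
                    // Copy the set, since sym may be the nonterminal being updated.
                    const std::set<Symbol> symFirst = first[sym];
                    for (const Symbol& s : symFirst)
                        if (s != EPS) changed |= target.insert(s).second;
                    if (symFirst.count(EPS) == 0) {
                        allNullable = false;
                        break;
                    }
                }
                if (allNullable) changed |= target.insert(EPS).second;
            }
        }
    }
    return first;
}
```

For the usual LL(1) expression grammar (E -> T E', E' -> + T E' | eps, T -> F T', T' -> * F T' | eps, F -> ( E ) | id), this sketch gives FIRST(E) = { (, id } and FIRST(E') = { +, eps }.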
Laboratory projects
A five-part programming project in C++. The goal is to incrementally
build a compiler that translates programs written in a simple
programming language to x86 assembly.
The input language does not have any object-oriented features
and only allows integer and boolean variable types.
The project involves using lex/flex lexical analyzer generators
and yacc/bison parser generators.
Parts of the project are:
- [1 week] Warmup: Building a recursive-descent parser for
a simple expression language.
- [2 weeks] Scanning and Parsing: Building a scanner and parser
for a simple programming language.
Students will write a lex/flex specification
to automatically generate a scanner
and a yacc/bison specification to automatically generate
a bottom-up parser.
- [2 weeks] Intermediate Representation Generation:
Students will write semantic actions to create an abstract syntax tree for
the given program.
- [2 weeks] Context-sensitive analysis: Students will write methods that
traverse the abstract syntax tree to determine whether the input program is legal,
checking conditions such as that identifiers are declared before they are used
and that there are no type errors.
- [2 weeks] Code generation: Students will write methods which will
traverse the abstract syntax tree and emit x86 assembly code.
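As a rough illustration of the warmup part, a recursive-descent parser for a simple expression language could look like the sketch below. The grammar, the class name `ExprParser`, and the choice to evaluate the expression while parsing are assumptions for this example; the actual project language and interfaces are not specified here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical grammar (not taken from the course project):
//   expr   -> term   { ('+' | '-') term }
//   term   -> factor { ('*' | '/') factor }
//   factor -> NUMBER | '(' expr ')'
class ExprParser {
public:
    explicit ExprParser(std::string text) : src_(std::move(text)) {}

    // Parses the whole input and returns its integer value.
    long parse() {
        long value = expr();
        if (peek() != '\0') throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;

    static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

    // Skips whitespace and returns the next character without consuming it ('\0' at end).
    char peek() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        return pos_ < src_.size() ? src_[pos_] : '\0';
    }

    // Consumes and returns the current character.
    char take() {
        assert(pos_ < src_.size());
        return src_[pos_++];
    }

    // One function per nonterminal; loops replace the left-recursive rules.
    long expr() {
        long value = term();
        while (peek() == '+' || peek() == '-') {
            char op = take();
            long rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    long term() {
        long value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = take();
            long rhs = factor();
            if (op == '/' && rhs == 0) throw std::runtime_error("division by zero");
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    long factor() {
        char c = peek();
        if (c == '(') {
            take();
            long value = expr();
            if (peek() != ')') throw std::runtime_error("expected ')'");
            take();
            return value;
        }
        if (isDigit(c)) {
            long value = 0;
            while (pos_ < src_.size() && isDigit(src_[pos_])) value = value * 10 + (take() - '0');
            return value;
        }
        throw std::runtime_error("expected a number or '('");
    }
};
```

With this sketch, `ExprParser("1 + 2 * 3").parse()` returns 7, because `term` binds tighter than `expr`, and the loops make `-` and `/` left-associative.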
Estimated CSAB Category Content
CSAB Category | CORE | ADVANCED
Data Structures | _ | 0.5
Computer Organization and Architecture | _ | _
Algorithms | _ | 0.5
Concepts of Programming Languages | _ | 2
Software Design | _ | 1
Oral and Written Communications
Social and Ethical Issues
Students learn the impact of design decisions in programming
languages on future generations of engineers.
Theoretical Content
Students review the following theoretical concepts in this course:
regular expressions, DFAs, NFAs, context-free grammars,
and regular, context-free, and non-context-free languages.
Students learn theoretical concepts such as
LL and LR parsing and attribute grammars.
Problem Analysis
Students learn how theoretical concepts such as finite automata,
regular expressions and context-free grammars can be used in solving
practical problems.
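One way finite automata apply to a practical problem is a scanner that simulates a DFA and returns the longest matching prefix as the next token. The sketch below uses a small hand-built DFA; the token set (identifiers, integers, `=`, `==`), the state numbering, and the names `step`, `accepting`, and `scan` are assumptions for illustration only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum class Tok { Ident, Int, Assign, Eq };

struct Token {
    Tok kind;
    std::string lexeme;
};

// Transition function of the DFA. State 0 is the start state; -1 means "no move".
int step(int state, char c) {
    bool alpha = std::isalpha(static_cast<unsigned char>(c)) != 0;
    bool digit = std::isdigit(static_cast<unsigned char>(c)) != 0;
    switch (state) {
        case 0: return alpha ? 1 : digit ? 2 : (c == '=') ? 3 : -1;
        case 1: return (alpha || digit) ? 1 : -1;  // inside an identifier
        case 2: return digit ? 2 : -1;             // inside an integer
        case 3: return (c == '=') ? 4 : -1;        // saw "=", maybe "=="
        default: return -1;                        // state 4 ("==") has no moves
    }
}

// Reports whether a state is accepting and, if so, which token it recognizes.
bool accepting(int state, Tok& kind) {
    switch (state) {
        case 1: kind = Tok::Ident; return true;
        case 2: kind = Tok::Int; return true;
        case 3: kind = Tok::Assign; return true;
        case 4: kind = Tok::Eq; return true;
        default: return false;
    }
}

// Splits the input into tokens, always taking the longest matching prefix.
std::vector<Token> scan(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < input.size()) {
        if (std::isspace(static_cast<unsigned char>(input[pos]))) { ++pos; continue; }
        int state = 0;
        bool found = false;
        Tok lastKind = Tok::Ident;
        std::size_t lastEnd = pos;  // end of the longest accepted prefix so far
        for (std::size_t i = pos; i < input.size(); ++i) {
            state = step(state, input[i]);
            if (state < 0) break;
            Tok kind = Tok::Ident;
            if (accepting(state, kind)) { found = true; lastKind = kind; lastEnd = i + 1; }
        }
        if (!found) throw std::runtime_error("no token matches at position " + std::to_string(pos));
        assert(lastEnd > pos);  // every token consumes at least one character
        tokens.push_back({lastKind, input.substr(pos, lastEnd - pos)});
        pos = lastEnd;
    }
    return tokens;
}
```

On the input `x1 == 42=y`, this sketch produces five tokens: `x1`, `==`, `42`, `=`, and `y`. The `==` case shows why the scanner remembers the last accepting state instead of stopping at the first one.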
Solution Design
Students learn how a modular design makes it possible to build
complex software systems such as compilers.
Students learn how to build complex data structures such as
abstract syntax trees using object-oriented design concepts.
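As a small sketch of this idea, an abstract syntax tree can be modeled as a class hierarchy in which each node type knows how to generate code for itself. The class names (`Expr`, `Num`, `Var`, `BinOp`, `CodeGen`) and the textual three-address output format are assumptions for illustration; the course project may organize its tree and code generator differently.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Collects three-address instructions and hands out fresh temporary names.
class CodeGen {
public:
    std::vector<std::string> code;

    std::string newTemp() { return "t" + std::to_string(next_++); }
    void emit(const std::string& instr) { code.push_back(instr); }

private:
    int next_ = 0;
};

// Base class for expression nodes.
struct Expr {
    virtual ~Expr() = default;
    // Generates code for this node and returns the name that holds its value.
    virtual std::string gen(CodeGen& cg) const = 0;
};

struct Num : Expr {
    int value;
    explicit Num(int v) : value(v) {}
    std::string gen(CodeGen&) const override { return std::to_string(value); }
};

struct Var : Expr {
    std::string name;
    explicit Var(std::string n) : name(std::move(n)) {}
    std::string gen(CodeGen&) const override { return name; }
};

struct BinOp : Expr {
    char op;
    std::unique_ptr<Expr> lhs, rhs;

    BinOp(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {
        assert(lhs && rhs);  // both operands are required
    }

    // Post-order traversal: operands first, then one instruction for this node.
    std::string gen(CodeGen& cg) const override {
        std::string a = lhs->gen(cg);
        std::string b = rhs->gen(cg);
        std::string t = cg.newTemp();
        cg.emit(t + " = " + a + " " + op + " " + b);
        return t;
    }
};
```

For the tree of `a + b * 2`, calling `gen` on the root emits `t0 = b * 2` followed by `t1 = a + t0` and returns `t1`. Adding a new node type only requires a new subclass, which is the modularity benefit of the object-oriented design.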