CS 292C - String Analysis - Winter 2018
Instructor:
Tevfik Bultan
Office: Eng. I 2123
Office Hours: Tuesday/Thursday 13:30-14:30
Class Times:
Tuesday/Thursday 11:00-12:50
Location: PHELP 2510
Enrollment Code: 62075
Units: 4
Course Topics
String manipulation is a crucial part of modern software systems; for
example, it is used extensively in input validation and sanitization,
and in dynamic code and query generation. The goal of string-analysis
techniques is to determine the set of values that string expressions can
take during program execution. String analysis can be used to solve many
problems in modern software systems that relate to string manipulation,
such as: (1) identifying security vulnerabilities by checking whether a
security-sensitive function can receive an input string that contains an
exploit; (2) identifying behaviors of JavaScript code that uses the eval
function by computing the string values that can reach the eval function;
(3) identifying HTML generation errors by computing the HTML code generated
by web applications; (4) identifying the set of queries that are sent to a
back-end database by analyzing the code that generates the SQL queries; and
(5) patching input validation and sanitization functions by automatically
synthesizing repairs.
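As a small illustration of problem (1), the following Python sketch enumerates every input up to a fixed length, runs a sanitizer on each one, and collects the values that can reach a sink. The sanitizer, the attack pattern "<s", the alphabet, and the length bound are all hypothetical choices made for this example; the analyses discussed in this course reason symbolically rather than by enumeration.

```python
from itertools import product

# Hypothetical setup: a tiny input alphabet and a small length bound.
ALPHABET = "<sa"
MAX_LEN = 4


def sanitize(user_input: str) -> str:
    # Hypothetical, deliberately buggy sanitizer: a single left-to-right pass
    # that deletes every non-overlapping occurrence of "<s".
    return user_input.replace("<s", "")


def is_attack(sink_value: str) -> bool:
    # Hypothetical attack pattern: the sink must never receive "<s".
    return "<s" in sink_value


def reachable_values(max_len: int) -> dict:
    """Map each value that can reach the sink to one input that produces it."""
    values = {}
    for length in range(max_len + 1):
        for chars in product(ALPHABET, repeat=length):
            user_input = "".join(chars)
            values.setdefault(sanitize(user_input), user_input)
    return values


values = reachable_values(MAX_LEN)
witnesses = {out: inp for out, inp in values.items() if is_attack(out)}
for out, inp in witnesses.items():
    print(f"input {inp!r} reaches the sink as {out!r}")
```

Within this bound the check finds the input "<<ss": deleting the inner "<s" leaves a new "<s" behind, so the sanitizer does not protect the sink.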
Like many other program-analysis problems, the string-analysis problem
cannot be solved precisely in general (i.e., it is not possible to precisely
determine the set of string values that can reach a program point). However,
one can compute over- or under-approximations of possible string values. If
the approximations are precise enough, they can enable us to demonstrate the
existence or absence of bugs in string-manipulating code. String analysis
has been an active research area in the last decade, resulting in a
wide variety of string-analysis techniques. Some of the topics we plan to
discuss in this course include grammar-based string analysis, automata-based
symbolic string analysis, string constraint solving, string abstractions,
relational string analysis, vulnerability detection using string analysis,
differential string analysis, and automated repair using string analysis.
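The role of over-approximation can be sketched in Python with a deliberately coarse string abstraction, assumed here purely for illustration: each string is abstracted by a superset of the characters it may contain. The class name `CharSet`, its methods, and the input alphabet are hypothetical and do not come from any of the tools listed on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharSet:
    """Abstract string: a superset of the characters a string may contain."""
    chars: frozenset

    @staticmethod
    def of(literal: str) -> "CharSet":
        return CharSet(frozenset(literal))

    def concat(self, other: "CharSet") -> "CharSet":
        # The characters of s + t are those of s together with those of t.
        return CharSet(self.chars | other.chars)

    def remove_char(self, c: str) -> "CharSet":
        # Abstracts s.replace(c, "") for a single character c.
        return CharSet(self.chars - {c})

    def remove_substring(self, pattern: str) -> "CharSet":
        # Abstracts s.replace(pattern, "") for a longer pattern. The domain
        # does not track character order, so it keeps every character:
        # sound, but imprecise.
        return self

    def may_contain(self, c: str) -> bool:
        return c in self.chars


# Hypothetical abstract value standing for "any user input".
ANY_INPUT = CharSet.of("<sa")


def page(cleaned: CharSet) -> CharSet:
    # Abstract counterpart of: "p:" + cleaned
    return CharSet.of("p:").concat(cleaned)


safe = page(ANY_INPUT.remove_char("<"))
unknown = page(ANY_INPUT.remove_substring("<s"))

print(safe.may_contain("<"))
print(unknown.may_contain("<"))
```

The first result shows that no "<" can reach the output, which demonstrates the absence of that bug. The second only says that "<" may reach the output; such a warning has to be confirmed by a more precise analysis or by a concrete witness, since an over-approximation can also raise false alarms.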
Course Work
There will be several homework assignments (50% of the course grade), and
students will be required to do a research project (50% of the course grade)
related to string analysis. Papers and/or lecture notes related to the topics
discussed in class will be given as reading assignments.
Slides from Lectures
Reading Assignments
- Weeks 1 and 2
- Automata-based String Analysis
- Week 3
- Relational String Analysis
- Chapter 5 from the lecture notes.
- Fang Yu, Tevfik Bultan, and Oscar Ibarra.
"Relational String Verification Using Multi-Track Automata."
International Journal of Foundations of Computer Science (IJFCS),
special issue on selected papers from the 15th International Conference on
Implementation and Application of Automata (CIAA 2010),
volume 22, number 8, pages 1909-1924, 2011.
- String Abstractions
- Chapter 6 from the lecture notes.
- Fang Yu, Tevfik Bultan, and Ben Hardekopf.
"String Abstractions for String Verification."
Proceedings of the 18th International SPIN Workshop on
Model Checking of Software (SPIN 2011),
LNCS 6823, pages 20-37,
Snowbird, Utah, USA, July 14-15, 2011.
- Week 4
- Grammar-based String Analysis
- Symbolic Transducers for Analyzing Sanitizers
- Week 5
- Size Analysis
- David Wagner, Jeffrey S. Foster, Eric A. Brewer, Alexander Aiken:
A First Step Towards Automated Detection of Buffer Overrun Vulnerabilities.
NDSS 2000
- Vinod Ganapathy, Somesh Jha, David Chandler, David Melski, David Vitek:
Buffer overrun detection using linear programming and static analysis.
ACM Conference on Computer and Communications Security 2003: 345-354
- Nurit Dor, Michael Rodeh, Shmuel Sagiv:
CSSV: towards a realistic tool for statically detecting all buffer overflows in C.
PLDI 2003: 155-167
- Fang Yu, Tevfik Bultan, and Oscar H. Ibarra.
"Symbolic String Verification: Combining String Analysis and Size Analysis."
Proceedings of the 15th International Conference on Tools and Algorithms
for the Construction and Analysis of Systems (TACAS 2009),
pp. 322-336, York, UK, March 22-29, 2009.
- Week 6
- Dynamic Symbolic Execution
- Bounded String Constraint Solving
- Adam Kiezun, Vijay Ganesh, Shay Artzi, Philip J. Guo, Pieter Hooimeijer, Michael D. Ernst:
HAMPI: A solver for word equations over strings, regular expressions, and context-free grammars.
ACM Trans. Softw. Eng. Methodol. 21(4): 25 (2012)
- Adam Kiezun, Vijay Ganesh, Philip J. Guo, Pieter Hooimeijer, Michael D. Ernst:
HAMPI: a solver for string constraints.
ISSTA 2009: 105-116
- Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, Dawn Song:
A Symbolic Execution Framework for JavaScript.
IEEE Symposium on Security and Privacy 2010: 513-528
- Week 7
- Week 8
- Automated Repair
- Fang Yu, Muath Alkhalaf, and Tevfik Bultan.
"Patching Vulnerabilities with Sanitization Synthesis."
Proceedings of the 33rd International Conference on
Software Engineering (ICSE 2011),
pages 251-260,
Waikiki, Honolulu, Hawaii, USA, May 21-28, 2011.
- Muath Alkhalaf, Abdulbaki Aydin, and Tevfik Bultan.
"Semantic Differential Repair for Input Validation and Sanitization."
Proceedings of the 2014 International Symposium on Software Testing and
Analysis (ISSTA 2014),
pages 225-236,
San Jose, California, USA, July 21-25, 2014.
- Probabilistic Symbolic Execution, Quantitative Information Flow
- Jaco Geldenhuys, Matthew B. Dwyer, Willem Visser:
Probabilistic symbolic execution.
ISSTA 2012: 166-176
- Quoc-Sang Phan, Pasquale Malacaria, Corina S. Pasareanu, Marcelo d'Amorim:
Quantifying information leaks using reliability analysis.
SPIN 2014: 105-108
- Lucas Bang, Abdulbaki Aydin, Quoc-Sang Phan, Corina S. Pasareanu, and Tevfik Bultan.
"String Analysis for Side Channels with Segmented Oracles."
Proceedings of the 24th ACM SIGSOFT International Symposium
on the Foundations of Software Engineering (FSE 2016).
- Quoc-Sang Phan, Lucas Bang, Corina S. Pasareanu, Pasquale Malacaria, and Tevfik Bultan.
"Synthesis of Adaptive Side-Channel Attacks."
Proceedings of the 2017 IEEE Computer Security Foundations Symposium
(CSF 2017).
- Week 9
Related Tools
- JSA: Java String Analyzer
- BEK: Sanitizer Analyzer
- Z3str3: String Constraint Solver
- Norn: A solver for string constraints
- Hampi: A Solver for String Constraints
- Kaluza: String Solver
- Stranger and LibStranger: An Automata Based PHP String Analysis Tool
- SemRep: Semantic Differential Repair tool for input validation and sanitization code
- ABC: Automata-based model Counter for string constraints
- Symbolic Automata: A library that provides algorithms for composing and analyzing regular expressions, automata, and transducers.
- SPF: Symbolic Path Finder performs symbolic execution of Java bytecodes
Related Publications
- Automata based string analysis
- String Analysis for Software Verification and Security [T. Bultan, F. Yu, M. Alkhalaf, A. Aydin]
- A static analysis framework for detecting SQL injection vulnerabilities [Fu et al., COMPSAC’07]
- Saner: Composing Static and Dynamic Analysis to Validate Sanitization in Web Applications [Balzarotti et al., S&P 2008]
- Symbolic String Verification: An Automata-based Approach [Yu et al., SPIN’08]
- Rex: Symbolic Regular Expression Explorer [Veanes et al., ICST’10]
- Stranger: An Automata-based String Analysis Tool for PHP [Yu et al., TACAS’10]
- Relational String Verification Using Multi-Track Automata [Yu et al., CIAA’10, IJFCS’11]
- Path- and index-sensitive string analysis based on monadic second-order logic [Tateishi et al., ISSTA’11]
- An Evaluation of Automata Algorithms for String Analysis [Hooimeijer et al., VMCAI’11]
- Fast and Precise Sanitizer Analysis with BEK [Hooimeijer et al., Usenix’11]
- Symbolic finite state transducers: algorithms and applications [Veanes et al., POPL’12]
- Static Analysis of String Encoders and Decoders [D’Antoni et al., VMCAI’13]
- Applications of Symbolic Finite Automata [Veanes, CIAA’13]
- Automata-Based Symbolic String Analysis for Vulnerability Detection [Yu et al., FMSD’14]
- String analysis based on context free grammars
- Precise Analysis of String Expressions [Christensen et al., SAS’03]
- Java String Analyzer (JSA) [Moller et al.]
- Static approximation of dynamically generated Web pages [Minamide, WWW’05]
- PHP String Analyzer [Minamide]
- Grammar-based analysis of string expressions [Thiemann, TLDI’05]
- String analysis based on symbolic execution/symbolic analysis
- Abstracting symbolic execution with string analysis [Shannon et al., MUTATION’07]
- Path Feasibility Analysis for String-Manipulating Programs [Bjorner et al., TACAS’09]
- A Symbolic Execution Framework for JavaScript [Saxena et al., S&P 2010]
- Symbolic execution of programs with strings [Redelinghuys et al., ITC’12]
- String analysis and abstraction/widening:
- A Practical String Analyzer by the Widening Approach [Choi et al., APLAS’06]
- String Abstractions for String Verification [Yu et al., SPIN’11]
- A Suite of Abstract Domains for Static Analysis of String Values [Costantini et al., SP&E’13]
- String size analysis:
- A First Step Towards Automated Detection of Buffer Overrun Vulnerabilities [Wagner et al., NDSS 2000]
- Buffer overrun detection using linear programming and static analysis [Ganapathy et al., ACM CCS 2003]
- CSSV: towards a realistic tool for statically detecting all buffer overflows in C [Dor et al., PLDI 2003]
- Symbolic String Verification: Combining String Analysis and Size Analysis [Yu et al., TACAS’09]
- String constraint solving:
- Reasoning about Strings in Databases [Grahne et al., JCSS’99]
- Constraint Reasoning over Strings [Golden et al., CP’03]
- A decision procedure for subset constraints over regular languages [Hooimeijer et al., PLDI’09]
- Strsolve: solving string constraints lazily [Hooimeijer et al., ASE’10, ASE’12]
- An SMT-LIB Format for Sequences and Regular Expressions [Bjorner et al., SMT’12]
- Z3-Str: A Z3-Based String Solver for Web Application Analysis [Zheng et al., ESEC/FSE’13]
- Word Equations with Length Constraints: What's Decidable? [Ganesh et al., HVC’12]
- (Un)Decidability Results for Word Equations with Length and Regular Expression Constraints [Ganesh et al., ADDCT’13]
- A DPLL(T) Theory Solver for a Theory of Strings and Regular Expressions [Liang et al., CAV’14]
- String Constraints for Verification [Abdulla et al., CAV’14]
- S3: A Symbolic String Solver for Vulnerability Detection in Web Applications [Trinh et al., CCS’14]
- Evaluation of String Constraint Solvers in the Context of Symbolic Execution [Kausler et al., ASE’14]
- Norn: An SMT Solver for String Constraints [Abdulla et al., CAV'15]
- Effective Search-Space Pruning for Solvers of String Equations, Regular Expressions and Length Constraints [Zheng et al., CAV'15]
- An efficient SMT solver for string constraints [Liang et al., Formal Methods in System Design '16]
- A Solver for a Theory of Strings and Bit-vectors [Subramanian et al., 2016]
- Flatten and conquer: a framework for efficient analysis of string constraints [Abdulla et al., PLDI'17]
- Z3str2: an efficient solver for strings, regular expressions, and length constraints [Zheng et al., Formal Methods in System Design '17]
- Z3str3: A String Solver with Theory-aware Branching [Berzish et al., '17]
- Bounded string constraint solvers:
- HAMPI: a solver for string constraints [Kiezun et al., ISSTA’09]
- HAMPI: A String Solver for Testing, Analysis and Vulnerability Detection [Ganesh et al., CAV’11]
- HAMPI: A solver for word equations over strings, regular expressions, and context-free grammars [Kiezun et al., TOSEM’12]
- PASS: String Solving with Parameterized Array and Interval Automaton [Li & Ghosh, HVC’13]
- Model counting for string constraints:
- A model counter for constraints over unbounded strings [Luu et al., PLDI’14]
- Automata-based model counting for string constraints [Aydin et al., CAV'15]
- Model Counting for Recursively-Defined Strings [Trinh et al., CAV'17]
- String analysis for vulnerability detection
- AMNESIA: analysis and monitoring for NEutralizing SQL-injection attacks [Halfond et al., ASE’05]
- Preventing SQL injection attacks using AMNESIA [Halfond et al., ICSE’06]
- Sound and precise analysis of web applications for injection vulnerabilities [Wassermann et al., PLDI’07]
- Static detection of cross-site scripting vulnerabilities [Su et al., ICSE’08]
- Generating Vulnerability Signatures for String Manipulating Programs Using Automata-based Forward and Backward Symbolic Analyses [Yu et al., ASE’09]
- Verifying Client-Side Input Validation Functions Using String Analysis [Alkhalaf et al., ICSE’12]
- String analysis for test generation
- Dynamic test input generation for database applications [Emmi et al., ISSTA’07]
- Dynamic test input generation for web applications [Wassermann et al., ISSTA’08]
- JST: an automatic test generation tool for industrial Java applications with strings [Ghosh et al., ICSE’13]
- Automated Test Generation from Vulnerability Signatures [Aydin et al., ICST’14]
- String analysis for analyzing dynamically generated code:
- Improving Test Case Generation for Web Applications Using Automated Interface Discovery [Halfond et al., FSE’07]
- Automated Identification of Parameter Mismatches in Web Applications [Halfond et al., FSE’08]
- Building Call Graphs for Embedded Client-Side Code in Dynamic Web Applications [Nguyen et al. FSE'15]
- Varis: IDE Support for Embedded Client Code in PHP Web Applications [Nguyen et al. ICSE'15]
- String analysis for specification analysis:
- Lightweight String Reasoning for OCL [Buttner et al., ECMFA’12]
- Lightweight String Reasoning in Model Finding [Buttner et al., SSM’13]
- String analysis for program repair:
- Patching Vulnerabilities with Sanitization Synthesis [Yu et al., ICSE’11]
- Automated Repair of HTML Generation Errors in PHP Applications Using String Constraint Solving [Samimi et al., 2012]
- Patcher: An Online Service for Detecting, Viewing and Patching Web Application Vulnerabilities [Yu et al., HICSS’14]
- Differential string analysis:
- Automatic Blackbox Detection of Parameter Tampering Opportunities in Web Applications [Bisht et al., CCS’10]
- Waptec: Whitebox Analysis of Web Applications for Parameter Tampering Exploit Construction [Bisht et al., CCS’11]
- ViewPoints: Differential String Analysis for Discovering Client and Server-Side Input Validation Inconsistencies [Alkhalaf et al., ISSTA’12]
- Semantic Differential Repair for Input Validation and Sanitization [Alkhalaf et al., ISSTA’14]
- Side channel analysis:
- String Analysis for Side Channels with Segmented Oracles [Bang et al. FSE'16]