CS292F (Spring 2021) Statistical Foundation of Reinforcement Learning
Syllabus [link]
Instructor:
Prof. Yu-Xiang Wang
Lecture Section: Monday/Wednesday 1:00-2:40 pm
Location: on Zoom (link will be sent to you via email).
Piazza: https://piazza.com/ucsb/spring2021/cs292/home
Piazza is our main channel of communication. Questions should be posted here.
Gradescope: https://www.gradescope.com/courses/258384
This is where you submit your homeworks and project reports.
Office hours:
Instructor: by appointment.
Course evaluation:
40% homework, 40% project, 10% attendance/participation, 10% scribing.
Scribing:
Please volunteer here and use this LaTeX template.
Textbook:
- Agarwal, Jiang, Kakade and Sun, Reinforcement Learning: Theory and Algorithms, unpublished working draft (Dec 2020). [Available here]
- Sutton and Barto, Reinforcement Learning: An Introduction, MIT Press, Second Edition, 2018.
Acknowledgments
The instructor sincerely thanks Wen Sun, Nan Jiang and Sham Kakade for sharing
the homeworks and other materials from CS 6789 at Cornell/University of Washington and CS 598 at UIUC.
Course Schedule / Scribed Notes
# | Date | Lectures | Readings | Assignments |
1 | 29-Mar | Introduction and MDP basics [annotated, scribe] | AJKS Ch 1.1-1.2 | HW0 out |
2 | 31-Mar | Markov Decision Processes I [annotated, scribe] | AJKS Ch 1.3-1.5 | |
3 | 5-Apr | Markov Decision Processes II [annotated, scribe] | AJKS Ch 2 | HW1 out |
4 | 7-Apr | MDP III and RL Algorithms I [annotated, scribe] | SB Ch 5-6 | |
5 | 12-Apr | RL Algorithms II [annotated, scribe] | SB Ch 9-10 | HW0 due |
6 | 14-Apr | RL Algorithms III and Exploration I: MAB [annotated] | SB Ch 13, AJKS Ch 9, AJKS Ch 5.1 | |
7 | 19-Apr | Exploration I: MAB and Linear Bandits [annotated, scribe] | AJKS Ch 5.1 | Project proposal due |
8 | 21-Apr | Exploration II: Linear Bandits [annotated, scribe] | AJKS Ch 5.2-5.3 | |
9 | 26-Apr | Exploration III: Tabular MDPs [annotated, scribe] | AJKS Ch 6 | HW2 out / HW1 due |
10 | 28-Apr | Exploration IV: Linear MDP [annotated, scribe] | AJKS Ch 7 | |
11 | 3-May | Wrap up exploration, Intro to Offline RL [annotated] | AJKS Ch 7.3-7.4, Lihong's perspective article | Midterm report due |
12 | 5-May | Offline RL: OPE in Bandits and RL [annotated, scribe] | (W., Agarwal, Dudik, 2016) (Jiang et al., 2016) | |
13 | 10-May | Offline RL: MIS and Fitted Q Iterations [annotated, scribe] | (Yin and W., 2019) (Duan and Wang, 2019) | HW2 due |
14 | 12-May | Offline RL: Uniform OPE [annotated] | (Yin et al., 2020) | |
15 | 17-May | Offline RL: Uniform OPE and optimal offline learning [annotated] | (Yin et al., 2020) | HW3 out |
16 | 19-May | Offline RL: Function approximation [annotated] | AJKS Ch 15 | |
17 | 24-May | Office Hours / Project Consultation | | |
18 | 26-May | Office Hours / Project Consultation | | |
19 | 31-May | No lecture, Memorial Day | | |
20 | 2-Jun | Mini-Symposium on Statistical RL | | HW3 due / Final project report due |