CS292F (Spring 2021) Statistical Foundation of Reinforcement Learning


Syllabus: [link]

Instructor: Prof. Yu-Xiang Wang


Lectures: Monday/Wednesday 1:00-2:40 pm
Location: Zoom (the link will be sent to you by email).

Piazza: https://piazza.com/ucsb/spring2021/cs292/home
Piazza is our main channel of communication. Questions should be posted here.

Gradescope: https://www.gradescope.com/courses/258384
This is where you submit your homework and project reports.

Office hours: Instructor, by appointment.

Course evaluation: 40% homework, 40% project, 10% attendance/participation, 10% scribing.

Scribing: Please volunteer here and use this LaTeX template.

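Textbooks:
AJKS: Agarwal, Jiang, Kakade, and Sun, Reinforcement Learning: Theory and Algorithms.
SB: Sutton and Barto, Reinforcement Learning: An Introduction.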

Acknowledgments: The instructor sincerely thanks Wen Sun, Nan Jiang, and Sham Kakade for sharing homework and other materials from CS 6789 at Cornell/University of Washington and CS 598 at UIUC.

Course Schedule / Scribed Notes

| # | Date | Lecture | Readings | Assignments |
|---|------|---------|----------|-------------|
| 1 | 29-Mar | Introduction and MDP basics [annotated, scribe] | AJKS Ch 1.1-1.2 | HW0 out |
| 2 | 31-Mar | Markov Decision Processes I [annotated, scribe] | AJKS Ch 1.3-1.5 | |
| 3 | 5-Apr | Markov Decision Processes II [annotated, scribe] | AJKS Ch 2 | HW1 out |
| 4 | 7-Apr | MDP III and RL Algorithms I [annotated, scribe] | SB Ch 5-6 | |
| 5 | 12-Apr | RL Algorithms II [annotated, scribe] | SB Ch 9-10 | HW0 due |
| 6 | 14-Apr | RL Algorithms III and Exploration I: MAB [annotated] | SB Ch 13, AJKS Ch 9, AJKS Ch 5.1 | |
| 7 | 19-Apr | Exploration I: MAB and Linear Bandits [annotated, scribe] | AJKS Ch 5.1 | Project proposal due |
| 8 | 21-Apr | Exploration II: Linear Bandits [annotated, scribe] | AJKS Ch 5.2-5.3 | |
| 9 | 26-Apr | Exploration III: Tabular MDPs [annotated, scribe] | AJKS Ch 6 | HW2 out / HW1 due |
| 10 | 28-Apr | Exploration IV: Linear MDP [annotated, scribe] | AJKS Ch 7 | |
| 11 | 3-May | Wrap-up of exploration, Intro to Offline RL [annotated] | AJKS Ch 7.3-7.4, Lihong's perspective article | Midterm report due |
| 12 | 5-May | Offline RL: OPE in Bandits and RL [annotated, scribe] | (W., Agarwal, Dudik, 2016), (Jiang et al., 2016) | |
| 13 | 10-May | Offline RL: MIS and Fitted Q Iterations [annotated, scribe] | (Yin and W., 2019), (Duan and Wang, 2019) | HW2 due |
| 14 | 12-May | Offline RL: Uniform OPE [annotated] | (Yin et al., 2020) | |
| 15 | 17-May | Offline RL: Uniform OPE and optimal offline learning [annotated] | (Yin et al., 2020) | HW3 out |
| 16 | 19-May | Offline RL: Function approximation [annotated] | AJKS Ch 15 | |
| 17 | 24-May | Office Hours / Project Consultation | | |
| 18 | 26-May | Office Hours / Project Consultation | | |
| 19 | 31-May | No lecture (Memorial Day) | | |
| 20 | 2-Jun | Mini-Symposium on Statistical RL | | HW3 due / Final project report due |