CS292F (Spring 2021) Statistical Foundation of Reinforcement Learning

Syllabus [ link ]

Instructor: Prof. Yu-Xiang Wang

Lecture Section: Monday/Wednesday 1:00-2:40 pm

Location: on Zoom (the link will be sent to you via email).

Piazza: https://piazza.com/ucsb/spring2021/cs292/home
Piazza is our main channel of communication. Questions should be posted here.

Gradescope: https://www.gradescope.com/courses/258384
This is where you submit your homeworks and project reports.

Office hours: Instructor: by appointment.

Course evaluation: 40% homework, 40% project, 10% attendance/participation, 10% scribing.

Scribing: Please volunteer here and use this LaTeX template.

Textbook:

Agarwal, Jiang, Kakade and Sun, Reinforcement Learning: Theory and Algorithms,
unpublished working draft (Dec 2020). [Available here]
Sutton and Barto, Reinforcement Learning: An Introduction, MIT Press, Second Edition, 2018.

Acknowledgments: The instructor sincerely thanks Wen Sun, Nan Jiang and Sham Kakade for sharing
the homeworks and other materials from CS 6789 at Cornell/University of Washington and CS 598 at UIUC.

Course Schedule / Scribed Notes

	Date	Lectures	Readings	Assignments
1	29-Mar	Introduction and MDP basics [annotated, scribe]	AJKS Ch 1.1-1.2	HW0 out
2	31-Mar	Markov Decision Processes I [annotated, scribe]	AJKS Ch 1.3-1.5
3	5-Apr	Markov Decision Processes II [annotated, scribe]	AJKS Ch 2	HW1 out
4	7-Apr	MDP III and RL Algorithms I [annotated, scribe]	SB Ch 5-6
5	12-Apr	RL Algorithms II [annotated, scribe]	SB Ch 9-10	HW0 due
6	14-Apr	RL Algorithms III and Exploration I: MAB [annotated]	SB Ch 13, AJKS Ch 9, AJKS Ch 5.1
7	19-Apr	Exploration I: MAB and Linear Bandits [annotated, scribe]	AJKS Ch 5.1	Project proposal due
8	21-Apr	Exploration II: Linear Bandits [annotated, scribe]	AJKS Ch 5.2-5.3
9	26-Apr	Exploration III: Tabular MDPs [annotated, scribe]	AJKS Ch 6	HW2 out / HW1 due
10	28-Apr	Exploration IV: Linear MDP [annotated, scribe]	AJKS Ch 7
11	3-May	Wrap up exploration, Intro to Offline RL [annotated]	AJKS Ch 7.3-7.4, Lihong's perspective article	Midterm report due
12	5-May	Offline RL: OPE in Bandits and RL [annotated, scribe]	(W., Agarwal, Dudik, 2016) (Jiang et al., 2016)
13	10-May	Offline RL: MIS and Fitted Q Iterations [annotated, scribe]	(Yin and W., 2019) (Duan and Wang, 2019)	HW2 due
14	12-May	Offline RL: Uniform OPE [annotated]	(Yin et al., 2020)
15	17-May	Offline RL: Uniform OPE and optimal offline learning [annotated]	(Yin et al., 2020)	HW3 out
16	19-May	Offline RL: Function approximation [annotated]	AJKS Ch 15
17	24-May	Office Hours / Project Consultation
18	26-May	Office Hours / Project Consultation
19	31-May	No lecture, Memorial Day
20	2-Jun	Mini-Symposium on Statistical RL		HW3 due / Final project report due