1:00 - 1:22 | Ari Polakof and Chang Yuan Li | Offline Reinforcement Learning in Theory and in Practice |
1:22 - 1:44 | Fuheng Zhao | Optimize Join Queries with Deep Reinforcement Learning |
1:44 - 2:06 | Yichen Feng, Mengye Liu, Ming Min | Provably Efficient Q-Learning with Low Switching Cost |
2:06 - 2:28 | Avinash Nargund | Reinforcement Learning for Radio Network Design |
2:28 - 2:50 | Rohan Bhatia | DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections |
2:52 - 3:14 | Yi-lin Tuan, Wanrong Zhu | Reinforcement Learning for Text Generation |
3:14 - 3:36 | Kaiqi Zhang | Minimax OPE for Multi-Armed Bandits |
3:36 - 3:58 | Dheeraj Baby, Ming Yin, Xuandong Zhao | A Unifying View of Optimism in Episodic Reinforcement Learning |
3:58 - 4:20 | Yuqing Zhu, Jianyu Xu | Is Reinforcement Learning More Difficult Than Bandits? |