RI: Small: Towards Optimal and Adaptive Reinforcement Learning with Offline Data and Limited Adaptivity

Principal Investigator
Yu-Xiang Wang, University of California at Santa Barbara
Project Summary

Funded by NSF RI 2007117.

This material is based upon work supported by the National Science Foundation under Grant No. 2007117. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Reinforcement learning (RL) is one of the fastest-growing research areas in machine learning. RL-based techniques have led to several recent breakthroughs in artificial intelligence, such as beating human champions at the game of Go. The application of RL to real-life problems, however, remains limited, even in areas where large amounts of data have already been collected. The crux of the problem is that most existing RL methods require an environment for the agent to interact with, but in real-life applications such access is rarely possible: deploying an algorithm that learns by trial and error may raise serious legal, ethical, and safety issues. This project aims to address this conundrum by developing algorithms that learn from offline data. The outcome of the research could significantly reduce the overhead of applying RL techniques to real-life sequential decision-making problems such as those in power transmission, personalized medicine, scientific discovery, computer networking, and public policy.

Talks

  1. Invited talk: "Uniform Offline Policy Evaluation and Offline Learning in Tabular RL"
    Berkeley Simons Institute Workshop on RL from Batch Data and Simulation. [Slides]
  2. Invited talk: "Near Optimal Provable Uniform Convergence in OPE for Reinforcement Learning"
    RL Theory Seminar [Slides, Video]

Research Results

  1. Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism
    Ming Yin, Yaqi Duan, Mengdi Wang, Yu-Xiang Wang.
    ICLR 2022 (to appear). [openreview]
  2. Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
    Ming Yin, Yu-Xiang Wang.
    NeurIPS 2021. [arxiv]
  3. Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
    Ming Yin, Yu-Xiang Wang.
    NeurIPS 2021. [arxiv]
  4. Near-Optimal Offline Reinforcement Learning via Double Variance Reduction
    Ming Yin, Yu Bai, Yu-Xiang Wang.
    NeurIPS 2021. [arxiv]
  5. Near-Optimal Provable Uniform Convergence in Offline Policy Evaluation for Reinforcement Learning
    Ming Yin, Yu Bai, Yu-Xiang Wang.
    AISTATS 2021. (Plenary oral presentation) [arxiv]
  6. Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning
    Ming Yin, Yu-Xiang Wang.
    AISTATS 2020. [arxiv]
  7. Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
    Tengyang Xie, Yifei Ma, Yu-Xiang Wang.
    NeurIPS 2019. [arxiv]
  8. Provably Efficient Q-Learning with Low Switching Cost
    Yu Bai, Tengyang Xie, Nan Jiang, Yu-Xiang Wang.
    NeurIPS 2019. [arxiv]

Education

  1. CS292F Statistical Foundation of Reinforcement Learning
    Instructor: Yu-Xiang Wang, Spring 2021 [ Course website ]
  2. Mini-Symposium on Statistical RL
    Presentations by nine student teams. [ Website ]
  3. Undergraduate Research Project: "Empirical Benchmarking of Offline RL methods"
    Ari Polakof, Noah Pang, Qiru Hu, Sara Mandic. Advised by Ming Yin and Yu-Xiang Wang.
    [ Project webpage ]