Forum Series for Outstanding PhD Students in Computational and Applied Mathematics: Model-Based Reinforcement Learning with Value-Targeted Regression

Date: 2019-12-27

Speaker: Zeyu Jia (Peking University)

Time: 2019-12-27 12:00-13:00

Venue: Room 1560, Sciences Building No. 1

Abstract: Reinforcement learning (RL) applies to control problems with large state and action spaces; hence it is natural to consider RL with a parametric model. We focus on finite-horizon episodic RL where the transition model admits the linear parametrization P = \sum_i θ_i P_i. This parametrization provides a universal function approximation and captures several useful models and applications. We propose an upper confidence model-based RL algorithm with value-targeted model parameter estimation. The algorithm updates the estimate of θ by recursively solving a regression problem using the latest value estimate as the target. We demonstrate the efficiency of our algorithm by proving an expected regret bound of O(d\sqrt{H^3 T}), where H, T, and d are the horizon, the total number of steps, and the dimension of θ, respectively. This regret bound is independent of the total number of states or actions, and is close to a lower bound of Ω(\sqrt{HdT}).
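
As a rough sketch of the regression step described in the abstract, the estimate of θ can be written as a regularized least-squares problem whose targets are the value estimates at the observed next states. The notation is assumed for illustration and does not come from the announcement: k indexes past episodes, h indexes steps within an episode, V_{h+1,k} is the value estimate in use at step h of episode k, and λ > 0 is a hypothetical ridge regularizer.

```latex
% Assumed notation: (s_h^k, a_h^k, s_{h+1}^k) is the transition observed
% at step h of episode k.
% Feature: expected next-state value under each base model P_i.
% Target: value estimate at the next state actually observed.
\[
x_{h,k} = \Big( \sum_{s'} P_i(s' \mid s_h^k, a_h^k)\, V_{h+1,k}(s') \Big)_{i=1}^{d} \in \mathbb{R}^d,
\qquad
y_{h,k} = V_{h+1,k}(s_{h+1}^k).
\]
% Value-targeted regression after K episodes (\lambda is an assumed regularizer).
\[
\hat{\theta}_{K+1}
  = \arg\min_{\theta \in \mathbb{R}^d}
    \sum_{k=1}^{K} \sum_{h=1}^{H} \big( \langle x_{h,k}, \theta \rangle - y_{h,k} \big)^2
    + \lambda \lVert \theta \rVert_2^2
  = \Big( \lambda I + \sum_{k=1}^{K} \sum_{h=1}^{H} x_{h,k} x_{h,k}^{\top} \Big)^{-1}
    \sum_{k=1}^{K} \sum_{h=1}^{H} x_{h,k}\, y_{h,k}.
\]
```

Because P = \sum_i θ_i P_i is linear in θ, the predicted next-state value equals ⟨x, θ⟩, so the problem reduces to ridge regression and the two sums can be updated recursively as new transitions arrive.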


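All students are welcome to attend. Registration link: https://www.wenjuan.com/s/eInQfmE/. The registration deadline is 9:00 a.m. on December 27, and lunch will be provided for students who register.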