Machine Learning and Data Science PhD Student Forum (Session 70) — Near Minimax Optimal Distributional Reinforcement Learning

Posted: 2024-04-25

Speaker(s): 彭洋 (Peking University)

Time: 2024-04-25, 16:00-17:00

Venue: Tencent Meeting 446-9399-1513

Abstract:
Distributional reinforcement learning (DRL) has achieved empirical success in various domains. One of the core tasks in DRL is distributional policy evaluation, which involves estimating the return distribution of a given policy π, rather than only its expected return as in classic RL. Accordingly, the empirical distributional dynamic programming (EDDP) and distributional temporal difference (DTD) algorithms have been proposed; they extend the empirical dynamic programming and temporal difference algorithms of the classic RL literature, respectively.
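The abstract does not specify how return distributions are represented. As an illustrative sketch only (not the speaker's algorithm), the DTD idea can be shown with a common categorical representation: each distribution is a vector of probabilities over a fixed support of atoms, the sample target r + γZ(s') is projected back onto that support, and the estimate is mixed toward the target with step size α. The function names `project_categorical` and `dtd_update` are our own.

```python
import numpy as np

def project_categorical(probs, r, gamma, z):
    """Project the distribution of r + gamma * Z onto the fixed support z,
    where Z has probability mass `probs` on the atoms z (C51-style projection)."""
    v_min, v_max = z[0], z[-1]
    dz = z[1] - z[0]
    # shifted/scaled atoms, clipped to the support range
    tz = np.clip(r + gamma * z, v_min, v_max)
    b = (tz - v_min) / dz          # fractional index of each shifted atom
    lo = np.floor(b).astype(int)
    hi = np.ceil(b).astype(int)
    out = np.zeros_like(probs)
    # split each atom's mass between its two neighboring support points
    np.add.at(out, lo, probs * (hi - b))
    np.add.at(out, hi, probs * (b - lo))
    # if b lands exactly on an atom (lo == hi), both terms above are zero:
    # deposit the full mass at that atom instead
    np.add.at(out, lo, probs * (lo == hi))
    return out

def dtd_update(probs_s, probs_next, r, gamma, z, alpha):
    """One distributional TD step for policy evaluation: move the estimate
    at state s toward the projected sample target built from (r, s')."""
    target = project_categorical(probs_next, r, gamma, z)
    return (1.0 - alpha) * probs_s + alpha * target
```

For example, with support z = [0, 1, 2], a point mass at 0 for Z(s'), reward r = 0.5, and γ = 1, the projected target splits its mass evenly between the atoms 0 and 1. EDDP follows the same projection idea but applies the update synchronously over all states using an empirically estimated transition model.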

In this talk, we will provide a non-asymptotic analysis of EDDP and DTD. The resulting sample complexity bounds are minimax optimal (up to logarithmic factors) when error is measured in the 1-Wasserstein distance. These theoretical results also indicate that distributional policy evaluation requires no more samples than policy evaluation in classic RL.

About the forum: This online forum is organized by Professor Zhihua Zhang's machine learning lab and is held biweekly (except during public holidays). Each session invites a PhD student to give a systematic and in-depth introduction to a frontier topic. Topics include, but are not limited to, machine learning, high-dimensional statistics, operations research and optimization, and theoretical computer science.