Machine Learning and Data Science PhD Student Forum Series (Session 89): Statistical Guarantees in Continuous-Time Policy Evaluation

Posted: 2025-05-29

Speaker(s): 谢楚焓 (Peking University)

Time: 2025-05-29 16:00-17:00

Venue: Tencent Meeting 531-8098-3912

Abstract:
Continuous-time reinforcement learning has seen a recent revival, driven by practical applications such as the fine-tuning of diffusion models. Traditional continuous-time policy evaluation stems from stochastic control and is exact but inherently model-based, whereas modern temporal-difference (TD) methods are generally model-free but suffer from discretization error in the continuous regime. This talk introduces a line of research that combines the advantages of both sides through high-order approximations to the Bellman equation or the diffusion generator. Both asymptotic and non-asymptotic guarantees for the least-squares temporal difference (LSTD) algorithm are further discussed, shedding light on the trade-off between approximation error and statistical error.
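
For context, the following is a sketch of the standard discounted-diffusion formulation that this line of work builds on; the notation is our own illustration and may differ from the talk's exact setting. Under a fixed policy the state follows a diffusion, and the value function solves a continuous-time Bellman (Feynman-Kac) equation involving the infinitesimal generator \mathcal{L}:

    dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t,
    V(x) = \mathbb{E}\Big[ \int_0^\infty e^{-\beta t} r(X_t)\,dt \,\Big|\, X_0 = x \Big],
    \beta V(x) = r(x) + (\mathcal{L}V)(x), \qquad \mathcal{L}V = b \cdot \nabla V + \tfrac{1}{2}\operatorname{Tr}\big(\sigma\sigma^\top \nabla^2 V\big).

A first-order discretization with step size h replaces the last identity by the one-step relation V(x) \approx r(x)\,h + e^{-\beta h}\,\mathbb{E}[V(X_h) \mid X_0 = x], whose bias vanishes only as h \to 0; the high-order approximations mentioned in the abstract target this bias at a fixed h.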

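To make the LSTD side concrete, here is a minimal, self-contained Python sketch. It is our own illustration of first-order LSTD with linear features on an Euler-discretized diffusion, not the speaker's algorithm or code; the Ornstein-Uhlenbeck dynamics, the reward, the feature map, and the constants beta and h are all hypothetical choices.

import numpy as np

# Illustrative setup (all choices hypothetical): Ornstein-Uhlenbeck dynamics
# dX_t = -X_t dt + dW_t, quadratic reward, discount rate beta, step size h.
rng = np.random.default_rng(0)
beta, h, n_steps = 1.0, 0.01, 100_000

def reward(x):
    return x ** 2

def features(x):
    return np.array([1.0, x, x ** 2])   # linear function class for V

gamma = np.exp(-beta * h)               # discount accumulated over one step
A = np.zeros((3, 3))
b = np.zeros(3)
x = 0.0
for _ in range(n_steps):
    # Euler-Maruyama step of the diffusion.
    x_next = x - x * h + np.sqrt(h) * rng.standard_normal()
    phi, phi_next = features(x), features(x_next)
    # First-order LSTD normal equations A @ theta = b, built from the
    # discretized Bellman relation V(x) ~ r(x) h + e^{-beta h} E[V(X_{t+h})].
    A += np.outer(phi, phi - gamma * phi_next)
    b += phi * reward(x) * h
    x = x_next

theta = np.linalg.solve(A, b)           # V(x) is estimated by theta @ features(x)
# Sanity check: for this particular OU example the exact value function is
# V(x) = (x^2 + 1) / 3, so theta should be close to [1/3, 0, 1/3] up to
# discretization error (shrinking with h) and sampling error (shrinking with n).
print("LSTD coefficients:", theta)

The final comment reflects the tension named in the abstract: a smaller step size h reduces the approximation (discretization) error of the Bellman relation, while the remaining gap to the exact coefficients is statistical, which is the trade-off the talk's guarantees quantify.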

About the forum: This online forum is organized by Professor 张志华's machine learning lab and meets once every two weeks (except during public holidays). Each session invites a PhD student to give a systematic, in-depth introduction to a frontier topic; themes include, but are not limited to, machine learning, high-dimensional statistics, operations research and optimization, and theoretical computer science.