Machine Learning and Data Science PhD Student Forum Series (Session 89): Statistical Guarantees in Continuous-Time Policy Evaluation

Posted: 2025-05-29

Speaker(s): 谢楚焓 (Peking University)

Time: 2025-05-29 16:00-17:00

Venue: Tencent Meeting 531-8098-3912

Abstract:
Continuous-time reinforcement learning has seen a recent revival, driven by practical applications such as the fine-tuning of diffusion models. Traditional continuous-time policy evaluation stems from stochastic control and is exact but inherently model-based, whereas modern temporal-difference (TD) methods are generally model-free but suffer from discretization error in the continuous regime. This talk introduces a line of research that combines the advantages of both sides through high-order approximations to the Bellman equation or the diffusion generator. Both asymptotic and non-asymptotic guarantees for the least-squares temporal difference (LSTD) algorithm are further discussed, shedding light on the trade-off between approximation error and statistical error.
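
For context, the following is a sketch of the standard discounted-diffusion formulation that this line of work builds on; the notation is our own illustration and may differ from the talk's exact setting. Under a fixed policy the state follows a diffusion, and the value function solves a continuous-time Bellman (Feynman-Kac) equation involving the infinitesimal generator \mathcal{L}:

    dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t,
    V(x) = \mathbb{E}\Big[ \int_0^\infty e^{-\beta t} r(X_t)\,dt \,\Big|\, X_0 = x \Big],
    \beta V(x) = r(x) + (\mathcal{L}V)(x), \qquad \mathcal{L}V = b \cdot \nabla V + \tfrac{1}{2}\operatorname{Tr}\big(\sigma\sigma^\top \nabla^2 V\big).

A first-order discretization with step size h replaces the last identity by the one-step relation V(x) \approx r(x)\,h + e^{-\beta h}\,\mathbb{E}[V(X_h) \mid X_0 = x], whose bias vanishes only as h \to 0; the high-order approximations mentioned in the abstract target this bias at a fixed h.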

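To make the LSTD side concrete, here is a minimal, self-contained Python sketch. It is our own illustration of first-order LSTD with linear features on an Euler-discretized diffusion, not the speaker's algorithm or code; the Ornstein-Uhlenbeck dynamics, the reward, the feature map, and the constants beta and h are all hypothetical choices.

import numpy as np

# Illustrative setup (all choices hypothetical): Ornstein-Uhlenbeck dynamics
# dX_t = -X_t dt + dW_t, quadratic reward, discount rate beta, step size h.
rng = np.random.default_rng(0)
beta, h, n_steps = 1.0, 0.01, 100_000

def reward(x):
    return x ** 2

def features(x):
    return np.array([1.0, x, x ** 2])   # linear function class for V

gamma = np.exp(-beta * h)               # discount accumulated over one step
A = np.zeros((3, 3))
b = np.zeros(3)
x = 0.0
for _ in range(n_steps):
    # Euler-Maruyama step of the diffusion.
    x_next = x - x * h + np.sqrt(h) * rng.standard_normal()
    phi, phi_next = features(x), features(x_next)
    # First-order LSTD normal equations A @ theta = b, built from the
    # discretized Bellman relation V(x) ~ r(x) h + e^{-beta h} E[V(X_{t+h})].
    A += np.outer(phi, phi - gamma * phi_next)
    b += phi * reward(x) * h
    x = x_next

theta = np.linalg.solve(A, b)           # V(x) is estimated by theta @ features(x)
# Sanity check: for this particular OU example the exact value function is
# V(x) = (x^2 + 1) / 3, so theta should be close to [1/3, 0, 1/3] up to
# discretization error (shrinking with h) and sampling error (shrinking with n).
print("LSTD coefficients:", theta)

The final comment reflects the tension named in the abstract: a smaller step size h reduces the approximation (discretization) error of the Bellman relation, while the remaining gap to the exact coefficients is statistical, which is the trade-off the talk's guarantees quantify.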

About the forum: This online forum is organized by Professor 张志华's machine learning lab and meets once every two weeks (except during public holidays). Each session invites a PhD student to give a systematic, in-depth introduction to a frontier topic; themes include, but are not limited to, machine learning, high-dimensional statistics, operations research and optimization, and theoretical computer science.