
Machine Learning and Data Science PhD Student Forum Series (Session 84): Beyond Value Functions, Optimizing General Functionals of MDP with Distributional RL
Speaker(s): Peng Yang (彭洋), Peking University
Time: 2025-03-06 16:00-17:00
Venue: Tencent Meeting 531-8098-3912
Abstract:
Traditional reinforcement learning (RL) focuses on maximizing the expectation of the return (i.e., the value function) in Markov decision processes. However, in fields such as healthcare, finance, and robotics, a poor return may lead to catastrophic outcomes, so considering only the expectation of the return is insufficient. Risk must be incorporated into the decision-making process, which motivates the problem of optimizing general functionals of the return, such as the Conditional Value at Risk (CVaR). Distributional RL provides a natural framework for solving this problem by modeling the full distribution of the return.
In this talk, we will introduce recent work in this field. We will discuss which functionals can be optimized and present distributional dynamic programming algorithms with theoretical guarantees. Extensive experiments demonstrate that the algorithms can achieve risk-aware decision-making and outperform existing methods.
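About the forum: This online forum is organized by the machine learning lab of Professor Zhang Zhihua (张志华) and is held every two weeks (except on public holidays). Each session invites one PhD student to give a systematic, in-depth introduction to a frontier topic. Topics include, but are not limited to, machine learning, high-dimensional statistics, operations research and optimization, and theoretical computer science.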