Youth Seminar in Applied Mathematics (Lunch Seminar) -- Large Language Model Post-Training: Formulation and Algorithms

Posted: 2025-03-26

Speaker(s): Ziniu Li (The Chinese University of Hong Kong, Shenzhen)

Time: 2025-03-26, 11:45-13:00

Venue: Siyuan Hall, Zhihua Building (智华楼四元厅)

Abstract:

Post-training is essential for adapting Large Language Models (LLMs) to specialized downstream tasks. This process typically involves Supervised Fine-Tuning (SFT) for instruction following and Reinforcement Learning (RL) for capability enhancement—approaches central to flagship products like ChatGPT and DeepSeek-R1. In this talk, we present new mathematical formulations for both SFT and RL. We identify several key properties that distinguish LLM post-training from classical supervised learning and standard reinforcement learning frameworks. Based on these insights, we introduce two new training algorithms: GEM for SFT and ReMax for RL. Specifically, GEM preserves output diversity, enhancing exploration in subsequent RL stages, while ReMax significantly reduces computational complexity and opens a new paradigm of RL for LLMs. Both algorithms come with theoretical guarantees derived using tools from optimization theory. This talk is expected to benefit researchers and practitioners interested in LLM implementation, as well as those exploring the theoretical foundations of LLMs.
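The abstract's claim that ReMax reduces computational complexity comes from replacing the learned value model of actor-critic methods with a simple baseline: the reward of the greedy (argmax) response under the current policy. Below is a minimal sketch of that idea on a toy bandit analogue — the action set, rewards, learning rate, and step count are all illustrative assumptions, not the authors' LLM implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical toy setup: 4 candidate "responses" with fixed scalar rewards.
rewards = np.array([0.1, 0.5, 0.2, 1.0])
theta = np.zeros(4)  # policy logits

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(4, p=probs)       # sample a response from the policy
    greedy = int(np.argmax(theta))   # greedy response supplies the baseline
    advantage = rewards[a] - rewards[greedy]  # no learned value model needed
    # REINFORCE gradient of log pi(a): one-hot(a) - probs
    grad = advantage * (np.eye(4)[a] - probs)
    theta += 0.2 * grad

print(int(np.argmax(theta)))  # the policy should concentrate on the best response
```

Because the baseline is computed from a single extra (greedy) rollout rather than a trained critic, the memory and compute overhead of a value network disappears — the property the abstract refers to.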


Speaker Bio:

Ziniu Li is a Ph.D. student in the School of Data Science at The Chinese University of Hong Kong, Shenzhen, advised by Prof. Zhi-Quan (Tom) Luo. His research focuses on the design and theoretical analysis of machine learning algorithms, with particular emphasis on large language models and reinforcement learning applications. His work has been recognized with the Best Paper Runner-up Award at the NeurIPS 2024 FITML Workshop, a NeurIPS 2023 Spotlight, and a UAI 2023 Oral presentation.


Everyone is welcome to attend the lunch seminar on March 26. The talk runs from 12:00 to 13:00, with lunch served starting at 11:45. Faculty and students who plan to attend are asked to complete the following survey by 15:00 on March 25: https://www.wjx.cn/vm/QAfmW6G.aspx#