Machine Learning and Data Science PhD Student Forum Series (Session 90): Reinforcement Learning for Large Language Models

Posted: 2025-06-12

Speaker(s): 杨潇博 (Peking University)

Time: 2025-06-12 16:00-17:00

Venue: Tencent Meeting 531-8098-3912

Abstract:
Reinforcement Learning (RL) is increasingly pivotal in refining the capabilities of Large Language Models (LLMs) beyond their foundational training. This talk examines the application of RL to LLMs. We will begin by briefly covering the LLM fundamentals that pave the way for RL integration: encoding, the Transformer architecture, pre-training, and instruction fine-tuning. The core of the presentation will focus on RL modeling in this context, discussing key training scenarios such as enhancing specific task abilities, employing Reinforcement Learning from Human Feedback (RLHF) for value alignment, and developing LLM-based agents. We will also explore the critical design of reward functions and introduce prominent RL algorithms, examining how these methods shape LLM behavior and performance.

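About the forum: This online forum is organized by Professor 张志华's machine learning lab and is held every two weeks (except on public holidays). Each session invites one PhD student to give a systematic, in-depth introduction to a frontier topic. Topics include, but are not limited to, machine learning, high-dimensional statistics, operations research and optimization, and theoretical computer science.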