Large language models (LLMs) have enabled advances in language understanding and multimodal tasks, yet scaling reinforcement learning (RL) with these models remains challenging: most existing frameworks either lack the abstractions needed to define and manage complex dataflows or cannot handle models with billions of parameters.
verl (https://github.com/volcengine/verl) is an open-source framework for building end-to-end RL pipelines with LLMs. It provides high-level abstractions and optimizations for dataflow orchestration and resource management through a Ray-based hybrid-controller model: the entire RL dataflow runs as a single controller process on the Ray driver, which issues primitive API calls to WorkerGroup modules. The WorkerGroup and ResourcePool components distribute computation and resources across GPU clusters, delivering high throughput and strong extensibility.
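The single-controller pattern described above can be illustrated with a minimal sketch. This is not verl's API: `ToyResourcePool`, `ToyWorkerGroup`, and `controller_step` are hypothetical names invented for illustration, and Python threads stand in for Ray workers on GPUs. The only point is that one driver function expresses the whole dataflow while each worker group fans the work out over its pool.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class ToyResourcePool:
    """Hypothetical stand-in for a pool of devices (here just integer IDs)."""
    device_ids: list


class ToyWorkerGroup:
    """Hypothetical stand-in for a group of workers bound to a resource pool."""

    def __init__(self, pool, fn):
        self.pool = pool
        self.fn = fn  # computation each worker applies to the items in its shard

    def run(self, batch):
        # Split the batch into one strided shard per device.
        n = len(self.pool.device_ids)
        shards = [batch[i::n] for i in range(n)]
        # Threads play the role of remote workers in this toy version.
        with ThreadPoolExecutor(max_workers=n) as executor:
            results = list(executor.map(lambda s: [self.fn(x) for x in s], shards))
        # Gather the shard outputs back into the original order.
        out = [None] * len(batch)
        for i, shard_out in enumerate(results):
            out[i::n] = shard_out
        return out


def controller_step(prompts, rollout_group, reward_group):
    """The whole dataflow lives in one driver-side function."""
    responses = rollout_group.run(prompts)   # generation stage
    rewards = reward_group.run(responses)    # scoring stage
    return responses, rewards


pool = ToyResourcePool(device_ids=[0, 1])
rollout = ToyWorkerGroup(pool, lambda p: p + " -> response")
reward = ToyWorkerGroup(pool, len)  # toy reward: response length
responses, rewards = controller_step(["a", "bb", "ccc"], rollout, reward)
```

In this sketch both groups share one pool, which loosely mirrors placing several roles on the same devices; giving each group its own pool would model disjoint placement instead.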
Since its release, verl has been adopted in both academic research and industry production. It integrates with major training backends (FSDP, FSDP2, Megatron-LM) and inference engines (vLLM, SGLang), and supports a range of RL algorithms (PPO, GRPO, DAPO, etc.) with little extra effort needed to scale. Recent trends in reasoning models bring new challenges to RL infrastructure, such as efficient tool calling, multi-turn interactions, and the ability to scale to very large MoE models such as DeepSeek 671B. To lower the barrier to RL for advanced reasoning and tool calling, we recently improved verl with (1) efficient request-level asynchronous multi-turn rollout and tool calling, (2) integration with expert parallelism for large-scale MoE models, and (3) an asynchronous system architecture for off-policy and asynchronous RL algorithms and flexible device placement.
Hongpeng Guo is a Research Scientist at ByteDance Seed, where he develops large-scale post-training and reinforcement-learning infrastructure.
His research interests span large-scale machine learning systems.
He earned a B.Eng. in Computer Engineering from the University of Hong Kong and a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign.