verl-agent is an open-source reinforcement learning framework for training large language model (LLM) agents and vision-language model (VLM) agents in complex interactive environments. Built as an extension of the veRL reinforcement learning infrastructure, the project focuses on scalable training of agents that perform multi-step reasoning and decision-making tasks.

The framework supports multi-turn interactions between agents and their environments: the agent receives feedback after each step and can adjust its strategy accordingly. This step-wise interaction model makes it possible to train agents for long-horizon scenarios where decisions depend on cumulative context and previous outcomes. Developers can configure memory modules that determine how historical information is stored and incorporated into each step of the reasoning process.
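The step-wise loop described above can be sketched as follows. This is a minimal illustration, not the actual verl-agent API: the `SummaryMemory` class, `build_prompt` helper, and `run_episode` function are hypothetical names chosen to show how a memory module, the current observation, and a policy interact at each step.

```python
from dataclasses import dataclass, field

@dataclass
class SummaryMemory:
    """Hypothetical memory module: keeps a bounded window of past steps."""
    max_steps: int = 4
    history: list = field(default_factory=list)

    def add(self, observation: str, action: str) -> None:
        # Store the latest (observation, action) pair, dropping the oldest.
        self.history.append((observation, action))
        self.history = self.history[-self.max_steps:]

    def summarize(self) -> str:
        # Compress stored history into a single string for the next prompt.
        return " | ".join(f"obs={o} -> act={a}" for o, a in self.history)


def build_prompt(memory: SummaryMemory, observation: str) -> str:
    """Combine summarized history with the current observation."""
    return f"History: {memory.summarize()}\nCurrent: {observation}\nNext action?"


def run_episode(observations, policy, memory: SummaryMemory):
    """One multi-turn rollout: observe, act, store feedback, repeat."""
    trajectory = []
    for observation in observations:
        action = policy(build_prompt(memory, observation))
        memory.add(observation, action)
        trajectory.append((observation, action))
    return trajectory
```

Swapping in a different `SummaryMemory` (e.g., full history vs. a short window) changes how much past context reaches the agent at each step, which is the knob the memory-module configuration exposes.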
## Features
- Multi-step interaction loops between agents and environments
- Reinforcement learning training pipeline for LLM and VLM agents
- Customizable memory modules controlling historical context usage
- Flexible input structure combining observations and summarized history
- Support for long-horizon reasoning and decision-making tasks
- Compatibility with the veRL reinforcement learning infrastructure
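For the training side, per-step feedback from a multi-turn rollout is typically turned into learning signals via discounted returns. The sketch below shows this standard RL computation; it illustrates the general idea only and is not claimed to be verl-agent's exact credit-assignment scheme.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute the discounted return G_t for each step of a trajectory.

    G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    Computed in a single backward pass over the reward sequence.
    """
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]
```

In a long-horizon task, the discount factor `gamma` controls how strongly early decisions are credited for rewards that only arrive many steps later.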