verl
Volcano Engine Reinforcement Learning for LLMs
...Data pipelines treat human feedback, simulated environments, and synthetic preferences as interchangeable sources, which helps with rapid experimentation. VERL is meant for both research and production hardening: logging, checkpointing, and evaluation suites are built in so you can track learning dynamics and regressions over time.