Language Model Reinforcement Learning Environment Frameworks
...This framework makes it easier to experiment with RLHF (Reinforcement Learning from Human Feedback), RLAIF (Reinforcement Learning from AI Feedback), or multi-turn training approaches by abstracting environment logic, scoring, and logging into reusable components.
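
To make the separation of environment logic, scoring, and logging concrete, here is a minimal sketch in Python. It is not the framework's actual API: every name in it (EchoEnvironment, ListLogger, exact_match_scorer, run_episode) is hypothetical and chosen only for illustration, and the "policy" is a plain callable standing in for a language model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EchoEnvironment:
    """Hypothetical multi-turn environment: asks the policy to repeat a word."""

    target: str
    max_turns: int = 3
    turns: int = 0

    def reset(self) -> str:
        # Start a new episode and return the first prompt.
        self.turns = 0
        return f"Repeat the word: {self.target}"

    def step(self, response: str) -> tuple[str, bool]:
        # Advance one turn; the episode ends on success or at the turn limit.
        self.turns += 1
        done = response.strip() == self.target or self.turns >= self.max_turns
        return f"Try again: {self.target}", done


@dataclass
class ListLogger:
    """Hypothetical logger that keeps per-turn records in memory."""

    records: list[dict] = field(default_factory=list)

    def log(self, **record) -> None:
        self.records.append(record)


def exact_match_scorer(target: str) -> Callable[[str], float]:
    """Hypothetical scorer: reward 1.0 for an exact match, else 0.0."""
    return lambda response: 1.0 if response.strip() == target else 0.0


def run_episode(env, policy, scorer, logger) -> float:
    """Run one episode, wiring the three components together."""
    observation = env.reset()
    total, turn, done = 0.0, 0, False
    while not done:
        response = policy(observation)          # stand-in for a model call
        reward = scorer(response)               # scoring is independent of env
        observation, done = env.step(response)  # environment logic
        logger.log(turn=turn, response=response, reward=reward)
        total += reward
        turn += 1
    return total
```

Because run_episode only depends on the methods it calls (reset, step, log) and on the scorer being a callable, each piece can be swapped independently under this sketch, for example replacing exact_match_scorer with a learned reward model for RLHF or a model-based judge for RLAIF, without touching the environment or the logger.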