Pearl
A Production-ready Reinforcement Learning AI Agent Library
...It is organized around modular components (policy learners, replay buffers, exploration strategies, safety modules, and history summarizers) that compose into reliable agents with clear boundaries between components and strong defaults. The library implements classic and modern algorithms across two regimes: contextual bandits (e.g., LinUCB, LinTS, SquareCB, neural bandits) and fully sequential RL (e.g., DQN, PPO-style policy optimization), with attention to practical concerns such as nonstationarity and dynamic action spaces. Tutorials demonstrate end-to-end workflows on OpenAI Gym tasks and on contextual-bandit setups derived from tabular datasets, emphasizing reproducibility and clear baselines. ...
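
To make the contextual-bandit regime concrete, here is a minimal sketch of LinUCB written in plain NumPy. It is not Pearl's API: the class `LinUCB`, its methods `scores`, `act`, and `learn`, the helper `run_demo`, and all parameter names are hypothetical and exist only for illustration. The sketch assumes one linear model shared across joint (context, action) feature vectors, which is one simple way to cope with an action set whose size changes from step to step.

```python
import numpy as np


class LinUCB:
    """Illustrative shared-parameter LinUCB (hypothetical, not Pearl's API)."""

    def __init__(self, dim: int, alpha: float = 1.0, ridge: float = 1.0) -> None:
        self.alpha = alpha                # width of the exploration bonus
        self.A = ridge * np.eye(dim)      # regularized Gram matrix of seen features
        self.b = np.zeros(dim)            # reward-weighted sum of seen features

    def scores(self, features: np.ndarray) -> np.ndarray:
        """Upper confidence bound for each row of `features` (n_actions x dim)."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b            # ridge-regression estimate
        mean = features @ theta
        # Per-row quadratic form x^T A^{-1} x.
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", features, A_inv, features))
        return mean + self.alpha * bonus

    def act(self, features: np.ndarray) -> int:
        """Pick the index of the available action with the highest bound."""
        return int(np.argmax(self.scores(features)))

    def learn(self, x: np.ndarray, reward: float) -> None:
        """Rank-one update with the chosen action's features and its reward."""
        self.A += np.outer(x, x)
        self.b += reward * x


def run_demo(steps: int = 2000, seed: int = 0) -> float:
    """Average per-step regret on a synthetic linear bandit (assumed setup)."""
    rng = np.random.default_rng(seed)
    dim = 4
    true_theta = np.array([1.0, -0.5, 0.3, 0.8])  # hidden reward weights
    agent = LinUCB(dim=dim, alpha=0.5)
    regret = 0.0
    for _ in range(steps):
        n_actions = int(rng.integers(2, 6))       # action set size varies per step
        feats = rng.normal(size=(n_actions, dim))
        a = agent.act(feats)
        expected = feats @ true_theta
        reward = expected[a] + rng.normal(scale=0.1)
        agent.learn(feats[a], reward)
        regret += expected.max() - expected[a]
    return regret / steps


if __name__ == "__main__":
    print(f"average regret per step: {run_demo():.4f}")
```

Because the agent scores whatever feature rows it is handed, nothing in the model depends on a fixed number of arms. The exploration bonus shrinks along directions of feature space that have already been observed, so the agent gradually shifts from exploring to exploiting.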