Open Source Python Reinforcement Learning Frameworks

Python Reinforcement Learning Frameworks

View 28 business solutions

Browse free open source Python Reinforcement Learning Frameworks and projects below. Use the toggles on the left to filter open source Python Reinforcement Learning Frameworks by OS, license, language, programming language, and project status.

  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 1
    DeepSeek-V3

    DeepSeek-V3

    Powerful AI language model (MoE) optimized for efficiency/performance

    DeepSeek-V3 is a robust Mixture-of-Experts (MoE) language model developed by DeepSeek, featuring a total of 671 billion parameters, with 37 billion activated per token. It employs Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture to enhance computational efficiency. The model introduces an auxiliary-loss-free load balancing strategy and a multi-token prediction training objective to boost performance. Trained on 14.8 trillion diverse, high-quality tokens, DeepSeek-V3 underwent supervised fine-tuning and reinforcement learning to fully realize its capabilities. Evaluations indicate that it outperforms other open-source models and rivals leading closed-source models, achieving this with a training duration of 55 days on 2,048 Nvidia H800 GPUs, costing approximately $5.58 million.
    Downloads: 112 This Week
    Last Update:
    See Project
  • 2
    DeepSeek R1

    DeepSeek R1

    Open-source, high-performance AI model with advanced reasoning

    DeepSeek-R1 is an open-source large language model developed by DeepSeek, designed to excel in complex reasoning tasks across domains such as mathematics, coding, and language. DeepSeek R1 offers unrestricted access for both commercial and academic use. The model employs a Mixture of Experts (MoE) architecture, comprising 671 billion total parameters with 37 billion active parameters per token, and supports a context length of up to 128,000 tokens. DeepSeek-R1's training regimen uniquely integrates large-scale reinforcement learning (RL) without relying on supervised fine-tuning, enabling the model to develop advanced reasoning capabilities. This approach has resulted in performance comparable to leading models like OpenAI's o1, while maintaining cost-efficiency. To further support the research community, DeepSeek has released distilled versions of the model based on architectures such as LLaMA and Qwen.
    Downloads: 88 This Week
    Last Update:
    See Project
  • 3
    Tensorforce

    Tensorforce

    A TensorFlow library for applied reinforcement learning

    Tensorforce is an open-source deep reinforcement learning framework built on TensorFlow, emphasizing modularized design and straightforward usability for applied research and practice.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 4
    Machine Learning PyTorch Scikit-Learn

    Machine Learning PyTorch Scikit-Learn

    Code Repository for Machine Learning with PyTorch and Scikit-Learn

    Initially, this project started as the 4th edition of Python Machine Learning. However, after putting so much passion and hard work into the changes and new topics, we thought it deserved a new title. So, what’s new? There are many contents and additions, including the switch from TensorFlow to PyTorch, new chapters on graph neural networks and transformers, a new section on gradient boosting, and many more that I will detail in a separate blog post. For those who are interested in knowing what this book covers in general, I’d describe it as a comprehensive resource on the fundamental concepts of machine learning and deep learning. The first half of the book introduces readers to machine learning using scikit-learn, the defacto approach for working with tabular datasets. Then, the second half of this book focuses on deep learning, including applications to natural language processing and computer vision.
    Downloads: 8 This Week
    Last Update:
    See Project
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 5
    ManiSkill

    ManiSkill

    SAPIEN Manipulation Skill Framework

    ManiSkill is a benchmark platform for training and evaluating reinforcement learning agents on dexterous manipulation tasks using physics-based simulations. Developed by Hao Su Lab, it focuses on robotic manipulation with diverse, high-quality 3D tasks designed to challenge perception, control, and planning in robotics. ManiSkill provides both low-level control and visual observation spaces for realistic learning scenarios.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    Agent S

    Agent S

    Agent S: an open agentic framework that uses computers like a human

    Agent S is an open-source agentic framework designed to enable autonomous computer use through an Agent-Computer Interface (ACI). Built to operate graphical user interfaces like a human, it allows AI agents to perceive screens, reason about tasks, and execute actions across macOS, Windows, and Linux systems. The latest version, Agent S3, surpasses human-level performance on the OSWorld benchmark, demonstrating state-of-the-art results in complex multi-step computer tasks. Agent S combines powerful foundation models (such as GPT-5) with grounding models like UI-TARS to translate visual inputs into precise executable actions. It supports flexible deployment via CLI, SDK, or cloud, and integrates with multiple model providers including OpenAI, Anthropic, Gemini, Azure, and Hugging Face endpoints. With optional local code execution, reflection mechanisms, and compositional planning, Agent S provides a scalable and research-driven framework for building advanced computer-use agents.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 7
    AgentUniverse

    AgentUniverse

    agentUniverse is a LLM multi-agent framework

    AgentUniverse is a multi-agent AI framework that enables coordination between multiple intelligent agents for complex task execution and automation.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    Gymnasium

    Gymnasium

    An API standard for single-agent reinforcement learning environments

    Gymnasium is a fork of OpenAI Gym, maintained by the Farama Foundation, that provides a standardized API for reinforcement learning environments. It improves upon Gym with better support, maintenance, and additional features while maintaining backward compatibility.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    H2O LLM Studio

    H2O LLM Studio

    Framework and no-code GUI for fine-tuning LLMs

    Welcome to H2O LLM Studio, a framework and no-code GUI designed for fine-tuning state-of-the-art large language models (LLMs). You can also use H2O LLM Studio with the command line interface (CLI) and specify the configuration file that contains all the experiment parameters. To finetune using H2O LLM Studio with CLI, activate the pipenv environment by running make shell. With H2O LLM Studio, training your large language model is easy and intuitive. First, upload your dataset and then start training your model. Start by creating an experiment. You can then monitor and manage your experiment, compare experiments, or push the model to Hugging Face to share it with the community.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • 10
    LightZero

    LightZero

    [NeurIPS 2023 Spotlight] LightZero

    LightZero is an efficient, scalable, and open-source framework implementing MuZero, a powerful model-based reinforcement learning algorithm that learns to predict rewards and transitions without explicit environment models. Developed by OpenDILab, LightZero focuses on providing a highly optimized and user-friendly platform for both academic research and industrial applications of MuZero and similar algorithms.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    PettingZoo

    PettingZoo

    An API standard for multi-agent reinforcement learning environments

    PettingZoo is a standardized API and library for multi-agent reinforcement learning (MARL) environments. It provides a broad set of environments and tools to facilitate the development and evaluation of multi-agent algorithms.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    Brax

    Brax

    Massively parallel rigidbody physics simulation

    Brax is a fast and fully differentiable physics engine for large-scale rigid body simulations, built on JAX. It is designed for research in reinforcement learning and robotics, enabling efficient simulations and gradient-based optimization.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    CCZero (中国象棋Zero)

    CCZero (中国象棋Zero)

    Implement AlphaZero/AlphaGo Zero methods on Chinese chess

    ChineseChess-AlphaZero is a project that implements the AlphaZero algorithm for the game of Chinese Chess (Xiangqi). It adapts DeepMind’s AlphaZero method—combining neural networks and Monte Carlo Tree Search (MCTS)—to learn and play Chinese Chess without prior human data. The system includes self-play, training, and evaluation pipelines tailored to Xiangqi's unique game mechanics.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Deep Learning Drizzle

    Deep Learning Drizzle

    Drench yourself in Deep Learning, Reinforcement Learning

    Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures! Optimization courses which form the foundation for ML, DL, RL. Computer Vision courses which are DL & ML heavy. Speech recognition courses which are DL heavy. Structured Courses on Geometric, Graph Neural Networks. Section on Autonomous Vehicles. Section on Computer Graphics with ML/DL focus.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    DouZero

    DouZero

    [ICML 2021] DouZero: Mastering DouDizhu

    DouZero is a reinforcement learning-based AI for playing DouDizhu, a popular Chinese card game. It focuses on perfecting AI strategies for competitive play using value-based deep RL techniques.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    ElegantRL

    ElegantRL

    Massively Parallel Deep Reinforcement Learning

    ElegantRL is an efficient and flexible deep reinforcement learning framework designed for researchers and practitioners. It focuses on simplicity, high performance, and supporting advanced RL algorithms.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Hands-on Unsupervised Learning

    Hands-on Unsupervised Learning

    Code for Hands-on Unsupervised Learning Using Python (O'Reilly Media)

    This repo contains the code for the O'Reilly Media, Inc. book "Hands-on Unsupervised Learning Using Python: How to Build Applied Machine Learning Solutions from Unlabeled Data" by Ankur A. Patel. Many industry experts consider unsupervised learning the next frontier in artificial intelligence, one that may hold the key to the holy grail in AI research, the so-called general artificial intelligence. Since the majority of the world's data is unlabeled, conventional supervised learning cannot be applied; this is where unsupervised learning comes in. Unsupervised learning can be applied to unlabeled datasets to discover meaningful patterns buried deep in the data, patterns that may be near impossible for humans to uncover. Author Ankur Patel provides practical knowledge on how to apply unsupervised learning using two simple, production-ready Python frameworks - scikit-learn and TensorFlow. With the hands-on examples and code provided, you will identify difficult-to-find patterns in data.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Multi-Agent Orchestrator

    Multi-Agent Orchestrator

    Flexible and powerful framework for managing multiple AI agents

    Multi-Agent Orchestrator is an AI coordination framework that enables multiple intelligent agents to work together to complete complex, multi-step workflows.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    Project Malmo

    Project Malmo

    A platform for Artificial Intelligence experimentation on Minecraft

    How can we develop artificial intelligence that learns to make sense of complex environments? That learns from others, including humans, how to interact with the world? That learns transferable skills throughout its existence, and applies them to solve new, challenging problems? Project Malmo sets out to address these core research challenges, addressing them by integrating (deep) reinforcement learning, cognitive science, and many ideas from artificial intelligence. The Malmo platform is a sophisticated AI experimentation platform built on top of Minecraft, and designed to support fundamental research in artificial intelligence. The Project Malmo platform consists of a mod for the Java version, and code that helps artificial intelligence agents sense and act within the Minecraft environment. The two components can run on Windows, Linux, or Mac OS, and researchers can program their agents in any programming language they’re comfortable with.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    TextWorld

    TextWorld

    ​TextWorld is a sandbox learning environment for the training

    TextWorld is a learning environment designed to train reinforcement learning agents to play text-based games, where actions and observations are entirely in natural language. Developed by Microsoft Research, TextWorld focuses on language understanding, planning, and interaction in complex, narrative-driven environments. It generates games procedurally, enabling scalable testing of agents’ natural language processing and decision-making abilities.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    Trax

    Trax

    Deep learning with clear code and speed

    Trax is an end-to-end library for deep learning that focuses on clear code and speed. It is actively used and maintained in the Google Brain team. Run a pre-trained Transformer, create a translator in a few lines of code. Features and resources, API docs, where to talk to us, how to open an issue and more. Walkthrough, how Trax works, how to make new models and train on your own data. Trax includes basic models (like ResNet, LSTM, Transformer) and RL algorithms (like REINFORCE, A2C, PPO). It is also actively used for research and includes new models like the Reformer and new RL algorithms like AWR. Trax has bindings to a large number of deep learning datasets, including Tensor2Tensor and TensorFlow datasets. You can use Trax either as a library from your own python scripts and notebooks or as a binary from the shell, which can be more convenient for training large models. It runs without any changes on CPUs, GPUs and TPUs.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    Weights and Biases

    Weights and Biases

    Tool for visualizing and tracking your machine learning experiments

    Use W&B to build better models faster. Track and visualize all the pieces of your machine learning pipeline, from datasets to production models. Quickly identify model regressions. Use W&B to visualize results in real time, all in a central dashboard. Focus on the interesting ML. Spend less time manually tracking results in spreadsheets and text files. Capture dataset versions with W&B Artifacts to identify how changing data affects your resulting models. Reproduce any model, with saved code, hyperparameters, launch commands, input data, and resulting model weights. Set wandb.config once at the beginning of your script to save your hyperparameters, input settings (like dataset name or model type), and any other independent variables for your experiments. This is useful for analyzing your experiments and reproducing your work in the future. Setting configs also allows you to visualize the relationships between features of your model architecture or data pipeline and model performance.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    verl

    verl

    Volcano Engine Reinforcement Learning for LLMs

    VERL is a reinforcement-learning–oriented toolkit designed to train and align modern AI systems, from language models to decision-making agents. It brings together supervised fine-tuning, preference modeling, and online RL into one coherent training stack so teams can move from raw data to aligned policies with minimal glue code. The library focuses on scalability and efficiency, offering distributed training loops, mixed precision, and replay/buffering utilities that keep accelerators busy. It ships with reference implementations of popular alignment algorithms and clear examples that make it straightforward to reproduce baselines before customizing. Data pipelines treat human feedback, simulated environments, and synthetic preferences as interchangeable sources, which helps with rapid experimentation. VERL is meant for both research and production hardening: logging, checkpointing, and evaluation suites are built in so you can track learning dynamics and regressions over time.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    Acme

    Acme

    A library of reinforcement learning components and agents

    Acme is a framework from DeepMind for building scalable and reproducible reinforcement learning agents. It emphasizes modular components, distributed training, and ease of experimentation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Alibi Explain

    Alibi Explain

    Algorithms for explaining machine learning models

    Alibi is a Python library aimed at machine learning model inspection and interpretation. The focus of the library is to provide high-quality implementations of black-box, white-box, local and global explanation methods for classification and regression models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • Next
MongoDB Logo MongoDB