
TorchRL v0.12.0 Release Notes

Highlights

  • New algorithms. Five new config-based trainers — DQN, DDPG, IQL, CQL, and TD3 — are built on a new configuration system for reproducible algorithm setups (@vmoens, @bsprenger). PILCO (Probabilistic Inference for Learning Control) is now available as a built-in algorithm (@PSXBRosa, @vmoens). For diffusion-based behavioral cloning, a new DDPMModule diffusion actor and DiffusionBCLoss are included (@theap06). Async PPO infrastructure overlaps data collection and optimization (@vmoens).

  • Collector and data-flow improvements. A new high-throughput auto-batching inference server automatically batches requests from multiple environments, with pluggable transport backends (threading, multiprocessing, Ray, Monarch) and built-in weight-sync integration. Paired with the new AsyncBatchedCollector, it enables asynchronous data collection with automatic batching for maximum GPU utilization (@vmoens). The new TrajectoryBatcher and AsyncTrajectoryBatcher assemble trajectories efficiently from streaming environment transitions, including variable-length trajectories and padding (@theap06). On the parallel environment side, shared-memory done flags replace mp.Event for lower-latency step synchronization, and a fast-path device-transfer optimization reduces overhead in step_and_maybe_reset (@vmoens).

  • Inference backends. This release adds full SGLang integration alongside vLLM, with an SGLangWrapper policy module, an AsyncSGLang server-based inference path, NCCL weight synchronization, and GRPO support (@vmoens).

  • Replay buffer. StoreStorage is a new Redis/Dragonfly-backed storage backend that lets replay buffers share experience across processes and nodes (@vmoens).

  • Evaluation. A new Evaluator class provides a unified API for synchronous and asynchronous policy evaluation during training, with a process backend, collector-based stepping, weight sync via WeightSyncScheme, multi-model support, and a RayEvalWorker for distributed evaluation (@vmoens).

  • Environments and platform support. A new GenesisEnv wrapper integrates the Genesis physics simulator (@ParamThakkar123). Dreamer now supports pre-vectorized environments and ships with an IsaacLab environment factory, training script, and integration guide (@vmoens). MPS support improves through float64-to-float32 downcasting in ParallelEnv, SerialEnv, and collectors, fixing previously broken Apple Silicon GPU workflows (@bsprenger).

Installation

```bash
pip install torchrl==0.12.0
```

Requires PyTorch >= 2.1 and TensorDict >= 0.12.0.

Breaking Changes

  • Remove v0.12 deprecated APIs (#3670) @vmoens
  • The local_init_rb parameter has been removed from Collector and MultiCollector. Storage-level initialization is now the only behavior.
  • TransformedEnv(env=...) now raises TypeError. Use TransformedEnv(base_env=...) instead.
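
The `TransformedEnv` change is a mechanical keyword rename at call sites. A stub sketch of the new calling convention (the class below stands in for `torchrl.envs.TransformedEnv` and is not its real implementation):

```python
# Stub illustrating the renamed keyword; this is NOT torchrl's actual
# TransformedEnv implementation, only a sketch of the calling convention.
class TransformedEnv:
    def __init__(self, base_env=None, *transforms, **kwargs):
        if "env" in kwargs:
            # v0.12.0 behavior: the old keyword now raises instead of warning
            raise TypeError("'env' was removed; pass base_env=... instead")
        self.base_env = base_env
        self.transforms = transforms

# Old call sites like TransformedEnv(env=my_env) must become:
# TransformedEnv(base_env=my_env)
```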

New Features

Auto-batching Inference Server

A new inference server that automatically batches requests from multiple environments for efficient GPU inference. This is a key building block for scaling RL training with many parallel environments.

  • Core server and transport protocol (#3492)
  • Threading transport (#3493)
  • Multiprocessing transport (#3494)
  • Ray transport (#3495)
  • Monarch transport (#3496)
  • Weight sync integration (#3497)
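
As a rough mental model of what the threading transport does (names and signatures below are illustrative, not TorchRL's API): callers submit single observations, a background thread coalesces whatever has arrived into one batch, and a single batched forward pass serves everyone.

```python
import threading
import queue
import time

class AutoBatchingServer:
    """Toy auto-batching inference server with a threading transport.
    Illustrative only; not TorchRL's actual server API."""

    def __init__(self, policy, max_batch=8, timeout=0.01):
        self.policy = policy            # maps a list of obs -> list of actions
        self.requests = queue.Queue()   # (obs, reply_queue) pairs
        self.max_batch = max_batch
        self.timeout = timeout
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._loop, daemon=True)
        self._worker.start()

    def infer(self, obs):
        reply = queue.Queue(maxsize=1)
        self.requests.put((obs, reply))
        return reply.get()              # blocks until our batch is served

    def _loop(self):
        while not self._stop.is_set():
            try:
                first = self.requests.get(timeout=self.timeout)
            except queue.Empty:
                continue
            # Coalesce whatever arrives within the window into one batch.
            batch = [first]
            deadline = time.monotonic() + self.timeout
            while len(batch) < self.max_batch and time.monotonic() < deadline:
                try:
                    batch.append(self.requests.get(timeout=self.timeout))
                except queue.Empty:
                    break
            actions = self.policy([obs for obs, _ in batch])  # one forward pass
            for (_, reply), action in zip(batch, actions):
                reply.put(action)

    def shutdown(self):
        self._stop.set()
        self._worker.join()
```

Swapping the thread and `queue.Queue` for processes, Ray actors, or Monarch endpoints is what the pluggable transport backends amount to conceptually.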

AsyncBatchedCollector

A new collector that combines async environments with the auto-batching inference server for maximum throughput.

  • Async envs + auto-batching inference (#3498)
  • Coordinator loop and direct submission mode (#3499)
  • Backend params and performance optimizations (#3511)

Trajectory Batcher

  • TrajectoryBatcher for assembling trajectories from streaming transitions (#3584) @theap06
  • AsyncTrajectoryBatcher for asynchronous trajectory assembly (#3592) @theap06
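
The core job of a trajectory batcher can be sketched in a few lines (function and field names here are assumptions for illustration, not the `TrajectoryBatcher` API): group interleaved per-environment transitions into trajectories, then right-pad variable-length trajectories to a common length with a validity mask.

```python
# Illustrative sketch of trajectory assembly with padding; not the actual
# TrajectoryBatcher implementation.
def batch_trajectories(transitions, pad_value=0.0):
    """Group (env_id, obs, done) transitions into per-env trajectories,
    then right-pad them to a common length with a validity mask."""
    trajs, open_trajs = [], {}
    for env_id, obs, done in transitions:
        open_trajs.setdefault(env_id, []).append(obs)
        if done:
            # Trajectory complete: move it to the finished list.
            trajs.append(open_trajs.pop(env_id))
    # Incomplete trajectories stay in open_trajs for the next call.
    if not trajs:
        return [], []
    max_len = max(len(t) for t in trajs)
    mask = [[1] * len(t) + [0] * (max_len - len(t)) for t in trajs]
    padded = [t + [pad_value] * (max_len - len(t)) for t in trajs]
    return padded, mask
```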

SGLang Backend

Full SGLang support for LLM inference, mirroring the existing vLLM integration:

  • Base infrastructure (#3428)
  • AsyncSGLang server-based inference service (#3429)
  • SGLangWrapper policy module (#3430)
  • NCCL weight synchronization (#3431)
  • Module structure integration (#3432)
  • SGLang backend support in GRPO

Diffusion Policies

  • DDPMModule diffusion actor for denoising diffusion probabilistic models (#3596) @theap06
  • DiffusionBCLoss for diffusion-based behavioral cloning (#3604) @theap06

Evaluator

  • Evaluator class for sync/async evaluation (#3594)
  • Process backend, lazy init, and pending property (#3611)
  • Collector-based stepping backend (#3624)
  • Enable loggers to run as Ray actors (#3623)
  • Weight sync via WeightSyncScheme + multi-model support (#3627)
  • Isaac Lab Evaluator tests + init_fn plumbing for process backend (#3663)
  • RayEvalWorker for distributed async evaluation (#3474)
  • Named actors and from_name for RayEvalWorker (#3488)
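
A toy sketch of the sync/async evaluation pattern these features describe (the class name matches the release notes, but the implementation and signatures below are illustrative only):

```python
import threading

class Evaluator:
    """Toy sync/async policy evaluator; illustrative, not TorchRL's API."""

    def __init__(self, eval_fn):
        self._eval_fn = eval_fn      # e.g. rolls out the policy, returns a score
        self._result = None
        self._thread = None

    def evaluate(self):
        # Synchronous path: block training until evaluation finishes.
        return self._eval_fn()

    def evaluate_async(self):
        # Asynchronous path: evaluate in the background while training continues.
        def _run():
            self._result = self._eval_fn()
        self._thread = threading.Thread(target=_run)
        self._thread.start()

    @property
    def pending(self):
        return self._thread is not None and self._thread.is_alive()

    def result(self):
        if self._thread is not None:
            self._thread.join()
        return self._result
```

In the real class the background work runs in a separate process or Ray worker rather than a thread, which is why weight sync is needed to keep the evaluated policy current.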

Async PPO

  • Async PPO infrastructure for overlapping collection and optimization (#3661)
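
The overlap idea can be sketched with a collector thread feeding a bounded queue while the main thread optimizes (structure and names are illustrative, not TorchRL's Async PPO API):

```python
import threading
import queue

# Sketch of overlapped collection and optimization, the core idea behind
# async PPO: a collector thread keeps a small queue of fresh batches filled
# while the main thread runs optimization on whatever batch is ready.
def run_async_ppo(collect_batch, optimize, num_updates, queue_size=2):
    batches = queue.Queue(maxsize=queue_size)
    stop = threading.Event()

    def collector():
        while not stop.is_set():
            batch = collect_batch()          # env stepping happens here...
            while not stop.is_set():
                try:
                    batches.put(batch, timeout=0.1)
                    break
                except queue.Full:
                    pass                     # optimizer is behind; keep trying

    worker = threading.Thread(target=collector, daemon=True)
    worker.start()
    losses = []
    for _ in range(num_updates):
        batch = batches.get()                # ...while optimization happens here
        losses.append(optimize(batch))
    stop.set()
    worker.join()
    return losses
```

Because collection runs ahead of optimization, batches are slightly off-policy by the time they are consumed, which the loss has to tolerate.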

Config-based Trainers

New trainers with integrated configuration system:

  • DQN Trainer (#3526)
  • DDPG Trainer (#3527)
  • IQL Trainer (#3528)
  • CQL Trainer (#3529)
  • TD3 Trainer (#3557) @bsprenger
  • Hook point to log average optimization losses in trainers (#3666)
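
The common thread across these trainers is building everything from a single declarative config. A dataclass-style sketch of the idea (field names, defaults, and the builder below are assumptions, not the actual config schema):

```python
from dataclasses import dataclass, asdict

# Illustrative only: not TorchRL's real config schema or trainer builder.
@dataclass
class DQNTrainerConfig:
    env_name: str = "CartPole-v1"
    lr: float = 2.5e-4
    gamma: float = 0.99
    buffer_size: int = 100_000
    total_frames: int = 1_000_000

def make_trainer(cfg: DQNTrainerConfig):
    # A real builder would construct the env, model, loss, and collector
    # from cfg; returning the resolved settings shows why this aids
    # reproducibility: one record fully determines the run.
    return {"algorithm": "DQN", **asdict(cfg)}

trainer = make_trainer(DQNTrainerConfig(lr=1e-3))
```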

Replay Buffer

  • StoreStorage for Redis/Dragonfly-backed replay buffers (#3516)
  • set_at_, set_, update_ methods on ReplayBuffer (#3590) @jashshah999
  • Support trajs_per_batch with replay_buffer on multi-process and distributed collectors (#3618)
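
A minimal in-memory sketch of the in-place update semantics the new methods suggest (the trailing underscore follows PyTorch's in-place naming convention; the class and its internals are illustrative, not TorchRL's implementation):

```python
class ListReplayBuffer:
    """Toy circular replay buffer illustrating in-place updates; not
    TorchRL's ReplayBuffer API."""

    def __init__(self, capacity):
        self._storage = []
        self._capacity = capacity
        self._cursor = 0

    def add(self, item):
        # Circular write: grow until full, then overwrite oldest slots.
        if len(self._storage) < self._capacity:
            self._storage.append(item)
        else:
            self._storage[self._cursor] = item
        self._cursor = (self._cursor + 1) % self._capacity

    def set_at_(self, index, item):
        # Overwrite an existing slot in place, e.g. to patch a stored
        # transition after the fact.
        self._storage[index] = item

    def __getitem__(self, index):
        return self._storage[index]

    def __len__(self):
        return len(self._storage)
```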

LLM / GRPO

  • Token-in, token-out LLM wrapper mode (#3407)
  • GRPO improvements: new envs, vLLM V1 compat, log-prob fixes, training stability (#3580)
  • Namespace GRPO wandb metrics for auto-grouping (#3585)
  • Remove placement-group xfails and fix vLLM tokenizer compat (#3586)

Environments

  • GenesisEnv: wrapper for the Genesis physics simulator (#3536) @ParamThakkar123
  • FinancialRegimeEnv: a vectorized financial environment (#3384) @aneesh223
  • num_workers parameter for HabitatEnv (#3383) @ParamThakkar123
  • Dreamer: support pre-vectorized environments (#3483)
  • Dreamer: add IsaacLab environment factory (#3484)

Transforms

  • Inverse for VecNorm and VecNormV2 transforms (#3416) @ParamThakkar123
  • prevent_leaking_rng utility (#3401) @ParamThakkar123

Logging

  • log_metrics method for efficient batch logging (#3452)
  • TensorDict support in log_metrics (#3455)

Specs

  • index_select support for TensorSpec (#3406) @ParamThakkar123
  • strict_shape parameter for QValueModule action shape enforcement (#3593) @Lidang-Jiang

Algorithms

  • PILCO (Probabilistic Inference for Learning Control) (#3582) @PSXBRosa, @vmoens

Collectors

  • Lazy-init RandomPolicy action_spec from env in collectors (#3664)

Other

  • __getattr__ in _dispatch_caller_parallel for transparent attribute access (#3389) @ParamThakkar123
  • scalar_output_mode for loss modules with reduction='none' (#3426)
  • ObsDecoder: out_channels parameter for grayscale decoding (#3472)
  • Ergonomic scalar assignment for loss buffers (#3612)
  • New memmap value for the CKPT_BACKEND environment variable (#3619) @theap06

Performance Improvements

  • GPU Image Transforms for Dreamer (~5.5x faster sampling)
  • SliceSampler: GPU-accelerated trajectory computation
  • Always enable prefetch for replay buffer
  • ParallelEnv: fast-path device transfer in step_and_maybe_reset
  • ParallelEnv: replace mp.Event with shared-memory done flags for lower latency
  • Lazy stack optimization for collector-to-buffer writes (#3438)
  • log_metrics usage in sota-implementations (#3454)

Bug Fixes

MPS (Apple Silicon)

  • Downcast float64 to float32 in ParallelEnv/SerialEnv on MPS (#3551) @bsprenger
  • MPS float64->float32 downcast for tensors (#3548) @bsprenger
  • Fix masked_scatter shape preservation on MPS in collectors (#3473)

Collectors

  • Fix stale model reference in MultiCollector weight sync after device-cast (#3587)
  • Fix shared mem updater with many policies (#3442)
  • Fix missing raise, incorrect __torch_function__ return, and off-by-one in RayCollector (#3530) @jashshah999

Environments

  • Fix check_env_specs() when state_spec contains keys not in observation_spec (#3581) @theap06
  • Fix BraxEnv rejecting camera_id and render_kwargs (#3533)
  • StepCounter now tracks nested truncated and done states (#3405)
  • Fix ParallelEnv shutdown hang with shared-memory done flags (#3464)

Loss / Models

  • Fix broken SACLoss when there is more than one qvalue_network (#3500) @ParamThakkar123
  • Fix GPT2RewardModel.compute_reward_loss (#3521)
  • Fix resnet call order in ImpalaNet (#3522)
  • Fix vLLM CompilationConfig compatibility and Windows CI pybind11 (#3673)

Specs / TensorDict

  • Fix MultiOneHot.to_numpy() returning scalar instead of array (#3589) @jashshah999
  • Fix shape mismatch in _set_index_in_td with trailing dims of 1 (#3517)
  • Set batch size in Composite.encode (#3411) @tobiabir

Transforms

  • LineariseRewards should not squeeze trailing dim (#3614) @mathieuorhan
  • Allow gradient flow through R3MTransform in training mode (#3607) @theap06

Other

  • Fix DataLoadingPrimer batch_size detection for NonTensorStack (#3532)
  • Fix compiled storage access (#3547)
  • Fix CUDA graph capture for Bounded spec projection (#3453)
  • Fix VideoRecorder support for grayscale (1-channel) observations (#3471)
  • Fix functools.partial warnings (#3465) @ParamThakkar123
  • Fix none handling in pendulum.py tutorial (#3595) @theap06
  • StepCounter._reset should not use output_spec (#3626)
  • Fix per-group WandB step logging (#3625)

Refactors

  • Refactor optimization API for multi-phase optimization (#3468) @bsprenger
  • Upgrade to torchcodec for video export (#3540)
  • Refactor NoisyLinear (#3082)
  • Upgrade meshgrid usage to address deprecation warning (#3412)

Documentation

  • Use new canonical collector names across docs, tutorials, and SOTA (#3665)
  • Tutorial on collector trajectory assembly internals (#3600) @coder-jayp
  • IsaacLab integration guide and setup script (#3486)
  • RayEvalWorker API reference docs (#3487)
  • SGLang backend documentation
  • TransformersWrapper/ChatEnv integration documentation (#3377)
  • EGL multi-GPU limitations in containers (#3456)

New Contributors

A warm welcome to the following first-time contributors!

  • @aneesh223 - FinancialRegimeEnv
  • @coder-jayp - Collector trajectory assembly tutorial
  • @jashshah999 - RayCollector fixes, ReplayBuffer API additions, MultiOneHot fix
  • @Lidang-Jiang - QValueModule strict_shape parameter
  • @mathieuorhan - LineariseRewards fix
  • @theap06 - Diffusion policies, trajectory batcher, R3MTransform fix, CKPT_BACKEND memmap
  • @thecaptain789 - Typo fix
  • @tobiabir - Composite.encode batch size fix

Full Changelog

https://github.com/pytorch/rl/compare/v0.11.0...v0.12.0

Source: README.md, updated 2026-04-27