| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| README.md | 2026-04-27 | 15.7 kB | |
| TorchRL v0.12.0 source code.tar.gz | 2026-04-27 | 9.5 MB | |
| TorchRL v0.12.0 source code.zip | 2026-04-27 | 10.3 MB | |
| Totals: 3 Items | | 19.8 MB | 0 |
TorchRL v0.12.0 Release Notes
Highlights
- New algorithms. Five new config-based trainers — DQN, DDPG, IQL, CQL, and TD3 — are built on a new configuration system for reproducible algorithm setups (@vmoens, @bsprenger). PILCO (Probabilistic Inference for Learning Control) is now available as a built-in algorithm (@PSXBRosa, @vmoens). For diffusion-based behavioral cloning, a new `DDPMModule` diffusion actor and `DiffusionBCLoss` are included (@theap06). Async PPO infrastructure overlaps data collection and optimization (@vmoens).
- Collector and data-flow improvements. A new high-throughput auto-batching inference server automatically batches requests from multiple environments, with pluggable transport backends (threading, multiprocessing, Ray, Monarch) and built-in weight-sync integration. Paired with the new `AsyncBatchedCollector`, it enables asynchronous data collection with automatic batching for maximum GPU utilization (@vmoens). The new `TrajectoryBatcher` and `AsyncTrajectoryBatcher` assemble trajectories efficiently from streaming environment transitions, including variable-length trajectories and padding (@theap06). On the parallel environment side, shared-memory done flags replace `mp.Event` for lower-latency step synchronization, and a fast-path device-transfer optimization reduces overhead in `step_and_maybe_reset` (@vmoens).
- Inference backends. This release adds full SGLang integration alongside vLLM, with an `SGLangWrapper` policy module, an `AsyncSGLang` server-based inference path, NCCL weight synchronization, and GRPO support (@vmoens).
- Replay buffer. `StoreStorage` is a new Redis/Dragonfly-backed storage backend that lets replay buffers share experience across processes and nodes (@vmoens).
- Evaluation. A new `Evaluator` class provides a unified API for synchronous and asynchronous policy evaluation during training, with a process backend, collector-based stepping, weight sync via `WeightSyncScheme`, multi-model support, and a `RayEvalWorker` for distributed evaluation (@vmoens).
- Environments and platform support. A new `GenesisEnv` wrapper integrates the Genesis physics simulator (@ParamThakkar123). Dreamer now supports pre-vectorized environments and ships with an IsaacLab environment factory, training script, and integration guide (@vmoens). MPS support improves through float64-to-float32 downcasting in `ParallelEnv`, `SerialEnv`, and collectors, fixing previously broken Apple Silicon GPU workflows (@bsprenger).
Installation
:::bash
pip install torchrl==0.12.0
Requires PyTorch >= 2.1 and TensorDict >= 0.12.0.
Breaking Changes
- Remove v0.12 deprecated APIs (#3670) @vmoens
- The `local_init_rb` parameter has been removed from `Collector` and `MultiCollector`. Storage-level initialization is now the only behavior.
- `TransformedEnv(env=...)` now raises `TypeError`. Use `TransformedEnv(base_env=...)` instead.
New Features
Auto-batching Inference Server
A new inference server that automatically batches requests from multiple environments for efficient GPU inference. This is a key building block for scaling RL training with many parallel environments.
- Core server and transport protocol (#3492)
- Threading transport (#3493)
- Multiprocessing transport (#3494)
- Ray transport (#3495)
- Monarch transport (#3496)
- Weight sync integration (#3497)
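The core idea behind an auto-batching inference server can be sketched in plain Python: per-environment requests are collected from a queue, merged into one batch for a single policy forward pass, and each result is routed back to its requester. The class and parameter names below are illustrative only, not the TorchRL API:

```python
import queue
import threading


class AutoBatchingServerSketch:
    """Illustrative sketch: gathers per-env requests, runs them as one
    batch, and routes each result back to the submitting environment."""

    def __init__(self, policy, max_batch=8, timeout=0.01):
        self.policy = policy            # callable: list[obs] -> list[action]
        self.requests = queue.Queue()   # holds (obs, reply_queue) pairs
        self.max_batch = max_batch
        self.timeout = timeout
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def submit(self, obs):
        """Called from an environment worker; blocks until its action arrives."""
        reply = queue.Queue(maxsize=1)
        self.requests.put((obs, reply))
        return reply.get()

    def _loop(self):
        while not self._stop.is_set():
            batch = []
            try:
                batch.append(self.requests.get(timeout=self.timeout))
            except queue.Empty:
                continue
            # Opportunistically drain more pending requests into the batch.
            while len(batch) < self.max_batch:
                try:
                    batch.append(self.requests.get_nowait())
                except queue.Empty:
                    break
            obs_batch = [obs for obs, _ in batch]
            actions = self.policy(obs_batch)   # one forward pass for the whole batch
            for (_, reply), action in zip(batch, actions):
                reply.put(action)

    def shutdown(self):
        self._stop.set()
        self._thread.join()
```

A real server would replace the threading transport with multiprocessing, Ray, or Monarch behind the same submit/reply protocol, which is what the pluggable-transport PRs above provide.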
AsyncBatchedCollector
A new collector that combines async environments with the auto-batching inference server for maximum throughput.
- Async envs + auto-batching inference (#3498)
- Coordinator loop and direct submission mode (#3499)
- Backend params and performance optimizations (#3511)
Trajectory Batcher
- `TrajectoryBatcher` for assembling trajectories from streaming transitions (#3584) @theap06
- `AsyncTrajectoryBatcher` for asynchronous trajectory assembly (#3592) @theap06
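The essence of trajectory batching — grouping streaming transitions by environment, emitting a trajectory when an episode finishes, and padding variable-length trajectories to a common length — can be sketched as follows. This is a toy illustration, not the actual `TrajectoryBatcher` implementation:

```python
from collections import defaultdict


def right_pad(seq, length, value=0):
    """Right-pad a list to a fixed length."""
    return seq + [value] * (length - len(seq))


class TrajectoryBatcherSketch:
    """Illustrative sketch: groups streaming (env_id, obs, done) transitions
    into per-env trajectories and pads finished ones into a batch."""

    def __init__(self):
        self.partial = defaultdict(list)   # env_id -> in-progress trajectory

    def add(self, env_id, obs, done):
        """Append a transition; return the trajectory if the episode ended."""
        self.partial[env_id].append(obs)
        if done:
            return self.partial.pop(env_id)
        return None

    def pad_batch(self, trajectories, value=0):
        """Pad variable-length trajectories to the longest one."""
        longest = max(len(t) for t in trajectories)
        return [right_pad(t, longest, value) for t in trajectories]
```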
SGLang Backend
Full SGLang support for LLM inference, mirroring the existing vLLM integration:
- Base infrastructure (#3428)
- `AsyncSGLang` server-based inference service (#3429)
- `SGLangWrapper` policy module (#3430)
- NCCL weight synchronization (#3431)
- Module structure integration (#3432)
- SGLang backend support in GRPO
Diffusion Policies
- `DDPMModule` diffusion actor for denoising diffusion probabilistic models (#3596) @theap06
- `DiffusionBCLoss` for diffusion-based behavioral cloning (#3604) @theap06
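Diffusion-based behavioral cloning trains a model to denoise expert actions that were corrupted by a fixed forward process. The forward step has a closed form: with cumulative schedule ᾱ_t, a noised action is x_t = √ᾱ_t · x_0 + √(1-ᾱ_t) · ε. A minimal scalar sketch of that step (schedule values are illustrative, not TorchRL defaults):

```python
import math
import random


def alpha_bar(t, T=100, beta_min=1e-4, beta_max=0.02):
    """Cumulative product of (1 - beta_s) for a linear beta schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_min + (beta_max - beta_min) * (s - 1) / (T - 1)
        prod *= 1.0 - beta
    return prod


def noise_action(a0, t, rng):
    """Closed-form forward process: x_t = sqrt(a_bar)*x_0 + sqrt(1-a_bar)*eps."""
    ab = alpha_bar(t)
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(ab) * a0 + math.sqrt(1.0 - ab) * eps, eps
```

The loss in this family of methods regresses the model's noise prediction onto `eps`; at inference the actor runs the reverse process from pure noise to produce an action.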
Evaluator
- `Evaluator` class for sync/async evaluation (#3594)
- Process backend, lazy init, and pending property (#3611)
- Collector-based stepping backend (#3624)
- Enable loggers to run as Ray actors (#3623)
- Weight sync via `WeightSyncScheme` + multi-model support (#3627)
- Isaac Lab `Evaluator` tests + `init_fn` plumbing for process backend (#3663)
- `RayEvalWorker` for distributed async evaluation (#3474)
- Named actors and `from_name` for `RayEvalWorker` (#3488)
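The sync/async evaluation pattern — run an evaluation either inline or on a worker while training continues, with a way to check whether a run is still pending — can be sketched with a background thread. Names here are illustrative, not the `Evaluator` API:

```python
import threading


class EvaluatorSketch:
    """Illustrative sketch: runs an evaluation function synchronously or on
    a background thread, mirroring a sync/async evaluation API."""

    def __init__(self, eval_fn):
        self.eval_fn = eval_fn
        self._thread = None
        self._result = None

    def evaluate(self, policy):
        """Synchronous path: blocks the training loop."""
        return self.eval_fn(policy)

    def evaluate_async(self, policy):
        """Asynchronous path: training continues while evaluation runs."""
        def run():
            self._result = self.eval_fn(policy)
        self._thread = threading.Thread(target=run)
        self._thread.start()

    @property
    def pending(self):
        return self._thread is not None and self._thread.is_alive()

    def result(self):
        """Wait for the async run (if any) and return its result."""
        if self._thread is not None:
            self._thread.join()
        return self._result
```

A process or Ray backend swaps the thread for a separate worker holding its own copy of the environment, which is why weight synchronization becomes part of the evaluation API.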
Async PPO
- Async PPO infrastructure for overlapping collection and optimization (#3661)
Config-based Trainers
New trainers with integrated configuration system:
- DQN Trainer (#3526)
- DDPG Trainer (#3527)
- IQL Trainer (#3528)
- CQL Trainer (#3529)
- TD3 Trainer (#3557) @bsprenger
- Hook point to log average optimization losses in trainers (#3666)
Replay Buffer
- `StoreStorage` for Redis/Dragonfly-backed replay buffers (#3516)
- `set_at_`, `set_`, `update_` methods on `ReplayBuffer` (#3590) @jashshah999
- Support `trajs_per_batch` with `replay_buffer` on multi-process and distributed collectors (#3618)
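A store-backed replay storage works by addressing buffer slots as keys in an external key-value store, so any process or node pointing at the same store sees the same experience. A toy sketch of the idea, with a plain dict standing in for the Redis/Dragonfly client (the class and method names are illustrative, not the `StoreStorage` API):

```python
class KVStorageSketch:
    """Illustrative sketch of a key-value-store-backed replay storage.
    A plain dict stands in for a Redis/Dragonfly client; a real backend
    would issue the equivalent GET/SET commands over the network."""

    def __init__(self, store=None, prefix="rb"):
        # Sharing `store` between instances models sharing one Redis server.
        self.store = store if store is not None else {}
        self.prefix = prefix

    def _key(self, index):
        return f"{self.prefix}:{index}"

    def set(self, index, item):
        self.store[self._key(index)] = item

    def get(self, index):
        return self.store[self._key(index)]

    def __len__(self):
        return sum(1 for k in self.store if k.startswith(self.prefix + ":"))
```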
LLM / GRPO
- Token-in, token-out LLM wrapper mode (#3407)
- GRPO improvements: new envs, vLLM V1 compat, log-prob fixes, training stability (#3580)
- Namespace GRPO wandb metrics for auto-grouping (#3585)
- Remove placement-group xfails and fix vLLM tokenizer compat (#3586)
Environments
- `GenesisEnv`: wrapper for the Genesis physics simulator (#3536) @ParamThakkar123
- `FinancialRegimeEnv`: a vectorized financial environment (#3384) @aneesh223
- `num_workers` parameter for `HabitatEnv` (#3383) @ParamThakkar123
- Dreamer: support pre-vectorized environments (#3483)
- Dreamer: add IsaacLab environment factory (#3484)
Transforms
- Inverse for `VecNorm` and `VecNormV2` transforms (#3416) @ParamThakkar123
- `prevent_leaking_rng` utility (#3401) @ParamThakkar123
Specs
- `index_select` support for `TensorSpec` (#3406) @ParamThakkar123
- `strict_shape` parameter for `QValueModule` action shape enforcement (#3593) @Lidang-Jiang
Algorithms
- PILCO (Probabilistic Inference for Learning Control) (#3582) @PSXBRosa, @vmoens
Collectors
- Lazy-init `RandomPolicy` `action_spec` from env in collectors (#3664)
Other
- `__getattr__` in `_dispatch_caller_parallel` for transparent attribute access (#3389) @ParamThakkar123
- `scalar_output_mode` for loss modules with `reduction='none'` (#3426)
- `ObsDecoder`: `out_channels` parameter for grayscale decoding (#3472)
- Ergonomic scalar assignment for loss buffers (#3612)
- New `memmap` value for the `CKPT_BACKEND` environment variable (#3619) @theap06
Performance Improvements
- GPU Image Transforms for Dreamer (~5.5x faster sampling)
- `SliceSampler`: GPU-accelerated trajectory computation
- Always enable prefetch for replay buffer
- `ParallelEnv`: fast-path device transfer in `step_and_maybe_reset`
- `ParallelEnv`: replace `mp.Event` with shared-memory done flags for lower latency
- Lazy stack optimization for collector-to-buffer writes (#3438)
- `log_metrics` usage in sota-implementations (#3454)
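The shared-memory done-flag change replaces per-worker `mp.Event` handshakes with plain bytes in shared memory: each worker writes its flag with a single store, and the parent polls the array instead of waiting on kernel-level event objects. A minimal sketch of that pattern (illustrative names, not TorchRL internals):

```python
import multiprocessing as mp


def make_done_flags(num_workers):
    """One shared byte per worker, written by the worker and polled by
    the parent, standing in for one mp.Event per worker."""
    return mp.Array("b", num_workers, lock=False)


def worker_step(flags, worker_id):
    # ...the worker runs its env step here, then signals completion
    # with a single shared-memory store:
    flags[worker_id] = 1


def wait_all(flags):
    # The parent spins on the shared bytes; a production implementation
    # would back off or sleep briefly between polls.
    while not all(flags):
        pass
```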
Bug Fixes
MPS (Apple Silicon)
- Downcast float64 to float32 in `ParallelEnv`/`SerialEnv` on MPS (#3551) @bsprenger
- MPS float64->float32 downcast for tensors (#3548) @bsprenger
- Fix `masked_scatter` shape preservation on MPS in collectors (#3473)
Collectors
- Fix stale model reference in `MultiCollector` weight sync after device-cast (#3587)
- Fix shared mem updater with many policies (#3442)
- Fix missing raise, incorrect `__torch_function__` return, and off-by-one in `RayCollector` (#3530) @jashshah999
Environments
- Fix `check_env_specs()` when `state_spec` contains keys not in `observation_spec` (#3581) @theap06
- Fix `BraxEnv` rejecting `camera_id` and `render_kwargs` (#3533)
- `StepCounter` now tracks nested truncated and done states (#3405)
- Fix `ParallelEnv` shutdown hang with shared-memory done flags (#3464)
Loss / Models
- Fix broken `SACLoss` when there is more than one `qvalue_network` (#3500) @ParamThakkar123
- Fix `GPT2RewardModel.compute_reward_loss` (#3521)
- Fix ResNet call order in `ImpalaNet` (#3522)
- Fix vLLM `CompilationConfig` compatibility and Windows CI `pybind11` (#3673)
Specs / TensorDict
- Fix `MultiOneHot.to_numpy()` returning a scalar instead of an array (#3589) @jashshah999
- Fix shape mismatch in `_set_index_in_td` with trailing dims of 1 (#3517)
- Set batch size in `Composite.encode` (#3411) @tobiabir
Transforms
- `LineariseRewards` should not squeeze the trailing dim (#3614) @mathieuorhan
- Allow gradient flow through `R3MTransform` in training mode (#3607) @theap06
Other
- Fix `DataLoadingPrimer` batch_size detection for `NonTensorStack` (#3532)
- Fix compiled storage access (#3547)
- Fix CUDA graph capture for `Bounded` spec projection (#3453)
- Fix `VideoRecorder` support for grayscale (1-channel) observations (#3471)
- Fix `functools.partial` warnings (#3465) @ParamThakkar123
- Fix `None` handling in `pendulum.py` tutorial (#3595) @theap06
- Fix `StepCounter._reset` so it does not use `output_spec` (#3626)
- Fix per-group WandB step logging (#3625)
Refactors
- Refactor optimization API for multi-phase optimization (#3468) @bsprenger
- Upgrade to `torchcodec` for video export (#3540)
- Refactor `NoisyLinear` (#3082)
- Upgrade `meshgrid` usage to address deprecation warning (#3412)
Documentation
- Use new canonical collector names across docs, tutorials, and SOTA (#3665)
- Tutorial on collector trajectory assembly internals (#3600) @coder-jayp
- IsaacLab integration guide and setup script (#3486)
- `RayEvalWorker` API reference docs (#3487)
- SGLang backend documentation
- `TransformersWrapper`/`ChatEnv` integration documentation (#3377)
- EGL multi-GPU limitations in containers (#3456)
New Contributors
Welcome to the following first-time contributors!
- @aneesh223 - `FinancialRegimeEnv`
- @coder-jayp - Collector trajectory assembly tutorial
- @jashshah999 - `RayCollector` fixes, `ReplayBuffer` API additions, `MultiOneHot` fix
- @Lidang-Jiang - `QValueModule` `strict_shape` parameter
- @mathieuorhan - `LineariseRewards` fix
- @theap06 - Diffusion policies, trajectory batcher, `R3MTransform` fix, `CKPT_BACKEND` memmap
- @thecaptain789 - Typo fix
- @tobiabir - `Composite.encode` batch size fix