A TensorFlow implementation of Scalable Distributed Deep-RL
...In this architecture, multiple actor processes interact with their own environment instances in parallel to collect trajectories, which are sent asynchronously to a centralized learner for policy updates. Because each actor acts with a slightly stale copy of the policy, the learner applies importance weighting to correct for this policy lag, which keeps off-policy training stable at scale. The design scales to hundreds of environments and billions of frames while maintaining sample efficiency and stability. The implementation supports training in DeepMind Lab (DMLab) and has also been adapted to other environments such as Atari and Street View.
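
To make the importance-weighting step concrete, here is a minimal NumPy sketch of a truncated importance-weighted value target in the style of V-trace, the off-policy correction described in the IMPALA paper. It is an illustration for a single trajectory, not the code from this repository: the function name `vtrace_targets`, its argument names, and the default clipping thresholds are assumptions chosen for the example.

```python
import numpy as np


def vtrace_targets(log_rhos, discounts, rewards, values, bootstrap_value,
                   clip_rho=1.0, clip_c=1.0):
    """Sketch of V-trace style targets for one trajectory of length T.

    log_rhos:        log(pi_learner(a|x) / pi_actor(a|x)) per step, shape [T].
    discounts:       per-step discount (0 at episode ends), shape [T].
    rewards, values: per-step reward and learner value estimate, shape [T].
    bootstrap_value: learner value estimate for the state after the last step.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    log_rhos = np.asarray(log_rhos, dtype=float)
    discounts = np.asarray(discounts, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)

    # Truncated importance weights: clipping bounds the variance of the correction.
    rhos = np.exp(log_rhos)
    clipped_rhos = np.minimum(clip_rho, rhos)
    cs = np.minimum(clip_c, rhos)

    # Importance-weighted one-step temporal-difference errors.
    values_tp1 = np.append(values[1:], bootstrap_value)
    deltas = clipped_rhos * (rewards + discounts * values_tp1 - values)

    # Backward recursion: (vs_t - V_t) = delta_t + discount_t * c_t * (vs_{t+1} - V_{t+1}).
    vs_minus_v = np.zeros_like(values)
    acc = 0.0
    for t in reversed(range(len(values))):
        acc = deltas[t] + discounts[t] * cs[t] * acc
        vs_minus_v[t] = acc
    vs = values + vs_minus_v

    # Advantages for the policy-gradient term, using the corrected targets.
    vs_tp1 = np.append(vs[1:], bootstrap_value)
    pg_advantages = clipped_rhos * (rewards + discounts * vs_tp1 - values)
    return vs, pg_advantages
```

When the actor and learner policies agree (all `log_rhos` equal to zero), the targets reduce to ordinary discounted n-step returns bootstrapped from `bootstrap_value`, which is a convenient sanity check for a sketch like this.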