Stable Baselines v2.7.0: Twin Delayed DDPG (TD3) and GAE bug fix (TRPO, PPO1, GAIL)

Files (2019-07-31): README.md (1.5 kB), source code.tar.gz (2.2 MB), source code.zip (2.3 MB)

New Features

  • added Twin Delayed DDPG (TD3) algorithm, with HER support
  • added support for continuous action spaces to action_probability, computing the PDF of a Gaussian policy in addition to the existing support for categorical stochastic policies.
  • added flag to action_probability to return log-probabilities.
  • added support for python lists and numpy arrays in logger.writekvs. (@dwiel)
  • the info dict returned by VecEnvs now includes a terminal_observation key providing access to the last observation in a trajectory. (@qxcv)
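
To make the continuous-action support in action_probability more concrete, here is a minimal sketch of the density of a diagonal Gaussian policy and its log-probability. It is an illustration only, not the library's implementation: the helper names gaussian_log_prob and gaussian_prob, and the example mean and standard deviation, are assumptions made for this sketch.

```python
import math


def gaussian_log_prob(action, mean, std):
    """Log-density of `action` under a diagonal Gaussian (hypothetical helper)."""
    if not (len(action) == len(mean) == len(std)):
        raise ValueError("action, mean and std must have the same length")
    log_prob = 0.0
    # Dimensions are independent, so the per-dimension log-densities add up.
    for a, m, s in zip(action, mean, std):
        log_prob += (
            -0.5 * ((a - m) / s) ** 2
            - math.log(s)
            - 0.5 * math.log(2.0 * math.pi)
        )
    return log_prob


def gaussian_prob(action, mean, std):
    """Density (PDF value) of `action`; the exponential of the log-density."""
    return math.exp(gaussian_log_prob(action, mean, std))


# Assumed policy output for a single observation (2-D action space).
mean, std = [0.0, 1.0], [1.0, 0.5]
action = [0.0, 1.0]  # evaluate the density at the mean

logp = gaussian_log_prob(action, mean, std)
prob = gaussian_prob(action, mean, std)
```

Working in log space avoids underflow when the action space has many dimensions, which is one reason a log-probability flag is convenient. Note that a PDF value is a density, not a probability, so it can exceed 1 when the standard deviation is small.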

Bug Fixes

  • fixed a bug in traj_segment_generator where episode_starts was recorded incorrectly, which led to a wrong Generalized Advantage Estimation (GAE) calculation; this affects TRPO, PPO1 and GAIL (thanks to @miguelrass for spotting the bug)
  • added missing property n_batch in BasePolicy.
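
The following sketch shows why the episode-start flags matter for GAE: they decide where bootstrapping and advantage accumulation are cut at an episode boundary. It is a simplified stand-alone version written for illustration, not the code from the library. The function name compute_gae, its parameters, and the "shifted" flags used to mimic a mis-recorded segment are all assumptions; the actual bug may have differed in detail.

```python
def compute_gae(rewards, values, next_value, episode_starts,
                next_episode_start, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout segment (sketch).

    episode_starts[t] is True when step t is the first step of an episode.
    """
    n = len(rewards)
    # Append the flag and value estimate for the step following the segment.
    starts = list(episode_starts) + [next_episode_start]
    vals = list(values) + [next_value]
    advantages = [0.0] * n
    last_gae = 0.0
    for t in reversed(range(n)):
        # If step t+1 starts a new episode, step t was terminal:
        # do not bootstrap from the next value or the next advantage.
        nonterminal = 0.0 if starts[t + 1] else 1.0
        delta = rewards[t] + gamma * vals[t + 1] * nonterminal - vals[t]
        last_gae = delta + gamma * lam * nonterminal * last_gae
        advantages[t] = last_gae
    return advantages


rewards = [1.0, 1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5, 0.5]

# Two episodes of two steps each: the second one starts at step 2.
correct_starts = [True, False, True, False]
# Hypothetical mis-recording: the boundary is flagged one step too late.
shifted_starts = [True, False, False, True]

adv_correct = compute_gae(rewards, values, 0.5, correct_starts, False,
                          gamma=1.0, lam=1.0)
adv_shifted = compute_gae(rewards, values, 0.5, shifted_starts, False,
                          gamma=1.0, lam=1.0)
```

With the correct flags the advantages are [1.5, 0.5, 2.0, 1.0]; with the shifted flags they become [2.5, 1.5, 0.5, 1.0], because reward from the second episode leaks into the first. The rewards and value estimates are identical in both runs, so the difference comes from the flags alone.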

Others

  • renamed some keys in traj_segment_generator to be more meaningful
  • retrieve the unnormalized rewards when using the Monitor wrapper with TRPO, PPO1 and GAIL, so that the logs display them (mean episode reward)
  • cleaned up DDPG code (renamed variables)

Documentation

  • doc fix for the hyperparameter tuning command in the rl zoo
  • added an example on how to log an additional variable with tensorboard and a callback
Source: README.md, updated 2019-07-31