Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.md | 2019-07-31 | 1.5 kB | |
Twin Delayed DDPG (TD3) and GAE bug fix (TRPO, PPO1, GAIL) source code.tar.gz | 2019-07-31 | 2.2 MB | |
Twin Delayed DDPG (TD3) and GAE bug fix (TRPO, PPO1, GAIL) source code.zip | 2019-07-31 | 2.3 MB | |
Totals: 3 items | | 4.5 MB | 0 |
## New Features
- added Twin Delayed DDPG (TD3) algorithm, with HER support (see the usage sketch after this list)
- added support for continuous action spaces to `action_probability`, computing the PDF of a Gaussian policy in addition to the existing support for categorical stochastic policies
- added a flag to `action_probability` to return log-probabilities
- added support for python lists and numpy arrays in `logger.writekvs` (@dwiel)
- the `info` dict returned by VecEnvs now includes a `terminal_observation` key providing access to the last observation in a trajectory (@qxcv)
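A minimal usage sketch of the additions above. The environment, timestep budget and model choices below are illustrative, not part of the release:

```python
import gym

from stable_baselines import TD3, PPO2

# Twin Delayed DDPG (TD3) on a continuous-control task
# (Pendulum-v0 and the timestep budget are illustrative choices)
td3_model = TD3('MlpPolicy', 'Pendulum-v0', verbose=1)
td3_model.learn(total_timesteps=5000)

# action_probability now handles continuous (Gaussian) policies and can
# return log-probabilities of given actions via the new logp flag
ppo_model = PPO2('MlpPolicy', 'Pendulum-v0')
env = gym.make('Pendulum-v0')
obs = env.reset()
action = env.action_space.sample()
log_prob = ppo_model.action_probability(obs, actions=action, logp=True)

# When a VecEnv auto-resets, the true last observation of the finished
# episode is now available as infos[i]['terminal_observation'].
```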
## Bug Fixes
- fixed a bug in `traj_segment_generator` where `episode_starts` was wrongly recorded, resulting in a wrong calculation of the Generalized Advantage Estimation (GAE); this affects TRPO, PPO1 and GAIL (thanks to @miguelrass for spotting the bug) - a sketch of the affected computation follows this list
- added missing property `n_batch` in `BasePolicy`
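For context on why a misrecorded `episode_starts` matters, here is a standalone sketch of Generalized Advantage Estimation (not the library's internal code); the episode-start flags decide whether the recursion bootstraps across a step boundary:

```python
import numpy as np

def gae_advantages(rewards, values, last_value, episode_starts, gamma=0.99, lam=0.95):
    """Standalone GAE sketch: episode_starts[t] is True when step t begins a new episode."""
    n_steps = len(rewards)
    # simplification: assume the step after the segment does not start a new
    # episode, and bootstrap with last_value (the value of the last observation)
    starts = np.append(episode_starts, False)
    values = np.append(values, last_value)
    advantages = np.zeros(n_steps)
    last_gae = 0.0
    for step in reversed(range(n_steps)):
        # if the *next* step starts a new episode, do not bootstrap across the boundary
        non_terminal = 1.0 - float(starts[step + 1])
        delta = rewards[step] + gamma * values[step + 1] * non_terminal - values[step]
        advantages[step] = last_gae = delta + gamma * lam * non_terminal * last_gae
    return advantages
```

A start flag that is shifted or dropped lets advantage estimates leak across episode boundaries, which is the class of error fixed here for TRPO, PPO1 and GAIL.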
## Others
- renamed some keys in `traj_segment_generator` to be more meaningful
- retrieve the unnormalized reward when using the Monitor wrapper with TRPO, PPO1 and GAIL, to display it in the logs (mean episode reward); see the example after this list
- cleaned up the DDPG code (renamed variables)
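A short usage note for the Monitor change; the environment and timestep budget are illustrative:

```python
import gym

from stable_baselines import PPO1
from stable_baselines.bench import Monitor

# Monitor records the true (unnormalized) episode rewards; with this release,
# TRPO, PPO1 and GAIL use them for the mean-episode-reward entries in the logs.
env = Monitor(gym.make('Pendulum-v0'), filename=None)
model = PPO1('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10000)
```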
## Documentation
- doc fix for the hyperparameter tuning command in the RL zoo
- added an example of how to log an additional variable with TensorBoard and a callback (a sketch in that spirit follows)
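A sketch in the spirit of that documentation example, assuming TensorFlow 1.x; the tag name, the random placeholder value, the algorithm and the log directory are illustrative:

```python
import numpy as np
import tensorflow as tf

from stable_baselines import PPO2

def tensorboard_callback(locals_, globals_):
    """Log an extra scalar to TensorBoard on each callback call."""
    self_ = locals_['self']
    writer = locals_.get('writer')
    if writer is not None:
        value = np.random.random()  # replace with any quantity worth tracking
        summary = tf.Summary(value=[tf.Summary.Value(tag='extra/random_value',
                                                     simple_value=value)])
        writer.add_summary(summary, self_.num_timesteps)
    return True  # returning False would stop training

model = PPO2('MlpPolicy', 'Pendulum-v0', tensorboard_log='./ppo2_tensorboard/')
model.learn(total_timesteps=5000, callback=tensorboard_callback)
```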