Name | Modified | Size | Downloads / Week |
---|---|---|---|
README.md | 2019-07-31 | 1.5 kB | |
Twin Delayed DDPG (TD3) and GAE bug fix (TRPO, PPO1, GAIL) source code.tar.gz | 2019-07-31 | 2.2 MB | |
Twin Delayed DDPG (TD3) and GAE bug fix (TRPO, PPO1, GAIL) source code.zip | 2019-07-31 | 2.3 MB | |
Totals: 3 items | | 4.5 MB | 0 |
## New Features
- added Twin Delayed DDPG (TD3) algorithm, with HER support (see the usage sketch after this list)
- added support for continuous action spaces to `action_probability`, computing the PDF of a Gaussian policy in addition to the existing support for categorical stochastic policies
- added a flag to `action_probability` to return log-probabilities
- added support for python lists and numpy arrays in `logger.writekvs` (@dwiel)
- the `info` dict returned by VecEnvs now includes a `terminal_observation` key providing access to the last observation in a trajectory (@qxcv)
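A minimal usage sketch of the additions above. The environment, timestep budget and model choices below are illustrative, not part of the release:

```python
import gym

from stable_baselines import TD3, PPO2

# Twin Delayed DDPG (TD3) on a continuous-control task
# (Pendulum-v0 and the timestep budget are illustrative choices)
td3_model = TD3('MlpPolicy', 'Pendulum-v0', verbose=1)
td3_model.learn(total_timesteps=5000)

# action_probability now handles continuous (Gaussian) policies and can
# return log-probabilities of given actions via the new logp flag
ppo_model = PPO2('MlpPolicy', 'Pendulum-v0')
env = gym.make('Pendulum-v0')
obs = env.reset()
action = env.action_space.sample()
log_prob = ppo_model.action_probability(obs, actions=action, logp=True)

# When a VecEnv auto-resets, the true last observation of the finished
# episode is now available as infos[i]['terminal_observation'].
```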
## Bug Fixes
- fixed a bug in `traj_segment_generator` where `episode_starts` was wrongly recorded, resulting in a wrong calculation of the Generalized Advantage Estimation (GAE); this affects TRPO, PPO1 and GAIL (thanks to @miguelrass for spotting the bug) - a sketch of the affected computation follows this list
- added missing property `n_batch` in `BasePolicy`
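For context on why a misrecorded `episode_starts` matters, here is a standalone sketch of Generalized Advantage Estimation (not the library's internal code); the episode-start flags decide whether the recursion bootstraps across a step boundary:

```python
import numpy as np

def gae_advantages(rewards, values, last_value, episode_starts, gamma=0.99, lam=0.95):
    """Standalone GAE sketch: episode_starts[t] is True when step t begins a new episode."""
    n_steps = len(rewards)
    # simplification: assume the step after the segment does not start a new
    # episode, and bootstrap with last_value (the value of the last observation)
    starts = np.append(episode_starts, False)
    values = np.append(values, last_value)
    advantages = np.zeros(n_steps)
    last_gae = 0.0
    for step in reversed(range(n_steps)):
        # if the *next* step starts a new episode, do not bootstrap across the boundary
        non_terminal = 1.0 - float(starts[step + 1])
        delta = rewards[step] + gamma * values[step + 1] * non_terminal - values[step]
        advantages[step] = last_gae = delta + gamma * lam * non_terminal * last_gae
    return advantages
```

A start flag that is shifted or dropped lets advantage estimates leak across episode boundaries, which is the class of error fixed here for TRPO, PPO1 and GAIL.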
## Others
- renamed some keys in `traj_segment_generator` to be more meaningful
- retrieve the unnormalized reward when using the Monitor wrapper with TRPO, PPO1 and GAIL, to display it in the logs (mean episode reward); see the example after this list
- cleaned up the DDPG code (renamed variables)
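A short usage note for the Monitor change; the environment and timestep budget are illustrative:

```python
import gym

from stable_baselines import PPO1
from stable_baselines.bench import Monitor

# Monitor records the true (unnormalized) episode rewards; with this release,
# TRPO, PPO1 and GAIL use them for the mean-episode-reward entries in the logs.
env = Monitor(gym.make('Pendulum-v0'), filename=None)
model = PPO1('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10000)
```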
## Documentation
- doc fix for the hyperparameter tuning command in the RL zoo
- added an example of how to log an additional variable with TensorBoard and a callback (a sketch in that spirit follows)
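A sketch in the spirit of that documentation example, assuming TensorFlow 1.x; the tag name, the random placeholder value, the algorithm and the log directory are illustrative:

```python
import numpy as np
import tensorflow as tf

from stable_baselines import PPO2

def tensorboard_callback(locals_, globals_):
    """Log an extra scalar to TensorBoard on each callback call."""
    self_ = locals_['self']
    writer = locals_.get('writer')
    if writer is not None:
        value = np.random.random()  # replace with any quantity worth tracking
        summary = tf.Summary(value=[tf.Summary.Value(tag='extra/random_value',
                                                     simple_value=value)])
        writer.add_summary(summary, self_.num_timesteps)
    return True  # returning False would stop training

model = PPO2('MlpPolicy', 'Pendulum-v0', tensorboard_log='./ppo2_tensorboard/')
model.learn(total_timesteps=5000, callback=tensorboard_callback)
```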