| Name | Modified | Size |
|---|---|---|
| README.md | 2019-06-28 | 5.3 kB |
| v0.7.0.tar.gz | 2019-06-28 | 7.3 MB |
| v0.7.0.zip | 2019-06-28 | 7.5 MB |
| Totals: 3 items | | 14.9 MB |
## Important enhancements
- Rainbow (https://arxiv.org/abs/1710.02298) with benchmark results is added (thanks @seann999!). A rough construction sketch for the new agent class follows this list.
  - Agent class: `chainerrl.agents.CategoricalDoubleDQN`
  - Example and benchmark results: https://github.com/chainer/chainerrl/tree/v0.7.0/examples/atari/rainbow
- TD3 (https://arxiv.org/abs/1802.09477) with benchmark results is added.
  - Agent class: `chainerrl.agents.TD3`
  - Example and benchmark results: https://github.com/chainer/chainerrl/tree/v0.7.0/examples/mujoco/td3
- PPO now supports recurrent models.
  - Example: https://github.com/chainer/chainerrl/tree/v0.7.0/examples/ale/train_ppo_ale.py (with the `--recurrent` option)
  - Results: https://github.com/chainer/chainerrl/pull/431
- DDPG now supports batch training.
  - Example: https://github.com/chainer/chainerrl/tree/v0.7.0/examples/gym/train_ddpg_batch_gym.py
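For orientation, `CategoricalDoubleDQN` is constructed like the other DQN-family agents. The snippet below is only a rough sketch: the distributional Q-function class, its argument names, and the agent keyword arguments are recalled from the CategoricalDQN examples and should be treated as assumptions; the Rainbow example linked above is the authoritative setup (prioritized replay, NoisyNet, n-step returns, etc.).

```python
import numpy as np
import chainer
import chainerrl

obs_size, n_actions = 4, 2  # placeholder sizes for a toy discrete-action env

# Distributional Q-function over 51 atoms on [-10, 10]; class and argument
# names are assumptions based on the CategoricalDQN examples.
q_func = chainerrl.q_functions.DistributionalFCStateQFunctionWithDiscreteAction(
    obs_size, n_actions, n_atoms=51, v_min=-10, v_max=10,
    n_hidden_channels=64, n_hidden_layers=2)

opt = chainer.optimizers.Adam(1e-3)
opt.setup(q_func)

rbuf = chainerrl.replay_buffer.ReplayBuffer(capacity=10 ** 5)
explorer = chainerrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.1, random_action_func=lambda: np.random.randint(n_actions))

# CategoricalDoubleDQN is assumed to follow the usual DQN-family constructor:
# (q_function, optimizer, replay_buffer, gamma, explorer, ...).
agent = chainerrl.agents.CategoricalDoubleDQN(
    q_func, opt, rbuf, gamma=0.99, explorer=explorer,
    replay_start_size=1000, target_update_interval=1000)
```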
## Important bugfixes
- The bug that some examples pass the same random seed to `env.seed` across envs is fixed.
- The bug that batch training with n-step return and/or recurrent models does not work correctly is fixed.
- The bug that `examples/ale/train_dqn_ale.py` uses `LinearDecayEpsilonGreedy` even when NoisyNet is used is fixed.
- The bug that `examples/ale/train_dqn_ale.py` does not use the value specified by `--noisy-net-sigma` is fixed.
- The bug that `chainerrl.links.to_factorized_noisy` does not work correctly with `chainerrl.links.Sequence` is fixed (see the sketch after this list).
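To illustrate the last fix above, converting a model built with `chainerrl.links.Sequence` into its NoisyNet counterpart should now work as expected. A minimal sketch, assuming `sigma_scale` (exposed by the ALE examples as `--noisy-net-sigma`) is the keyword for the noise scale:

```python
import chainer.functions as F
import chainer.links as L
import chainerrl

# Hypothetical small Q-network expressed as a chainerrl.links.Sequence.
model = chainerrl.links.Sequence(
    L.Linear(None, 64), F.relu,
    L.Linear(None, 4),
)

# Replace the Linear links with factorized-noisy variants in place;
# sigma_scale is assumed to correspond to --noisy-net-sigma in the examples.
chainerrl.links.to_factorized_noisy(model, sigma_scale=0.5)
```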
## Important destructive changes
- `chainerrl.experiments.train_agent_async` now requires `eval_n_steps` (number of timesteps for each evaluation phase) and `eval_n_episodes` (number of episodes for each evaluation phase) to be explicitly specified, with one of them being None (see the sketch after this list).
- `examples/ale/dqn_phi.py` is removed.
- `chainerrl.initializers.LeCunNormal` is removed. Use `chainer.initializers.LeCunNormal` instead.
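For the first change above, a call site that previously relied on the defaults now has to pass both evaluation arguments explicitly, with exactly one of them set to None. A minimal sketch; `make_env` and `agent` are caller-supplied placeholders, and the remaining keyword names are assumed to keep their usual meaning:

```python
from chainerrl import experiments

# make_env: assumed user-defined factory (process_idx, test) -> env
# agent: assumed pre-built ChainerRL agent instance
experiments.train_agent_async(
    outdir='results',
    processes=4,
    make_env=make_env,
    agent=agent,
    steps=10 ** 6,
    eval_interval=10 ** 5,
    eval_n_steps=None,    # time-based evaluation disabled,
    eval_n_episodes=10,   # so episode-based evaluation is used;
                          # exactly one of the two must be None.
)
```

Likewise, code that imported `chainerrl.initializers.LeCunNormal` should switch to `chainer.initializers.LeCunNormal`.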
## All updates
### Enhancement
- Rainbow (#374)
- Make copy_param support scalar parameters (#410)
- Enables batch DDPG agents to be trained. (#416)
- Enables asynchronous time-based evaluations of agents. (#420)
- Removes obsolete dqn_phi file (#424)
- Add Branched and use it to simplify train_ppo_batch_gym.py (#427)
- Remove LeCunNormal since Chainer has it from v3 (#428)
- Precompute log probability in PPO (#430)
- Recurrent PPO with a stateless recurrent model interface (#431)
- Replace Variable.data with Variable.array (again) (#434)
- Make IQN work with tuple observations (#435)
- Add VectorStackFrame to reduce memory usage in train_dqn_batch_ale.py (#443)
- DDPG example that reproduces the TD3 paper (#452)
- TD3 agent (#453)
- update requirements.txt and setup.py for gym (#461)
- Support `gym>=0.12.2` by no longer using underscore methods in gym wrappers (#462)
- Add warning about numpy 1.16.0 (#476)
### Documentation
- Link to abstract pages on ArXiv (#409)
- fixes typo (#412)
- Fixes file path in grasping example README (#422)
- Add links to references (#425)
- Fixes minor grammar mistake in A3C ALE example (#432)
- Add explanation of `examples/atari` (#437)
- Link to chainer/chainer, not pfnet/chainer (#439)
- Link to chainer/chainer(rl), not pfnet/chainer(rl) (#440)
- fix & add docstring for FCStateQFunctionWithDiscreteAction (#441)
- Fixes a typo in train_agent_batch Documentation. (#444)
- Adds Rainbow to main README (#447)
- Fixes Docstring in IQN (#451)
- Improves Rainbow README (#458)
- very small fix: add missing doc for eval_performance. (#459)
- Adds IQN Results to readme (#469)
- Adds IQN to the documentation. (#470)
- Adds reference to mujoco folder in the examples README (#474)
- Fixes incorrect comment. (#490)
### Examples
- Rainbow (#374)
- Create an IQN example aimed at reproducing the original paper and its evaluation protocol. (#408)
- Benchmarks DQN example (#414)
- Enables batch DDPG agents to be trained. (#416)
- Fixes scores for Demon Attack (#418)
- Set observation_space of kuka env correctly (#421)
- Fixes error in setting explorer in DQN ALE example. (#423)
- Add Branched and use it to simplify train_ppo_batch_gym.py (#427)
- A3C Example for reproducing paper results. (#433)
- PPO example that reproduces the "Deep Reinforcement Learning that Matters" paper (#448)
- DDPG example that reproduces the TD3 paper (#452)
- TD3 agent (#453)
- Apply `noisy_net_sigma` parameter (#465)
### Testing
- Use Python 3.6 in Travis CI (#411)
- Increase tolerance of TestGaussianDistribution.test_entropy since sometimes it failed (#438)
- make FrameStack follow original spaces (#445)
- Split test_examples.sh (#472)
- Fix Travis error (#492)
- Use Python 3.6 for ipynb (#493)
### Bugfixes
- bugfix (#360, thanks @corochann!)
- Fixes error in setting explorer in DQN ALE example. (#423)
- Make sure the agent sees when episodes end (#429)
- Pass env_id to replay buffer methods to correctly support batch training (#442)
- Add VectorStackFrame to reduce memory usage in train_dqn_batch_ale.py (#443)
- Fix a bug of unintentionally using same process indices (#455)
- Make cv2 dependency optional (#456)
- fix ScaledFloatFrame.observation_space (#460)
- Apply `noisy_net_sigma` parameter (#465)
- Match EpisodicReplayBuffer.sample with ReplayBuffer.sample (#485)
- Make `to_factorized_noisy` work with sequential links (#489)