| Name | Modified | Size |
|---|---|---|
| README.md | 2019-02-28 | 3.0 kB |
| v0.6.0.tar.gz | 2019-02-28 | 6.9 MB |
| v0.6.0.zip | 2019-02-28 | 7.1 MB |
| Totals: 3 items | | 14.0 MB |
Important enhancements
- Implicit Quantile Network (IQN) (https://arxiv.org/abs/1806.06923) agent is added: `chainerrl.agents.IQN`.
- Training DQN and its variants with N-step returns is supported.
- Resetting an env with `done=False` via the `info` dict is supported. When `env.step` returns an `info` dict with `info['needs_reset']=True`, the env is reset. This feature is useful for implementing a continuing env (see the sketch after this list).
- Evaluation with a fixed number of timesteps is supported (except for async training). This evaluation protocol is popular in Atari benchmarks. `examples/atari/dqn` now implements the same evaluation protocol as the Nature DQN paper.
- An example script of training a DoubleDQN agent for a PyBullet-based robotic grasping env is added: `examples/grasping`.
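A minimal sketch of the continuing-env signal described above, assuming a Gym-style env. The class name, spaces, and reward below are made up for illustration; only the `info['needs_reset']` key is the actual signal described in this release.

```python
import gym
import numpy as np


class ToyContinuingEnv(gym.Env):
    """Toy continuing env that never sets done=True.

    Instead of terminating episodes, it asks ChainerRL's training loop
    to reset it via info['needs_reset'] once a step budget is exceeded.
    (Hypothetical env for illustration; not part of ChainerRL.)
    """

    observation_space = gym.spaces.Box(
        low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
    action_space = gym.spaces.Discrete(2)

    def __init__(self, max_steps=1000):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return self.observation_space.sample()

    def step(self, action):
        self.t += 1
        obs = self.observation_space.sample()
        reward = float(action)  # dummy reward
        done = False  # a continuing task never terminates
        # Ask the training loop to reset the env anyway.
        info = {'needs_reset': self.t >= self.max_steps}
        return obs, reward, done, info
```

Because `done` stays `False`, the agent never observes a terminal transition, while the training loop still resets the env whenever it sees the flag.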
Important bugfixes
- The bug that PPO's `obs_normalizer` was not saved is fixed.
- The bug that NonbiasWeightDecay didn't work with newer versions of Chainer is fixed.
- The bug that the `argv` argument was ignored by `chainerrl.experiments.prepare_output_dir` is fixed.
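For reference, a small usage sketch of the fixed function; the argument layout shown here (an argparse namespace, a base directory, and `argv`) reflects common usage in ChainerRL's example scripts and should be checked against the docs.

```python
import argparse
import sys

import chainerrl

parser = argparse.ArgumentParser()
parser.add_argument('--outdir', type=str, default='results')
args = parser.parse_args()

# Creates a time-stamped output directory under args.outdir and records
# the passed argv there; before this fix the argv argument was ignored.
outdir = chainerrl.experiments.prepare_output_dir(
    args, args.outdir, argv=sys.argv)
print('Output files are saved in {}'.format(outdir))
```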
Important destructive changes
- `train_agent_with_evaluation` and `train_agent_batch_with_evaluation` now require `eval_n_steps` (number of timesteps for each evaluation phase) and `eval_n_episodes` (number of episodes for each evaluation phase) to be explicitly specified, with one of them being `None` (see the sketch after this list).
- `train_agent_with_evaluation`'s `max_episode_len` argument is renamed to `train_max_episode_len`.
- `ReplayBuffer.sample` now returns a list of lists of N experiences to support N-step returns.
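A minimal migration sketch for the changed call signature, assuming an already constructed `agent` and `env`; the step counts, interval, and output directory are placeholders, and only the required/renamed keyword arguments are the point here.

```python
from chainerrl import experiments


def train(agent, env):
    # Exactly one of eval_n_steps / eval_n_episodes must be non-None;
    # here each evaluation phase runs for a fixed number of episodes.
    experiments.train_agent_with_evaluation(
        agent=agent,
        env=env,
        steps=10 ** 6,               # total training timesteps
        eval_n_steps=None,           # not evaluating by timestep budget
        eval_n_episodes=10,          # evaluate for 10 episodes per phase
        eval_interval=10 ** 5,       # timesteps between evaluation phases
        train_max_episode_len=200,   # formerly max_episode_len
        outdir='results',
    )
```

If you call `ReplayBuffer.sample` directly, note that each sampled entry is now itself a list of up to N consecutive experiences rather than a single experience.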
All updates
Enhancement
- Implicit quantile networks (IQN) (#288)
- Adds N-step learning for DQN-based agents. (#317)
- Replay warning (#321)
- Close envs in async training (#343)
- Allow envs to send a 'needs_reset' signal (#356)
- Changes variable names in train_agent_with_evaluation (#358)
- Use chainer.dataset.concat_examples in batch_states (#366)
- Implements Time-based evaluations (#367)
Documentation
- Add long description for pypi (#357, thanks @ljvmiranda921!)
- A small change to the installation documentation (#369)
- Adds a link to the ChainerRL visualizer from the main repository (#370)
- adds implicit quantile networks to readme (#393)
- Fix DQN.update's docstring (#394)
Examples
- Grasping example (#371)
- Adds Deepmind Scores to README in DQN Example (#383)
Testing
- Fix `TestTrainAgentAsync` (#363)
- Use AbnormalExitCodeWarning for nonzero exitcode warnings (#378)
- Avoid random test failures due to asynchronousness (#380)
- Drop hacking (#381)
- Avoid gym 0.11.0 in Travis (#396)
- Stabilize and speed up A3C tests (#401)
- Reduce ACER's test cases and maximum timesteps (#404)
- Add tests of IQN examples (#405)
Bugfixes
- Avoid UnicodeDecodeError in setup.py (#365)
- Save and load obs_normalizer of PPO (#377)
- Make NonbiasWeightDecay work again (#390)
- bug fix (#391, thanks @tappy27!)
- Fix episodic training of DDPG (#399)
- Fix PGT's training (#400)
- Fix ResidualDQN's training (#402)