Name | Modified | Size
---|---|---
README.md | 2018-11-14 | 3.2 kB
v0.5.0.tar.gz | 2018-11-14 | 6.2 MB
v0.5.0.zip | 2018-11-14 | 6.3 MB
Important enhancements
- Batch synchronized training using multiple environment instances and a single GPU is supported for some agents (see the conceptual sketch after this list):
  - A2C (added as `chainerrl.agents.A2C`)
  - PPO
  - DQN and other agents that inherit DQN, except SARSA
- `examples/ale/train_dqn_ale.py` now follows the "Tuned DoubleDQN" setting by default and supports prioritized experience replay as an option
- `examples/atari/train_dqn.py` is added as a basic example of applying DQN to Atari
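The sketch below illustrates the idea behind batch synchronized training: several environment instances are stepped in lockstep so that the agent can batch its forward passes on a single GPU. It is a conceptual, gym-only sketch with a random placeholder policy, not ChainerRL's actual batch-training API; `run_batch_episode_steps`, its arguments, and the random-action policy are all hypothetical names introduced here for illustration.

```python
import gym
import numpy as np


def run_batch_episode_steps(env_ids, num_steps, seed=0):
    """Step a batch of environments synchronously (hypothetical sketch)."""
    envs = [gym.make(env_id) for env_id in env_ids]
    for i, env in enumerate(envs):
        env.seed(seed + i)
    obs = [env.reset() for env in envs]

    for _ in range(num_steps):
        # A batch agent would stack the observations and run a single
        # GPU forward pass to pick one action per environment.
        batch_obs = np.stack(obs)
        assert batch_obs.shape[0] == len(envs)
        actions = [env.action_space.sample() for env in envs]  # placeholder policy

        next_obs = []
        for env, action in zip(envs, actions):
            o, r, done, _ = env.step(action)
            # Reset finished environments so the batch stays full.
            next_obs.append(env.reset() if done else o)
        obs = next_obs

    for env in envs:
        env.close()


run_batch_episode_steps(["CartPole-v0"] * 4, num_steps=100)
```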
Important bugfixes
- A bug in `chainerrl.agents.CategoricalDQN` that deteriorated performance is fixed
- A bug in `atari_wrappers.LazyFrames` that unnecessarily increased memory usage is fixed (see the sketch after this list)
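To see why the `LazyFrames` fix matters, here is an illustrative reimplementation of the pattern (not ChainerRL's actual code): a `LazyFrames`-style container keeps references to individual frames that frame-stacking wrappers share between adjacent observations and concatenates them only when an array is needed. Caching the concatenated result, as the buggy version did (#332), keeps a private copy of every stacked observation alive and defeats the memory saving.

```python
import numpy as np


class LazyFrames(object):
    """Hold shared per-frame arrays; concatenate lazily, never cache."""

    def __init__(self, frames):
        self._frames = frames  # list of per-frame arrays shared with neighbours

    def __array__(self, dtype=None):
        # Concatenate on demand and do NOT store the result.
        out = np.concatenate(self._frames, axis=-1)
        if dtype is not None:
            out = out.astype(dtype)
        return out


# Usage: a replay buffer stores LazyFrames objects and converts them to
# real arrays only when a minibatch is sampled.
frames = [np.zeros((84, 84, 1), dtype=np.uint8) for _ in range(4)]
obs = LazyFrames(frames)
print(np.asarray(obs).shape)  # (84, 84, 4)
```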
Important destructive changes
- `chainerrl.replay_buffer.PrioritizedReplayBuffer` and `chainerrl.replay_buffer.PrioritizedEpisodicReplayBuffer` are updated:
  - they are now FIFO (First In, First Out), reducing memory usage in Atari games
  - priorities are computed more closely following the paper
- The `eval_explorer` argument of `chainerrl.experiments.train_agent_*` is dropped (use `chainerrl.wrappers.RandomizeAction` for evaluation-time epsilon-greedy; see the sketch after this list)
- The interface of `chainerrl.agents.PPO` has changed significantly
- Support for Chainer v2 is dropped
- Support for gym<0.9.7 is dropped
- Support for loading replay buffers saved with chainerrl<=0.2.0 is dropped
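A minimal sketch of the new evaluation setup, assuming `chainerrl.wrappers.RandomizeAction` takes the environment and a `random_fraction` argument (the fraction of steps on which a random action replaces the agent's action); the argument name and the `make_eval_env` helper are assumptions for illustration, not confirmed by these notes. Previously this behaviour was configured through the now-removed `eval_explorer` argument.

```python
import gym
import chainerrl


def make_eval_env(env_id="PongNoFrameskip-v4", eval_epsilon=0.001):
    env = gym.make(env_id)
    # Epsilon-greedy behaviour at evaluation time is now expressed as an
    # environment wrapper instead of an explorer passed to the trainer.
    return chainerrl.wrappers.RandomizeAction(env, random_fraction=eval_epsilon)


eval_env = make_eval_env()
```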
All updates
Enhancement
- A2C (#149, thanks @iory!)
- Add wrappers to cast observations (#160)
- Fix on flake8 3.5.0 (#214)
- Use ()-shaped array for scalar loss (#219)
- FIFO prioritized replay buffer (#277)
- Update Policy class to inherit ABCMeta (#280, thanks @uidilr!)
- Batch PPO Implementation (#295, thanks @ljvmiranda921!)
- Mimic the details of prioritized experience replay (#301)
- Add ScaleReward wrapper (#304)
- Remove GaussianPolicy and obsolete policies (#305)
- Make random access queue sampling code cleaner (#309)
- Support gym==0.10.8 (#324)
- Batch A2C/PPO/DQN (#326)
- Use RandomizeAction wrapper instead of Explorer in evaluation (#328)
- remove duplicate lines (typo) (#329, thanks @monado3!)
- Merge consecutive with statements (#333)
- Use Variable.array instead of Variable.data (#336)
- Remove code for Chainer v2 (#337)
- Implement getitem for ActionValue (#339)
- Count updates of DQN (#341)
- Move Atari Wrappers (#349)
- Render wrapper (#350)
Documentation
- fixes minor typos (#306)
- fixes typo (#307)
- Typos (#308)
- fixes readme typo (#310)
- Adds partial list of paper implementations with links to the main README (#311)
- Adds another paper to list (#312)
- adds some instructions regarding testing for potential contributors (#315)
- Remove duplication of DQN in docs (#334)
- nit on grammar of a comment: (#354)
Examples
- Tuned DoubleDQN with prioritized experience replay (#302)
- adds some descriptions to parseargs arguments (#319)
- Make clip_eps positive (#340)
- updates env in ddpg example (#345)
- Examples (#348)
Testing
- Fix Travis CI errors (#318)
- Parse Chainer version with packaging.version (#322)
- removes tests for old replay buffer (#347)
Bugfixes
- Fix the error caused by inexact delta_z (#314)
- Stop caching the result of numpy.concatenate in LazyFrames (#332)