| Name | Modified | Size |
|---|---|---|
| README.md | 2020-04-21 | 1.1 kB |
| TextBrewer 0.1.9 source code.tar.gz | 2020-04-21 | 8.3 MB |
| TextBrewer 0.1.9 source code.zip | 2020-04-21 | 8.4 MB |
New Features
- Added an option `is_caching_logits` to `DistillationConfig`. If `is_caching_logits` is True, the distiller caches the batches and the teacher model's output logits, so the logits are calculated only once. This speeds up the distillation process. The feature is only available for `BasicDistiller` and `MultiTeacherDistiller`. Be cautious about setting it to True on large datasets, since the batches and logits are stored in memory. A sketch of enabling the option is shown below.
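A minimal sketch of enabling the new option, assuming a standard TextBrewer workflow; `teacher_model`, `student_model`, `adaptor`, and the `temperature` value are illustrative placeholders, not prescribed by this release:

```python
from textbrewer import TrainingConfig, DistillationConfig, BasicDistiller

# Enable logit caching so the teacher's logits are computed only once.
# Note: the cached batches and logits are held in memory.
distill_config = DistillationConfig(
    temperature=4,           # illustrative value
    is_caching_logits=True,
)
train_config = TrainingConfig(device='cuda')

# teacher_model, student_model, and adaptor are assumed to be
# defined elsewhere, as in the usual TextBrewer setup.
distiller = BasicDistiller(
    train_config=train_config,
    distill_config=distill_config,
    model_T=teacher_model,
    model_S=student_model,
    adaptor_T=adaptor,
    adaptor_S=adaptor,
)
```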
Improvements
- Added a new argument `max_grad_norm` to the distillers' `train` method. It sets the strength of gradient clipping; the default is -1, i.e., no gradient clipping.
- Added new arguments `scheduler_class` and `scheduler_args` to the distillers' `train` method. The old `scheduler` argument may cause convergence problems and is deprecated in favor of `scheduler_class` and `scheduler_args`. See the documentation for details. A sketch of a `train` call using the new arguments follows this list.
- Removed the `print` in `display_parameters`. It no longer prints the statistics directly to the screen.
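A hedged sketch of a `train` call using the new arguments; the optimizer, scheduler choice, `dataloader`, and step counts are illustrative placeholders, not prescribed by this release:

```python
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

optimizer = AdamW(student_model.parameters(), lr=1e-4)

with distiller:
    distiller.train(
        optimizer=optimizer,
        dataloader=dataloader,   # assumed to be defined elsewhere
        num_epochs=3,
        max_grad_norm=1.0,       # clip gradients; -1 (the default) disables clipping
        # Pass the scheduler's constructor and its arguments (excluding the
        # optimizer) instead of a pre-built scheduler object:
        scheduler_class=get_linear_schedule_with_warmup,
        scheduler_args={'num_warmup_steps': 100, 'num_training_steps': 3000},
    )
```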
Bug Fixes
- Fixed a wrong call of `zero_grad()`.