Name | Modified | Size
---|---|---
README.md | 2024-09-19 | 3.1 kB
v1.0.1 source code.tar.gz | 2024-09-19 | 22.3 MB
v1.0.1 source code.zip | 2024-09-19 | 23.3 MB
Totals: 3 items | | 45.7 MB
This is a minor update that includes new features and recipes, internal improvements, bug fixes, compatibility improvements, and support for a wider range of Python versions.
NOTE: v1.0.0 and v1.0.1 were both released on GitHub before this date, but were accidentally marked as drafts.
Notable changes
- We now advertise support for Python `3.8` to `3.12` (instead of `3.9` to `3.11`) and have improved our testing accordingly.
- Major improvements to the Whisper integration, including support for various tasks, fine-tuning fixes, performance improvements, and more (#2450)
- Improved model parameter info printing (#2470)
- Added new metrics mostly targeted at speech recognition (#2451)
- Added backwards compatibility for old `speechbrain.pretrained` imports that were broken by the v1.0 refactor (#2485)
- Updated the BibTeX citation. You can always find the latest one here.
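For readers curious how an old import path can keep working after a refactor, the sketch below shows one common Python approach: a module-level `__getattr__` (PEP 562) that redirects old names to their new location, imports the target lazily on first access, and emits a `DeprecationWarning`. This is a hypothetical illustration, not SpeechBrain's actual implementation; the module name `legacy_pkg` and the redirect table are made up, and standard-library modules stand in for the real targets.

```python
import importlib
import types
import warnings

# Hypothetical mapping: old attribute name -> (new module path, attribute name).
# Standard-library targets are used so the sketch runs anywhere.
_REDIRECTS = {
    "OrderedDict": ("collections", "OrderedDict"),
    "dumps": ("json", "dumps"),
}


def _legacy_getattr(name):
    """Resolve an old name lazily, warning that the import path is deprecated."""
    try:
        module_path, attr = _REDIRECTS[name]
    except KeyError:
        raise AttributeError(f"module 'legacy_pkg' has no attribute {name!r}") from None
    warnings.warn(
        f"'legacy_pkg.{name}' is deprecated; import it from '{module_path}' instead",
        DeprecationWarning,
        stacklevel=2,
    )
    # The target module is only imported when the old name is first accessed.
    return getattr(importlib.import_module(module_path), attr)


# Stand-in for the "old" module. In a real package this would be a
# legacy_pkg/__init__.py that defines a module-level __getattr__ (PEP 562).
legacy_pkg = types.ModuleType("legacy_pkg")
legacy_pkg.__getattr__ = _legacy_getattr

if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        dumps = legacy_pkg.dumps
    print(dumps({"a": 1}))               # {"a": 1}
    print(caught[0].category.__name__)   # DeprecationWarning
```

Because the redirect only runs when an old name is actually used, code on the new import paths pays no import cost for the compatibility layer.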
Recipes and other features
- Added a VoxPopuli transducer recipe (#2421)
- Upgraded the CommonVoice transducer and transformer recipes with various improvements (#2433, #2465, #2560)
- Refactored the ESC50 recipes and added FocalNets (#2499); added single-sample inference for interpretation (#2616)
- Added SpeechTokenizer integration (#2497)
- Added HiFi-GAN support for the new SSL Discrete Tokens, including bitrate-scalable training on LJSpeech and LibriTTS (#2571)
- Added recipe for Listenable Maps for Audio Classifiers (#2538)
Fixes
- Fixed errors with `ctc_segmentation` (#2505)
- Fixes and refactors for `RelPosEncXL` (#2498)
- Fixed input normalization incorrectly being applied in place to user inputs in some cases (#2504)
- DDP fixes (#2506, #2633)
- Fixed backwards compatibility with older torchaudio versions when streaming features are not used (#2532)
- Fixed the behavior of the Separation and Enhancement recipes when a NaN is encountered (#2524)
- Fixed overly aggressive SpecAugment in the LibriSpeech transducer recipe (#2548)
- Fixed double file conversion in CommonVoice data preparation (#2557)
- Fixed SpectrogramDrop errors in some cases (#2564)
- Potential fix for Windows install issues when using the `--editable` flag (#2541)
- Improvements to SSL Discrete Tokens and refactoring (#2509)
- Fixes and improvements for quaternion networks (#2464)
- Fixed AISHELL models and added a backwards compatibility warning for `causal` in `TransformerASR` (#2606)
- ... and a few more
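The input normalization fix (#2504) concerns a common bug class: a function that normalizes its argument in place silently changes the caller's data. The sketch below contrasts the two patterns on plain Python lists. It is an illustration only, not SpeechBrain's normalization code; both function names are hypothetical, and plain mean/std normalization is assumed.

```python
import statistics
from typing import List


def normalize_inplace(values: List[float]) -> List[float]:
    """Buggy pattern: mean/std normalization that overwrites the caller's list."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # fall back to 1.0 to avoid dividing by zero
    for i, v in enumerate(values):
        values[i] = (v - mean) / std
    return values


def normalize(values: List[float]) -> List[float]:
    """Safe pattern: compute the statistics and return a new list; the input is untouched."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0]
    normalized = normalize(signal)
    print(signal)  # [1.0, 2.0, 3.0]
    normalize_inplace(signal)
    print(signal == [1.0, 2.0, 3.0])  # False
```

With tensors the same trap applies to in-place operations; returning a fresh result keeps the caller's inputs reusable.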
Internal changes
- Improved code quality by adding spellchecking, import sorting, and stronger documentation linting to the CI pipeline.
- Significantly improved CI performance for a better PR development experience.
- Refactored the module structure of SpeechBrain to use lazy-loading when possible, reducing import times and greatly reducing circular import headaches.
- [Introduced some infrastructure to enable preprocessing of state dicts when importing them](https://gist.github.com/asumagic/1289f391acf849ca01b8ca7f9c5dd069)
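As a rough illustration of what preprocessing a state dict on import can mean, the sketch below runs a chain of registered hooks over a checkpoint before it is loaded, with one hook renaming keys saved under an older naming scheme (a typical need after attributes are renamed in a refactor). The hook registry, function names, and key names are all hypothetical, and plain floats stand in for tensors; refer to the link above for SpeechBrain's actual design.

```python
from typing import Any, Callable, Dict, List

StateDict = Dict[str, Any]
# Hypothetical hook type: takes a state dict and returns a (possibly new) one.
Hook = Callable[[StateDict], StateDict]

_PRELOAD_HOOKS: List[Hook] = []


def register_preload_hook(hook: Hook) -> Hook:
    """Register a function to run on every state dict before it is loaded."""
    _PRELOAD_HOOKS.append(hook)
    return hook


def preprocess_state_dict(state_dict: StateDict) -> StateDict:
    """Apply all registered hooks in registration order."""
    for hook in _PRELOAD_HOOKS:
        state_dict = hook(state_dict)
    return state_dict


@register_preload_hook
def rename_legacy_keys(state_dict: StateDict) -> StateDict:
    """Map keys saved under an old (hypothetical) prefix to the new prefix."""
    renames = {"encoder.pos_enc.": "encoder.positional_encoding."}
    out: StateDict = {}
    for key, value in state_dict.items():
        for old, new in renames.items():
            if key.startswith(old):
                key = new + key[len(old):]
        out[key] = value  # build a new dict so the caller's checkpoint is not mutated
    return out


if __name__ == "__main__":
    old_ckpt = {"encoder.pos_enc.pe": 0.5, "decoder.weight": 1.0}
    print(preprocess_state_dict(old_ckpt))
    # {'encoder.positional_encoding.pe': 0.5, 'decoder.weight': 1.0}
```

Keeping the renaming in hooks rather than in each model's loading code means old checkpoints can be supported without touching the models themselves.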