The SpeechBrain Toolkit v1.0.1

This is a minor update which includes some new features and recipes, internal improvements, bugfixes, compatibility improvements, and wider Python backwards compatibility.

NOTE: both v1.0.0 and v1.0.1 were released earlier than this date on GitHub. These releases were accidentally marked as drafts.

Notable changes

  • We now advertise support for Python 3.8 through 3.12 (instead of 3.9-3.11) and have improved our testing across those versions.
  • Major improvements to Whisper integration with various tasks supported, fine-tuning fixes, performance improvements, and more (#2450)
  • Improved model parameter info printing (#2470)
  • Added new metrics mostly targeted at speech recognition (#2451)
  • Added backwards compatibility for old speechbrain.pretrained imports that were broken following a v1.0 refactor (#2485)
  • Updated BibTeX citation. You may find the latest one here at all times.
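
The backwards-compatibility item above (#2485) concerns old import paths that stopped working after a refactor. As a rough illustration of how such a shim can work in general, the sketch below registers a stand-in module that forwards attribute access to a new location and emits a deprecation warning. This is not SpeechBrain's actual implementation: the class `DeprecatedRedirect`, the function `install_redirect`, and the module names `demo_old_location` and `demo_new_location` are all hypothetical and exist only for this example.

```python
import importlib
import sys
import types
import warnings


class DeprecatedRedirect(types.ModuleType):
    """Stand-in module that forwards attribute lookups to a new module."""

    def __init__(self, old_name, new_name):
        super().__init__(old_name)
        self._old_name = old_name
        self._new_name = new_name

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. for forwarded names.
        # Private/dunder names are not forwarded so import machinery probes
        # such as __path__ fail cleanly.
        if attr.startswith("_"):
            raise AttributeError(attr)
        warnings.warn(
            f"{self._old_name} is deprecated, import from {self._new_name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        target = importlib.import_module(self._new_name)
        return getattr(target, attr)


def install_redirect(old_name, new_name):
    """Register the stand-in under the old import path (hypothetical helper)."""
    sys.modules[old_name] = DeprecatedRedirect(old_name, new_name)


# Demo setup: fabricate a "new location" module so the example is self-contained.
new_module = types.ModuleType("demo_new_location")


class Transcriber:
    """Placeholder for a class that moved during a refactor."""


new_module.Transcriber = Transcriber
sys.modules["demo_new_location"] = new_module

install_redirect("demo_old_location", "demo_new_location")
```

With the redirect installed, `from demo_old_location import Transcriber` resolves to the same class as `demo_new_location.Transcriber`, and the caller sees a `DeprecationWarning` pointing at the new path.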

Recipes and other features

  • Added a VoxPopuli transducer recipe (#2421)
  • Upgraded CommonVoice transducer and transformer recipes (#2433, #2465, #2560) with various improvements
  • Refactored ESC50 recipes and added FocalNets (#2499), added single-sample inference for interpretation (#2616)
  • Added SpeechTokenizer integration (#2497)
  • Added support for HiFi-GAN to work with new SSL Discrete Tokens, with support for bitrate-scalable training on LJSpeech and LibriTTS (#2571)
  • Added recipe for Listenable Maps for Audio Classifiers (#2538)

Fixes

  • Fixed errors with ctc_segmentation (#2505)
  • Fixes and refactors for RelPosEncXL (#2498)
  • Fixed input normalization incorrectly applying in-place to the user inputs in some cases (#2504)
  • DDP fixes (#2506, #2633)
  • Fixed backwards compatibility with older torchaudio versions when streaming features are not used (#2532)
  • Fixed Separation and Enhancement recipes behavior when NaN is encountered (#2524)
  • Fix for too aggressive SpecAugment in LibriSpeech transducer recipe (#2548)
  • Fix for double file conversion in CommonVoice data preparation (#2557)
  • Fixed SpectrogramDrop errors in some cases (#2564)
  • Potential fix for Windows install issues when using the --editable flag (#2541)
  • Improvements to SSL Discrete Tokens and refactoring (#2509)
  • Fixes and improvements for quaternion networks (#2464)
  • Fixed AISHELL models and added backwards compatibility warning for causal in TransformerASR (#2606)
  • ... and a few more
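
To make the input normalization fix (#2504) more concrete, here is a plain-Python illustration of the general class of bug: a function that normalizes its argument in place silently changes the caller's data, while a copying version leaves it untouched. This is only a sketch of the pattern, using lists instead of tensors; the function names are hypothetical and this is not the code that was patched.

```python
def normalize_in_place(samples):
    """Problematic pattern: subtracts the mean by overwriting the caller's list."""
    mean = sum(samples) / len(samples)
    for i in range(len(samples)):
        samples[i] -= mean
    return samples


def normalize_copy(samples):
    """Safer pattern: returns a new list and leaves the input unchanged."""
    mean = sum(samples) / len(samples)
    return [x - mean for x in samples]


# The copying version does not touch the original data.
audio = [1.0, 2.0, 3.0]
normalized = normalize_copy(audio)

# The in-place version returns the very same object it was given,
# so the caller's data is modified as a side effect.
audio_mutated = [1.0, 2.0, 3.0]
result = normalize_in_place(audio_mutated)
```

After running this, `audio` still holds the original values, whereas `audio_mutated` and `result` are the same object holding the mean-subtracted values.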

Internal changes

  • Improved code quality with the inclusion of spellchecking, include sorting, and stronger documentation linting in the CI pipeline.
  • Significantly improved CI performance for a better PR development experience.
  • Refactored the module structure of SpeechBrain to use lazy-loading when possible, reducing import times and greatly reducing circular import headaches.
  • Introduced infrastructure to enable preprocessing when importing state dicts (https://gist.github.com/asumagic/1289f391acf849ca01b8ca7f9c5dd069)
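
For readers unfamiliar with lazy loading, the following sketch shows one general way to defer a module's execution until its first attribute access, using the standard library's `importlib.util.LazyLoader`. It is based on the recipe in the Python documentation and is not necessarily how SpeechBrain implements its own lazy imports; the helper name `lazy_import` is an assumption made for this example.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose code runs only on first attribute access."""
    # Reuse the module if something has already imported it.
    if name in sys.modules:
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    # Wrap the real loader so execution is postponed.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # Sets up laziness; does not run the module body yet.
    return module


# The module body is executed here, at first use, not at lazy_import() time.
colorsys = lazy_import("colorsys")
hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
```

Deferring execution this way keeps the cost of rarely used submodules out of the initial import, which is the kind of import-time reduction the refactor above describes.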
Source: README.md, updated 2024-09-19