The SpeechBrain Toolkit - Browse /v1.0.1 at SourceForge.net

Name	Modified	Size	Downloads / Week
README.md	2024-09-19	3.1 kB	0
v1.0.1 source code.tar.gz	2024-09-19	22.3 MB	0
v1.0.1 source code.zip	2024-09-19	23.3 MB	0
Totals: 3 Items		45.7 MB	0

This is a minor update which includes some new features and recipes, internal improvements, bugfixes, compatibility improvements, and wider Python backwards compatibility.

NOTE: both v1.0.0 and v1.0.1 were released earlier than this date on GitHub. These releases were accidentally marked as drafts.

Notable changes

We now advertise support for Python 3.8 through 3.12 (instead of 3.9 through 3.11) and have improved our testing accordingly.
Major improvements to Whisper integration with various tasks supported, fine-tuning fixes, performance improvements, and more (#2450)
Improved model parameter info printing (#2470)
Added new metrics mostly targeted at speech recognition (#2451)
Added backwards compatibility for old speechbrain.pretrained imports that were broken following a v1.0 refactor (#2485)
Updated BibTeX citation. You may find the latest one here at all times.
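
As an illustration of how a backwards-compatible import path of this kind can work, here is a minimal sketch using a module-level `__getattr__` (PEP 562) that forwards attribute lookups from a deprecated module to its replacement. The module names `toolkit_inference` and `toolkit_pretrained` and the `Transcriber` class are hypothetical stand-ins; this is not necessarily how #2485 implements it.

```python
import sys
import types
import warnings

# Hypothetical "new" module that holds the real implementation.
new_mod = types.ModuleType("toolkit_inference")


class Transcriber:
    """Stand-in for a pretrained-model interface class."""


new_mod.Transcriber = Transcriber
sys.modules["toolkit_inference"] = new_mod

# Hypothetical "old" module, kept only so that legacy imports keep working.
old_mod = types.ModuleType("toolkit_pretrained")


def _redirect(name):
    """Module-level __getattr__ (PEP 562): forward lookups to the new module."""
    if not hasattr(new_mod, name):
        raise AttributeError(
            f"module 'toolkit_pretrained' has no attribute {name!r}"
        )
    warnings.warn(
        "toolkit_pretrained is deprecated; import from toolkit_inference instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return getattr(new_mod, name)


old_mod.__getattr__ = _redirect
sys.modules["toolkit_pretrained"] = old_mod

# The legacy import path resolves to the same class object and emits a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from toolkit_pretrained import Transcriber as LegacyTranscriber
```

With a shim like this, existing user code that imports from the old location keeps running, while the warning points it at the new location.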

Recipes and other features

Added a VoxPopuli transducer recipe (#2421)
Upgraded CommonVoice transducer and transformer recipes (#2433, #2465, #2560) with various improvements
Refactored ESC50 recipes and added FocalNets (#2499), added single-sample inference for interpretation (#2616)
Added SpeechTokenizer integration (#2497)
Added support for HiFi-GAN to work with new SSL Discrete Tokens, with support for bitrate-scalable training on LJSpeech and LibriTTS (#2571)
Added recipe for Listenable Maps for Audio Classifiers (#2538)

Fixes

Fixed errors with ctc_segmentation (#2505)
Fixes and refactors for RelPosEncXL (#2498)
Fixed input normalization incorrectly applying in-place to the user inputs in some cases (#2504)
DDP fixes (#2506, #2633)
Fixed backwards compatibility with older torchaudio versions when not using streaming features (#2532)
Fixed Separation and Enhancement recipes behavior when NaN is encountered (#2524)
Fix for too aggressive SpecAugment in LibriSpeech transducer recipe (#2548)
Fix for double file conversion in CommonVoice data preparation (#2557)
Fixed SpectrogramDrop errors in some cases (#2564)
Potential fix for Windows install issues when using the --editable flag (#2541)
Improvements to SSL Discrete Tokens and refactoring (#2509)
Fixes and improvements for quaternion networks (#2464)
Fixed AISHELL models and added backwards compatibility warning for causal in TransformerASR (#2606)
... and a few more
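
To make the input normalization fix (#2504) more concrete, the sketch below contrasts an in-place normalization, which silently overwrites the caller's data, with a copying one. It uses plain Python lists and hypothetical function names rather than SpeechBrain's actual normalization code, so it only illustrates the general class of bug.

```python
from statistics import fmean


def normalize_in_place(samples):
    """Problematic pattern: overwrites the caller's list with centered values."""
    mean = fmean(samples)
    for i, value in enumerate(samples):
        samples[i] = value - mean
    return samples


def normalize_copy(samples):
    """Safer pattern: returns a new list and leaves the input untouched."""
    mean = fmean(samples)
    return [value - mean for value in samples]


user_input = [1.0, 2.0, 3.0]
centered = normalize_copy(user_input)  # user_input is unchanged afterwards

scratch = [1.0, 2.0, 3.0]
normalize_in_place(scratch)  # scratch itself now holds the centered values
```

The same distinction applies to tensors: an in-place operation on an input the user still holds changes their data behind their back.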

Internal changes

Improved code quality with the inclusion of spellchecking, include sorting, and stronger documentation linting in the CI pipeline.
Significantly improved CI performance for a better PR development experience.
Refactored the module structure of SpeechBrain to use lazy-loading when possible, reducing import times and greatly reducing circular import headaches.
Introduced some infrastructure to enable some preprocessing when importing state dicts (see https://gist.github.com/asumagic/1289f391acf849ca01b8ca7f9c5dd069)
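
For readers unfamiliar with lazy loading, here is a minimal sketch of the general idea using the standard library's `importlib.util.LazyLoader`, which defers executing a module until one of its attributes is first accessed. The helper name `lazy_import` is hypothetical, and SpeechBrain's own lazy-loading mechanism may differ from this recipe.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose code only runs on first attribute access."""
    if name in sys.modules:
        # Already imported: nothing left to defer.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    # Wrap the real loader so execution is postponed.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # registers the module without running its body
    return module


# Cheap at this point: colorsys has not been executed yet (unless it was
# already imported elsewhere in the process).
colorsys = lazy_import("colorsys")

# The first attribute access triggers the actual import.
hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
```

Deferring imports this way keeps the top-level `import` fast, and because submodules are only resolved when used, it also sidesteps many circular-import ordering problems.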

Source: README.md, updated 2024-09-19

The SpeechBrain Toolkit Files

A PyTorch-based Speech Toolkit