
Name	Modified	Size	Downloads / Week
ESPnet version 202509 source code.tar.gz	2025-09-13	22.1 MB	0
ESPnet version 202509 source code.zip	2025-09-13	27.5 MB	0
README.md	2025-09-13	5.7 kB	0
Totals: 3 Items		49.6 MB	0

Summary

The 202509 release strengthens ESPnet’s foundation on modern OS and Python environments, enhances training flexibility, and completes the language identification (LID) subsystem. With a broad suite of new recipes, we continue to support emerging speech‑related benchmarks and models while maintaining CI stability and developer productivity.

Overview

The 202509 release brings a major shift in our infrastructure and tooling. Key highlights include:

Area	Change
Python / Dependencies	Dropped Python 3.7/3.8 and now supports Python 3.9–3.13; bumped `numpy` to ≥ 2.2.0; removed Chainer‑related build steps.
OS Support	Ended Debian 11 support and switched CI to Debian 12 containers (ensuring GCC 11+ compatibility).
Warp‑Transducer	Adopted the `ljn7/warp-transducer` fork (FastEmit support, modern CUDA/CMake builds).
LID	Completed the LID subsystem (model, loss, pooling, balanced sampler, tri‑stage scheduler, inference tools).
Training	Added `HybridOptim`/`HybridLRS` for multi‑optimizer and scheduler configurations.
Recipes	New recipes for LongLibriHeavy, Qwen2‑Audio‑7B‑Chat, OWSM v4, Galaxy AVSR, and a LID template.
CI / DevOps	Updated Docker publishing, added an automerge workflow, fixed GPU flag handling, and added a guard against conflicting speaker options.
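
The dependency row above raises the floor to Python 3.9 and `numpy` 2.2.0. Before upgrading an existing environment, a quick check along these lines can flag what needs to change. This is a minimal sketch and not part of ESPnet: the names `parse_version` and `check_environment` are hypothetical, only the minimum versions are checked (not the 3.13 upper end of the supported range), and the simplified parser ignores pre-release suffixes instead of applying full PEP 440 ordering. It reads the installed `numpy` version through `importlib.metadata`, so `numpy` itself is never imported.

```python
import sys
from importlib import metadata

# Hypothetical helper; the minimums come from the 202509 release notes.
MIN_PYTHON = (3, 9)
MIN_NUMPY = (2, 2, 0)


def parse_version(text):
    """Convert '2.2.1' or '2.3.0rc1' into an int tuple padded to 3 parts.

    Only the leading numeric part of each component is kept, so pre-release
    tags are ignored; use `packaging.version` for full PEP 440 ordering.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):  # e.g. '0rc1': stop after the number
            break
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment(python_version=None, numpy_version=None):
    """Return a list of problems; an empty list means both minimums are met."""
    problems = []
    py = tuple(python_version or sys.version_info[:2])
    if py < MIN_PYTHON:
        problems.append(f"Python {py[0]}.{py[1]} is older than 3.9")
    if numpy_version is None:
        try:
            numpy_version = metadata.version("numpy")
        except metadata.PackageNotFoundError:
            problems.append("numpy is not installed")
            return problems
    if parse_version(numpy_version) < MIN_NUMPY:
        problems.append(f"numpy {numpy_version} is older than 2.2.0")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("OK" if not issues else "\n".join(issues))
```

Passing explicit versions (for example `check_environment((3, 8), "1.26.4")`) lets the same function vet a target environment described in a lock file rather than the running interpreter.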

The release is backed by 10 core contributors, each driving critical modules or infrastructure changes.
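
The `HybridOptim`/`HybridLRS` row in the overview table describes driving several optimizers and schedulers from one training loop. The underlying idea can be shown without any framework: a wrapper exposes a single `zero_grad()`/`step()` interface and forwards each call to every wrapped object. This is a hypothetical sketch, not ESPnet's API: `ToySGD`, `ToyStepLR`, `HybridOptimSketch`, and `HybridLRSSketch` are illustrative stand-ins, and the real classes may differ in how they group parameters, order steps, or save state.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the multi-optimizer idea; not ESPnet's HybridOptim/HybridLRS API.


@dataclass
class ToySGD:
    """Stand-in optimizer: plain gradient descent over a dict of scalars."""
    params: dict
    lr: float
    grads: dict = field(default_factory=dict)

    def zero_grad(self):
        self.grads = {name: 0.0 for name in self.params}

    def step(self):
        for name, grad in self.grads.items():
            self.params[name] -= self.lr * grad


class ToyStepLR:
    """Stand-in scheduler: multiplies its optimizer's lr by gamma per step."""

    def __init__(self, optimizer, gamma):
        self.optimizer = optimizer
        self.gamma = gamma

    def step(self):
        self.optimizer.lr *= self.gamma


class HybridOptimSketch:
    """Present several optimizers behind one zero_grad()/step() interface."""

    def __init__(self, optimizers):
        self.optimizers = list(optimizers)

    def zero_grad(self):
        for opt in self.optimizers:
            opt.zero_grad()

    def step(self):
        for opt in self.optimizers:
            opt.step()


class HybridLRSSketch:
    """Advance every wrapped scheduler together."""

    def __init__(self, schedulers):
        self.schedulers = list(schedulers)

    def step(self):
        for sched in self.schedulers:
            sched.step()


# Usage: the encoder and the head get separate optimizers and schedules.
encoder_opt = ToySGD(params={"w_enc": 1.0}, lr=0.1)
head_opt = ToySGD(params={"w_head": 1.0}, lr=0.5)
optim = HybridOptimSketch([encoder_opt, head_opt])
lr_sched = HybridLRSSketch([ToyStepLR(encoder_opt, gamma=0.5), ToyStepLR(head_opt, gamma=1.0)])

optim.zero_grad()
encoder_opt.grads["w_enc"] = 1.0  # pretend backprop produced these gradients
head_opt.grads["w_head"] = 1.0
optim.step()      # one call updates both parameter groups
lr_sched.step()   # encoder lr halves, head lr stays constant
```

Giving each parameter group its own optimizer and schedule (here a decaying rate for the encoder and a constant rate for the head) is the kind of setup a single optimizer cannot express, while the training loop still calls `step()` only once.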

Important Pull Requests

#	Category	Title	Key Impact
6228	Deprecation	EOL of Debian 11 support in favor of Debian 12	CI now runs on Debian 12; badges and docs updated; eliminates GCC‑10 GLIBCXX limitation.
6226	Bug‑Fix	Fix GPU flag handling in tts.sh script	Prevents Versa from erroneously enabling GPU when `gpu_inference` is False.
6221	Dependency	Update numpy version	Requires `numpy` ≥ 2.2.0, raises the minimum Python version to 3.9, and removes Chainer artifacts.
6220	Refactor	Remove old speechlm module	Removes obsolete SpeechLM code.
6187	Core	Switch warp‑transducer to ljn7 fork	Adds FastEmit support, broader CUDA/CMake compatibility, improved build scripts.
6159	Core	LID‑5: Tri‑stage learning rate scheduler	Stabilizes training with warm‑up / hold / decay phases.
6158	Core	LID‑4: Category‑ and dataset‑aware balanced sampler	Addresses language and dataset imbalance via power‑law sampling.
6156	Feature	LID‑2: Model, loss and pooling modules	Introduces language‑identification models, custom losses, and pooling strategies.
6208	Bug‑Fix	Guard against having both `use_sid` and `use_spk_embed` set to true	Prevents conflicting speaker‑ID/embedding settings.
6206	DevOps	Add environment tag to publish docker image	Clarifies the Docker publishing workflow.
6202	DevOps	Add Automerge Action	Enables automated PR merging with label and review checks.
6205	Recipe	LongLibriHeavy benchmark	Provides a long‑form speech evaluation baseline (no training stage yet).
6194	Recipe	Add recipe for Qwen2‑Audio‑7B‑Chat	Provides a baseline for the Dynamic‑SUPERB ASR task.
6176	Recipe	OWSM v4 Recipe	Adds OWSM v4 training and configuration.
6160	Recipe	LID‑6: LID recipe template	Offers a ready‑to‑use LID experiment scaffold.
6173	Feature	[espnet3‑4] Add support for multiple optimizers and schedulers	Unifies handling of multiple optimizers and schedulers (`HybridOptim`/`HybridLRS`).
6172	Feature	Additional integration for multi‑optimizer support (see linked PRs)	Supports advanced training strategies.
6132	Recipe	AVSR recipe for Galaxy Dataset	Adds AVSR training capability for Galaxy.
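
PR 6159 describes the tri‑stage scheduler only as warm‑up, hold, and decay phases. One common formulation, modeled on the tri‑stage schedule popularized by fairseq, ramps linearly to a peak rate, holds it, then decays exponentially to a floor. The sketch below assumes that formulation; the function name `tri_stage_lr`, its parameter names (`init_scale`, `final_scale`, and the per‑phase step counts), and the default values are illustrative and may not match ESPnet's configuration keys.

```python
import math

# Sketch of a fairseq-style tri-stage schedule; parameter names are illustrative.


def tri_stage_lr(step, peak_lr, warmup_steps, hold_steps, decay_steps,
                 init_scale=0.01, final_scale=0.05):
    """Learning rate at `step` (0-based) for a warm-up / hold / decay schedule.

    Warm-up: linear ramp from init_scale * peak_lr toward peak_lr.
    Hold:    constant peak_lr.
    Decay:   exponential fall from peak_lr to final_scale * peak_lr,
             after which the rate stays at that floor.
    """
    init_lr = init_scale * peak_lr
    final_lr = final_scale * peak_lr
    if step < warmup_steps:
        return init_lr + (peak_lr - init_lr) * step / warmup_steps
    step -= warmup_steps
    if step < hold_steps:
        return peak_lr
    step -= hold_steps
    if step < decay_steps:
        # Chosen so that exp(-decay_rate * decay_steps) == final_scale.
        decay_rate = -math.log(final_scale) / decay_steps
        return peak_lr * math.exp(-decay_rate * step)
    return final_lr


if __name__ == "__main__":
    for s in (0, 500, 1000, 3000, 4000, 6000, 9000):
        print(s, round(tri_stage_lr(s, 1e-3, 1000, 2000, 5000), 6))
```

Because the decay rate is derived from `final_scale`, the schedule is continuous at every phase boundary: it reaches `peak_lr` as warm‑up ends and lands on `final_scale * peak_lr` exactly when decay ends, so no phase transition causes a jump in step size.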

Full changelog

What's Changed

### New Features
- LID-2: Model, loss and pooling modules (See [#6156], by @Qingzheng-Wang)

### Enhancement
- [espnet3-4] Add support for multiple optimizers and schedulers (See [#6173], by @Masao-Someki)
- LID-4: Category- and dataset-aware balanced sampler (See [#6158], by @Qingzheng-Wang)

### Recipe
- LongLibriHeavy benchmark (Basic Recipe without training for now) (See [#6205], by @Miamoto)
- Add recipe for Qwen2-Audio-7B-Chat on Dynamic-SUPERB ASR task (See [#6194], by @cyhuang-tw)
- OWSM v4 Recipe (See [#6176], by @pyf98)
- LID-6: LID recipe template (See [#6160], by @Qingzheng-Wang)
- AVSR recipe for Galaxy Dataset (See [#6132], by @YJCX330)

### Documentation
- EOL of debian 11 support in favor of debian 12 (See [#6228], by @Fhrozen)

### Refactoring
- Remove old speechlm module (See [#6220], by @jctian98)

### Others
- Fix GPU flag handling in tts.sh script (See [#6226], by @ZhuoyanTao)
- Update numpy version (See [#6221], by @Fhrozen)
- Add guard against having both use_sid and use_spk_embed set to true (See [#6208], by @ZhuoyanTao)
- Add environment tag to publish docker image (See [#6206], by @Fhrozen)
- Add Automerge Action (See [#6202], by @Fhrozen)
- Switch warp-transducer to ljn7 fork with FastEmit and modern CUDA/CMa… (See [#6187], by @ljn7)
- LID-5: Tri-stage learning rate scheduler (See [#6159], by @Qingzheng-Wang)
- LID-3: Inference, embedding extraction and t-SNE visualization (See [#6157], by @Qingzheng-Wang)
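
The LID-4 sampler (#6158) is summarized as category- and dataset-aware balancing via power-law sampling. A common way to realize this is to weight each group by its utterance count raised to an exponent α < 1, which upsamples low-resource groups relative to their raw share. The sketch below assumes a two-level scheme (languages first, then datasets within each language); the function names `power_law_probs`, `two_level_probs`, and `sample_pairs` and the default α = 0.5 are hypothetical and may not reflect ESPnet's implementation.

```python
import random
from collections import defaultdict

# Hypothetical sketch of power-law balanced sampling; not ESPnet's sampler.


def power_law_probs(counts, alpha):
    """Map each key to n ** alpha / sum(n ** alpha); alpha < 1 flattens skew."""
    weights = {key: n ** alpha for key, n in counts.items() if n > 0}
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


def two_level_probs(utt_counts, alpha_lang=0.5, alpha_dataset=0.5):
    """Sampling probability for each (language, dataset) pair.

    `utt_counts` maps (language, dataset) -> number of utterances. Languages
    are weighted by total_count ** alpha_lang, then the datasets inside each
    language by count ** alpha_dataset; the two factors are multiplied.
    """
    per_lang = defaultdict(dict)
    for (lang, dataset), n in utt_counts.items():
        per_lang[lang][dataset] = n
    lang_totals = {lang: sum(ds.values()) for lang, ds in per_lang.items()}
    lang_probs = power_law_probs(lang_totals, alpha_lang)
    pair_probs = {}
    for lang, p_lang in lang_probs.items():
        for dataset, p_ds in power_law_probs(per_lang[lang], alpha_dataset).items():
            pair_probs[(lang, dataset)] = p_lang * p_ds
    return pair_probs


def sample_pairs(pair_probs, k, seed=0):
    """Draw k (language, dataset) pairs with replacement."""
    rng = random.Random(seed)
    keys = list(pair_probs)
    return rng.choices(keys, weights=[pair_probs[key] for key in keys], k=k)


if __name__ == "__main__":
    counts = {("en", "A"): 900, ("en", "B"): 100, ("sw", "C"): 100}
    probs = two_level_probs(counts)
    for pair, p in sorted(probs.items()):
        print(pair, round(p, 3))
    print(sample_pairs(probs, k=8))
```

The exponent interpolates between two extremes: α = 1 reproduces the raw data proportions, while α = 0 samples uniformly over languages and then uniformly over datasets within each language. Factoring the probability into a language term and a dataset term keeps a single large corpus from dominating its language as well as the overall mix.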

Acknowledgements

@Fhrozen, @Masao-Someki, @Miamoto, @Qingzheng-Wang, @YJCX330, @ZhuoyanTao, @cyhuang-tw, @jctian98, @ljn7, @pyf98.

Happy coding! 🚀

Source: README.md, updated 2025-09-13

ESPnet Files

End-to-end speech processing toolkit

