| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| ESPnet version 202509 source code.tar.gz | 2025-09-13 | 22.1 MB | |
| ESPnet version 202509 source code.zip | 2025-09-13 | 27.5 MB | |
| README.md | 2025-09-13 | 5.7 kB | |
| Totals: 3 Items | | 49.6 MB | 0 |
Summary
The 202509 release strengthens ESPnet’s foundation on modern OS and Python environments, enhances training flexibility, and completes the language identification (LID) ecosystem. With a broad suite of new recipes, we continue to support emerging speech‑related benchmarks and models while ensuring CI stability and developer productivity.
Overview
The 202509 release brings a major shift in our infrastructure and tooling. Key highlights include:
| Area | Change |
|---|---|
| Python / Dependencies | Dropped Python 3.7/3.8; the supported range is now Python 3.9–3.13; numpy bumped to ≥ 2.2.0; removed Chainer‑related build steps. |
| OS Support | Ended Debian 11 support, switched CI to Debian 12 containers (ensuring GCC 11+ compatibility). |
| Warp‑Transducer | Adopted ljn7/warp-transducer (FastEmit, modern CUDA/CMake). |
| LID | Completed the LID subsystem (model, loss, pooling, balanced sampler, tri‑stage scheduler, inference tools). |
| Training | Added HybridOptim/HybridLRS for multi‑optimizer and scheduler configurations. |
| Recipes | New recipes for LongLibriHeavy, Qwen2‑Audio‑7B‑Chat, OWSM v4, Galaxy AVSR, and a LID template. |
| CI / DevOps | Updated Docker publishing, added an automerge workflow, fixed GPU flag handling, and added a guard against conflicting speaker options. |
The release is backed by 10 core contributors, each driving critical modules or infrastructure changes.
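A tri‑stage learning rate schedule, as listed in the LID row above, is usually described as three phases: warm‑up to a peak rate, a hold at that peak, and a decay. The following is a minimal sketch of that idea, assuming a linear warm‑up and an exponential decay. The function name, parameters, and default scales are hypothetical and are not taken from ESPnet's implementation.

```python
import math


def tri_stage_lr(step, peak_lr, warmup_steps, hold_steps, decay_steps,
                 init_scale=0.01, final_scale=0.01):
    """Learning rate at `step` for a warm-up / hold / decay schedule.

    Hypothetical helper for illustration only; not ESPnet's API.
    `final_scale` must be > 0 because the decay uses its logarithm.
    """
    init_lr = peak_lr * init_scale
    final_lr = peak_lr * final_scale

    # Stage 1: linear warm-up from init_lr to peak_lr.
    if step < warmup_steps:
        return init_lr + (peak_lr - init_lr) * step / warmup_steps
    step -= warmup_steps

    # Stage 2: hold at the peak learning rate.
    if step < hold_steps:
        return peak_lr
    step -= hold_steps

    # Stage 3: exponential decay from peak_lr towards final_lr.
    if step < decay_steps:
        return peak_lr * math.exp(math.log(final_scale) * step / decay_steps)

    # After the decay stage, stay at the final learning rate.
    return final_lr


if __name__ == "__main__":
    for s in (0, 5, 10, 14, 20, 25):
        print(s, tri_stage_lr(s, peak_lr=1e-3, warmup_steps=10,
                              hold_steps=5, decay_steps=10))
```

The hold phase gives the model time at the peak rate before the decay begins. In a real recipe the three stage lengths would more likely be set in the training configuration than hard‑coded.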
Important Pull Requests
| # | Category | Title | Key Impact |
|---|---|---|---|
| 6228 | Deprecation | EOL of Debian 11 support in favor of Debian 12 | CI now runs on Debian 12; badges and docs updated; eliminates GCC‑10 GLIBCXX limitation. |
| 6226 | Bug‑Fix | Fix GPU flag handling in tts.sh script | Prevents Versa from erroneously enabling GPU when gpu_inference is False. |
| 6221 | Dependency | Update numpy version | Upgrades numpy to ≥ 2.2.0, raises the minimum Python version to 3.9, removes Chainer artifacts. |
| 6220 | Refactor | Remove old speechlm module | Cleaned out obsolete SpeechLM code. |
| 6187 | Core | Switch warp‑transducer to ljn7 fork | Adds FastEmit support, broader CUDA/CMake compatibility, improved build scripts. |
| 6159 | Core | LID‑5: Tri‑stage learning rate scheduler | Stabilizes training with warm‑up / hold / decay phases. |
| 6158 | Core | LID‑4: Category‑ and dataset‑aware balanced sampler | Addresses language and dataset imbalance via power‑law sampling. |
| 6156 | Feature | LID‑2: Model, loss and pooling modules | Introduces language‑identification models, custom losses, and pooling strategies. |
| 6208 | Bug‑Fix | Guard against having both use_sid and use_spk_embed set to true | Prevents conflicting speaker‑ID/embedding settings. |
| 6206 | DevOps | Add environment tag to publish docker image | Clarifies Docker publishing workflow. |
| 6202 | DevOps | Add Automerge Action | Enables automated PR merging with label and review checks. |
| 6205 | Recipe | LongLibriHeavy benchmark | Provides long‑form speech evaluation baseline. |
| 6194 | Recipe | Add recipe for Qwen2‑Audio‑7B‑Chat | Baseline for Dynamic‑SUPERB ASR task. |
| 6176 | Recipe | OWSM v4 Recipe | Adds OWSM v4 training/configuration. |
| 6160 | Recipe | LID‑6: LID recipe template | Offers ready‑to‑use LID experiment scaffold. |
| 6173 | Feature | [espnet3‑4] Add support for multiple optimizers and schedulers | Unified handling of multiple optimizers/schedulers (HybridOptim/HybridLRS). |
| 6172 | Feature | Additional integration for multi‑optimizer support (see linked PRs) | Supports advanced training strategies. |
| 6132 | Recipe | AVSR recipe for Galaxy Dataset | Adds AVSR training capability for Galaxy. |
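The power‑law sampling mentioned for PR 6158 can be illustrated with a small sketch. It assumes each category is drawn with probability proportional to `count ** alpha`, so that `alpha = 1` reproduces the natural distribution and `alpha = 0` makes all categories equally likely. The function names and the `alpha` parameter are hypothetical. Only a single balancing level (language) is shown, and the dataset‑aware part of the sampler is not modelled.

```python
import random
from collections import Counter


def power_law_weights(labels, alpha=0.5):
    """Per-sample weights so that category c is drawn with p(c) ∝ n_c ** alpha.

    Hypothetical helper for illustration only; not ESPnet's API.
    """
    counts = Counter(labels)
    # Rescale category counts with the power-law exponent.
    scaled = {cat: n ** alpha for cat, n in counts.items()}
    total = sum(scaled.values())
    cat_prob = {cat: s / total for cat, s in scaled.items()}
    # Spread each category's probability evenly over its samples.
    return [cat_prob[label] / counts[label] for label in labels]


def sample_indices(labels, num_samples, alpha=0.5, seed=0):
    """Draw sample indices with replacement using the power-law weights."""
    rng = random.Random(seed)
    weights = power_law_weights(labels, alpha)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)


if __name__ == "__main__":
    # Toy data: nine English utterances and one French utterance.
    labels = ["en"] * 9 + ["fr"]
    picked = sample_indices(labels, num_samples=1000, alpha=0.5)
    print(Counter(labels[i] for i in picked))
```

With `alpha = 0.5` on this toy data, the French utterance is drawn with probability 0.25 rather than its natural 0.1, which is the kind of rebalancing towards low‑resource languages that the PR summary describes.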
Full changelog
What's Changed
Acknowledgements
@Fhrozen, @Masao-Someki, @Miamoto, @Qingzheng-Wang, @YJCX330, @ZhuoyanTao, @cyhuang-tw, @jctian98, @ljn7, @pyf98.
Happy coding! 🚀