Summary

The 202509 release moves ESPnet onto modern OS and Python environments, makes training configuration more flexible, and completes the language identification (LID) subsystem. It also adds a broad set of new recipes for emerging speech benchmarks and models, along with CI and developer-workflow improvements.

Overview

The 202509 release brings a major shift in infrastructure and tooling. Highlights by area:

| Area | Change |
| --- | --- |
| Python / Dependencies | Dropped Python 3.7/3.8; now supports Python 3.9–3.13; numpy raised to ≥ 2.2.0; removed Chainer-related build steps. |
| OS Support | Ended Debian 11 support; switched CI to Debian 12 containers (ensuring GCC 11+ compatibility). |
| Warp-Transducer | Adopted ljn7/warp-transducer (FastEmit, modern CUDA/CMake). |
| LID | Completed the LID subsystem (model, loss, pooling, balanced sampler, tri-stage scheduler, inference tools). |
| Training | Added HybridOptim/HybridLRS for multi-optimizer and multi-scheduler configurations. |
| Recipes | New recipes for LongLibriHeavy, Qwen2-Audio-7B-Chat, OWSM v4, Galaxy AVSR, and a LID template. |
| CI / DevOps | Updated Docker publishing, added an automerge workflow, fixed GPU flag handling, and added a guard against conflicting speaker options. |

The release is backed by 10 core contributors, each driving critical modules or infrastructure changes.
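
For readers unfamiliar with the tri-stage scheduler named in the LID row, a warm-up / hold / decay learning-rate schedule can be sketched in a few lines of Python. This is a minimal illustration, not ESPnet's implementation: the function name `tri_stage_lr`, every parameter name and default value, the linear warm-up, and the exponential decay shape are all assumptions made for the example.

```python
import math


def tri_stage_lr(step, peak_lr=1e-3, init_scale=0.01, final_scale=0.05,
                 warmup_steps=1000, hold_steps=4000, decay_steps=5000):
    """Learning rate at `step` for a warm-up / hold / decay schedule.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    init_lr = peak_lr * init_scale
    final_lr = peak_lr * final_scale

    if step < warmup_steps:
        # Stage 1 (warm-up): linear ramp from init_lr up to peak_lr.
        return init_lr + (peak_lr - init_lr) * step / warmup_steps

    if step < warmup_steps + hold_steps:
        # Stage 2 (hold): stay at the peak learning rate.
        return peak_lr

    decay_step = step - warmup_steps - hold_steps
    if decay_step < decay_steps:
        # Stage 3 (decay): exponential decay from peak_lr towards final_lr.
        return peak_lr * math.exp(math.log(final_scale) * decay_step / decay_steps)

    # After the decay stage the rate stays at final_lr.
    return final_lr
```

The warm-up stage avoids large updates while the model is still poorly initialized, the hold stage trains at full rate, and the decay stage lets the parameters settle, which is the stabilizing effect a schedule of this shape is meant to provide.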


Important Pull Requests

| # | Category | Title | Key Impact |
| --- | --- | --- | --- |
| 6228 | Deprecation | EOL of Debian 11 support in favor of Debian 12 | CI now runs on Debian 12; badges and docs updated; eliminates the GCC 10 GLIBCXX limitation. |
| 6226 | Bug-Fix | Fix GPU flag handling in tts.sh script | Prevents Versa from erroneously enabling the GPU when gpu_inference is False. |
| 6221 | Dependency | Update numpy version | Upgrades numpy to ≥ 2.2.0, raises the minimum Python version to 3.9, removes Chainer artifacts. |
| 6220 | Refactor | Remove old speechlm module | Removes obsolete SpeechLM code. |
| 6187 | Core | Switch warp-transducer to ljn7 fork | Adds FastEmit support, broader CUDA/CMake compatibility, improved build scripts. |
| 6159 | Core | LID-5: Tri-stage learning rate scheduler | Stabilizes training with warm-up / hold / decay phases. |
| 6158 | Core | LID-4: Category- and dataset-aware balanced sampler | Addresses language and dataset imbalance via power-law sampling. |
| 6156 | Feature | LID-2: Model, loss and pooling modules | Introduces language-identification models, custom losses, and pooling strategies. |
| 6208 | Bug-Fix | Guard against having both use_sid and use_spk_embed set to true | Prevents conflicting speaker-ID/embedding settings. |
| 6206 | DevOps | Add environment tag to publish docker image | Clarifies the Docker publishing workflow. |
| 6202 | DevOps | Add Automerge Action | Enables automated PR merging with label and review checks. |
| 6205 | Recipe | LongLibriHeavy benchmark | Provides a long-form speech evaluation baseline. |
| 6194 | Recipe | Add recipe for Qwen2-Audio-7B-Chat | Baseline for the Dynamic-SUPERB ASR task. |
| 6176 | Recipe | OWSM v4 Recipe | Adds OWSM v4 training/configuration. |
| 6160 | Recipe | LID-6: LID recipe template | Offers a ready-to-use LID experiment scaffold. |
| 6173 | Feature | [espnet3-4] Add support for multiple optimizers and schedulers | Unified handling of multiple optimizers/schedulers (HybridOptim/HybridLRS). |
| 6172 | Feature | Additional integration for multi-optimizer support (see linked PRs) | Supports advanced training strategies. |
| 6132 | Recipe | AVSR recipe for Galaxy Dataset | Adds AVSR training capability for Galaxy. |
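
To make the power-law sampling mentioned for #6158 concrete, here is a minimal sketch. It assumes each category (for example, a language) is drawn with probability proportional to its utterance count raised to an exponent `alpha`, so that `alpha=1` reproduces the natural distribution and `alpha=0` gives uniform sampling over categories. The function names, the `alpha` parameter, and the default of 0.5 are hypothetical and are not taken from the ESPnet code. The sketch covers a single category axis only, not the dataset-aware part of the sampler.

```python
import random
from collections import Counter


def power_law_probs(counts, alpha=0.5):
    """Map per-category counts to sampling probabilities p_c ∝ n_c ** alpha.

    Hypothetical helper for illustration; `alpha` is an assumed parameter.
    """
    weights = {cat: n ** alpha for cat, n in counts.items()}
    total = sum(weights.values())
    return {cat: w / total for cat, w in weights.items()}


def sample_balanced(labels, num_samples, alpha=0.5, seed=0):
    """Draw utterance indices so that categories follow the power-law probabilities."""
    counts = Counter(labels)
    probs = power_law_probs(counts, alpha)
    # Spread each category's probability evenly over its utterances.
    weights = [probs[label] / counts[label] for label in labels]
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)


# Example: a 9:1 imbalance between two languages.
counts = {"en": 900, "sw": 100}
natural = power_law_probs(counts, alpha=1.0)   # en 0.9, sw 0.1
smoothed = power_law_probs(counts, alpha=0.5)  # en about 0.75, sw about 0.25
uniform = power_law_probs(counts, alpha=0.0)   # en 0.5, sw 0.5
```

Values of `alpha` between 0 and 1 trade off between seeing low-resource categories more often and preserving the natural data distribution.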

Full changelog

What's Changed

### New Features

- LID-2: Model, loss and pooling modules (See [#6156], by @Qingzheng-Wang)

### Enhancement

- [espnet3-4] Add support for multiple optimizers and schedulers (See [#6173], by @Masao-Someki)
- LID-4: Category- and dataset-aware balanced sampler (See [#6158], by @Qingzheng-Wang)

### Recipe

- LongLibriHeavy benchmark (Basic Recipe without training for now) (See [#6205], by @Miamoto)
- Add recipe for Qwen2-Audio-7B-Chat on Dynamic-SUPERB ASR task (See [#6194], by @cyhuang-tw)
- OWSM v4 Recipe (See [#6176], by @pyf98)
- LID-6: LID recipe template (See [#6160], by @Qingzheng-Wang)
- AVSR recipe for Galaxy Dataset (See [#6132], by @YJCX330)

### Documentation

- EOL of debian 11 support in favor of debian 12 (See [#6228], by @Fhrozen)

### Refactoring

- Remove old speechlm module (See [#6220], by @jctian98)

### Others

- Fix GPU flag handling in tts.sh script (See [#6226], by @ZhuoyanTao)
- Update numpy version (See [#6221], by @Fhrozen)
- Add guard against having both use_sid and use_spk_embed set to true (See [#6208], by @ZhuoyanTao)
- Add environment tag to publish docker image (See [#6206], by @Fhrozen)
- Add Automerge Action (See [#6202], by @Fhrozen)
- Switch warp-transducer to ljn7 fork with FastEmit and modern CUDA/CMa… (See [#6187], by @ljn7)
- LID-5: Tri-stage learning rate scheduler (See [#6159], by @Qingzheng-Wang)
- LID-3: Inference, embedding extraction and t-SNE visualization (See [#6157], by @Qingzheng-Wang)

Acknowledgements

@Fhrozen, @Masao-Someki, @Miamoto, @Qingzheng-Wang, @YJCX330, @ZhuoyanTao, @cyhuang-tw, @jctian98, @ljn7, @pyf98.

Happy coding! 🚀

Source: README.md, updated 2025-09-13