| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| ESPnet version 202511 source code.tar.gz | 2025-11-17 | 22.2 MB | |
| ESPnet version 202511 source code.zip | 2025-11-17 | 27.6 MB | |
| README.md | 2025-11-17 | 10.2 kB | |
| Totals: 3 Items | | 49.8 MB | 3 |
Summary
Release date: 2025‑11‑17
Milestone: 202511 – A version that brings a large number of new features, stability improvements, and a refreshed CI & Docker workflow.
ESPnet 202511 introduces robust parallel processing primitives, a fully‑refactored inference & evaluation pipeline, extensive SpeechLM support, and a modernized Docker/CI stack. The release also resolves lingering bugs in codec EMA logic, MPS device handling, and category‑balanced batching while tightening dependency management and documentation quality.
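To make the category‑balanced batching mentioned above more concrete, here is a minimal sketch in plain Python. It is not ESPnet's sampler: the function name `category_balanced_batches`, the `utt2category` mapping, and the round‑robin strategy are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List


def category_balanced_batches(
    utt2category: Dict[str, Hashable],
    batch_size: int,
    seed: int = 0,
) -> List[List[str]]:
    """Group utterance IDs into batches that draw from categories round-robin.

    Hypothetical helper for illustration only; not part of ESPnet.
    """
    rng = random.Random(seed)

    # Bucket utterances by category and shuffle inside each bucket.
    buckets: Dict[Hashable, List[str]] = defaultdict(list)
    for utt_id, category in sorted(utt2category.items()):
        buckets[category].append(utt_id)
    for utts in buckets.values():
        rng.shuffle(utts)

    # Interleave buckets so consecutive items come from different categories.
    categories = sorted(buckets, key=str)
    interleaved: List[str] = []
    while any(buckets[c] for c in categories):
        for c in categories:
            if buckets[c]:
                interleaved.append(buckets[c].pop())

    # Cut the interleaved stream into fixed-size batches (the last may be short).
    return [
        interleaved[i : i + batch_size]
        for i in range(0, len(interleaved), batch_size)
    ]
```

With equally sized categories and a batch size equal to the number of categories, every batch contains exactly one utterance per category. Unequal categories simply run out earlier, so later batches lean toward the larger ones.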
Highlighted Pull Requests
| # | Title | Category | Key Impact |
|---|---|---|---|
| 6300 | Bump js‑yaml from 4.1.0 to 4.1.1 in /doc/vuepress | Dep‑Update | Secures the documentation build against a prototype‑pollution CVE in yaml merge |
| 6284 | codec fix: DDP logic and dead code revival logic | Bugfix | Restores EMA state for dead‑code recovery and synchronizes codec updates across all DDP workers |
| 6286 | [SpeechLM] Deepspeed trainer | New Feature | Adds full DeepSpeed support (train.py + deepspeed_trainer.py) for large‑scale SpeechLM training |
| 6279 | [SpeechLM] model, preprocessor and collect_stats | New Feature | Core SpeechLM components – job templates, preprocessing, multimodal IO, and stats collection |
| 6278 | [SpeechLM] Deepspeed trainer | New Feature | See above – DeepSpeed integration for SpeechLM workflows |
| 6276 | Docker Updates | Refactor | Upgrades to Ubuntu 24.04, CUDA 12.6, and PyTorch 2.8.0, transitions to Miniforge, and modernizes Dockerfile syntax |
| 6275 | CI Installation fix | Bugfix | Adds --no-build-isolation for editable installs, improving reproducibility across CI environments |
| 6273 | [ESPnet‑Codec] Bug fix on codec activation function | Bugfix | Enables BF16 inference by registering torch.ones for auto‑cast |
| 6272 | Add Pytorch version 2.9 | Dep‑Update | Extends supported PyTorch releases (2.5.1, 2.7.1, 2.8.0, 2.9.0) in CI and docs |
| 6263 | [ESPnet‑3] Merge master into espnet3 branch | Merge | Syncs espnet3 with master, fixing CI and dependency mismatches |
| 6260 | SpeechLM Data Infra: dataset management | New Feature | Implements data registry, dataset loaders, and configuration templates for SpeechLM |
| 6259 | pre‑commit.ci autoupdate | Tooling | Updates black and isort to latest stable versions |
| 6255 | Fix default batch sampler fallback for category iterator | Bugfix | Restores legacy folded → catbel mapping, improving backward compatibility |
| 6253 | Restrict Docker Github Actions to Original Repo | Security | Prevents accidental image publishing from forks or non‑master branches |
| 6249 | [espnet3‑7] Add Callbacks | New Feature | Adds AverageCheckpointsCallback and standard callback factory for Lightning trainers |
| 6248 | Get forced alignments from CTC model | Feature | Enables forced alignment extraction for any CTC‑based S2T model |
| 6246 | MPS Support for loading float64 models | Bugfix | Converts float64 weights to float32 when loading on the MPS device, avoiding dtype errors |
| 6244 | LID‑7: VoxLingua107 recipe | Recipe | Adds a new spoken‑language‑identification recipe for VoxLingua107 |
| 6243 | [espnet‑3] Merge master into espnet3 and fixed CI | Merge | Syncs espnet3 with master, removing underthesea dependency |
| 6239 | Upgrade pyopenjtalk to 0.4.1 | Dep‑Update | Updates pyopenjtalk installer to the latest version |
| 6238 | Add Pytorch version 2.9 | Dep‑Update | See 6272 |
| 6238 | Package Build Patch | Build | Moves g2p_en & ctc‑segmentation installation to Makefile, fixing pip package build |
| 6238 | Docker Updates | Refactor | See 6276 |
| 6238 | CI Installation fix | Bugfix | See 6275 |
| 6238 | [ESPnet‑Codec] Bug fix on codec activation function | Bugfix | See 6273 |
| 6227 | Terry/parallelize spk emb extraction | Feature | Parallel speaker‑embedding extraction for TTS recipes |
| 6210 | LID‑8: CI and unit tests | Test | Adds comprehensive unit tests for LID functionality |
| 6178 | [espnet3‑6] Add evaluation scripts | Feature | Modularizes inference & evaluation pipelines in espnet3 |
| 6179 | [espnet3] ESPnet1 Support Sunset | Refactor | Removes legacy ESPnet1 support, consolidates to espnet2.legacy |
| 6177 | Merge master into espnet3 | Merge | Syncs espnet3 with master, fixing CI issues |
| 6175 | [espnet3‑5] Add parallel module and collect_stats | Feature | Adds Dask‑based parallel processing and collect_stats for data stats collection |
| 6174 | LID‑7: VoxLingua107 recipe | Recipe | See 6244 |
| 6173 | LID‑8: CI and unit tests | Test | See 6210 |
| 6172 | [espnet3‑5] Add parallel module and collect_stats | Feature | See 6175 |
| 6171 | [espnet3‑5] Add parallel module and collect_stats | Feature | See 6175 |
| 6170 | LID‑8: CI and unit tests | Test | See 6210 |
| 6168 | [espnet3‑5] Add parallel module and collect_stats | Feature | See 6175 |
| 6165 | LID‑8: CI and unit tests | Test | See 6210 |
| 6164 | LID‑8: CI and unit tests | Test | See 6210 |
| 6163 | LID‑8: CI and unit tests | Test | See 6210 |
| 6162 | LID‑8: CI and unit tests | Test | See 6210 |
| 6161 | LID‑8: CI and unit tests | Test | See 6210 |
| 6160 | LID‑8: CI and unit tests | Test | See 6210 |
| 6159 | LID‑7: VoxLingua107 recipe | Recipe | See 6244 |
| 6158 | LID‑7: VoxLingua107 recipe | Recipe | See 6244 |
| 6157 | LID‑7: VoxLingua107 recipe | Recipe | See 6244 |
| 6156 | LID‑7: VoxLingua107 recipe | Recipe | See 6244 |
| 6155 | LID‑7: VoxLingua107 recipe | Recipe | See 6244 |
| 6154 | LID‑7: VoxLingua107 recipe | Recipe | See 6244 |
Note: The table above summarizes the most impactful PRs for this release. Several PRs are grouped by shared functionality (e.g., SpeechLM, Docker, and LID). Contributors for these changes include dependabot[bot], whr-a, chinjouli, jctian98, Fhrozen, Masao-Someki, KanTakahiro, akreal, pre-commit-ci[bot], Qingzheng-Wang, Shikhar-S, SanderGi, sw005320, and ZhuoyanTao.
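PR 6248 in the table concerns forced alignment from CTC models. As a rough illustration of the general idea (not the code from that PR), the sketch below runs Viterbi decoding over the blank‑extended token sequence and returns one label per frame. The function name `ctc_forced_align`, the list‑of‑lists input format, and the blank index 0 are assumptions.

```python
import math
from typing import List, Sequence


def ctc_forced_align(
    log_probs: Sequence[Sequence[float]],
    tokens: Sequence[int],
    blank: int = 0,
) -> List[int]:
    """Return the most likely frame-level label path that collapses to `tokens`.

    `log_probs[t][v]` is the log-probability of label `v` at frame `t`.
    Hypothetical helper for illustration only; not part of ESPnet.
    """
    n_frames = len(log_probs)
    if not tokens:
        return [blank] * n_frames
    if n_frames == 0:
        raise ValueError("no frames to align")

    # Extended sequence: blank, t1, blank, t2, ..., blank
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    n_states = len(ext)
    neg_inf = -math.inf

    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = log_probs[0][ext[0]]
    score[0][1] = log_probs[0][ext[1]]

    for t in range(1, n_frames):
        for s in range(n_states):
            # Allowed predecessors: stay, advance by one state, or skip the
            # blank between two different non-blank tokens.
            candidates = [s]
            if s >= 1:
                candidates.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(s - 2)
            best = max(candidates, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == neg_inf:
                continue  # state s is unreachable at frame t
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A valid path ends in the last token or in the trailing blank.
    s = max((n_states - 1, n_states - 2), key=lambda p: score[-1][p])
    if score[-1][s] == neg_inf:
        raise ValueError("too few frames to align the token sequence")

    # Trace the best path back to the first frame.
    path = [blank] * n_frames
    for t in range(n_frames - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path
```

Token boundaries can then be read off as the frames where the returned label changes. A production implementation would typically operate on tensors and batch inputs rather than Python lists.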
Key Takeaways
- Parallelism & Scalability – Dask‑based espnet3.parallel, collect_stats, and new callbacks enable efficient distributed training, inference, and checkpoint averaging.
- SpeechLM Maturity – Core modules, DeepSpeed integration, multimodal IO, and data infrastructure create a solid foundation for large‑scale speech‑language models.
- Stability & Security – Updated dependencies (js‑yaml, PyTorch), Docker images on CUDA 12.6 with Miniforge, and bug fixes for codec EMA, MPS device handling, and category sampling.
- CI & Packaging – Modernized GitHub Actions, improved pip install flags, and new Docker images based on Ubuntu 24.04.
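The parallel statistics collection named in the first takeaway follows a map‑and‑merge pattern: each worker summarizes one shard, and the partial summaries are combined afterwards. The sketch below illustrates that pattern with the standard library's `ThreadPoolExecutor` standing in for Dask. The names `Stats`, `shard_stats`, and `collect_stats_parallel` are hypothetical and do not reflect the espnet3 API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Stats:
    """Mergeable summary: count, sum, and sum of squares."""

    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def merge(self, other: "Stats") -> "Stats":
        # Partial sums add up, so shards can be combined in any order.
        return Stats(
            self.count + other.count,
            self.total + other.total,
            self.total_sq + other.total_sq,
        )

    @property
    def mean(self) -> float:
        return self.total / self.count

    @property
    def variance(self) -> float:
        # Population variance: E[x^2] - (E[x])^2
        return self.total_sq / self.count - self.mean**2


def shard_stats(values: Sequence[float]) -> Stats:
    """Summarize a single shard (the 'map' step)."""
    return Stats(len(values), sum(values), sum(v * v for v in values))


def collect_stats_parallel(
    shards: Sequence[Sequence[float]], num_workers: int = 4
) -> Stats:
    """Summarize shards in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(shard_stats, shards))
    result = Stats()
    for partial in partials:
        result = result.merge(partial)
    return result
```

Because only the three running sums cross worker boundaries, the same merge step would work unchanged if the map step ran on a Dask cluster instead of local threads.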
What's Changed (Full changelog)
Acknowledgements
@Fhrozen, @KanTakahiro, @Masao-Someki, @Qingzheng-Wang, @SanderGi, @Shikhar-S, @ZhuoyanTao, @akreal, @chinjouli, @dependabot[bot], @jctian98, @sw005320, @whr-a.