| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| ESPnet version 202604 source code.tar.gz | 2026-04-07 | 20.3 MB | |
| ESPnet version 202604 source code.zip | 2026-04-07 | 24.7 MB | |
| README.md | 2026-04-07 | 8.0 kB | |
| Totals: 3 Items | | 45.0 MB | 3 |
Summary
Overview
This release focuses on significant improvements to the Continuous Integration (CI) infrastructure, performance optimizations for TTS models, and the introduction of new recipe support for various languages and tasks. Key highlights include a major overhaul of the CI pipeline using Docker containers, a 90% performance boost in FastSpeech2 inference, and the addition of new ASR and TTS recipes for Kinyarwanda, Emilia, and other datasets.
Important PRs
Major CI Infrastructure Overhaul
- PR [#6379] & [#6372]: Significant refactoring of the CI pipeline.
  - Introduced a new Docker-based build and test workflow to improve consistency, speed, and reproducibility.
  - Shifted from environment setup in individual jobs to using pre-built Docker images.
  - Modularized CI workflows for Ubuntu and macOS, splitting large jobs for better caching and parallelism.
  - Added a reusable composite GitHub Action for environment setup.
  - Improved Docker image publishing with a matrix strategy for CPU and GPU variants.
- PR [#6394] & [#6371]: Enhanced robustness of installer scripts.
  - Added backup download URLs for FFmpeg installation and `mwerSegmenter` to handle primary source failures.
  - Updated macOS CI workflow to include FFmpeg installation via Homebrew and verification steps.
- PR [#6321]: Updated PyTorch support to version 2.9.1, including improved installation script logic for CUDA compatibility.
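
The backup-URL behavior described for the installer scripts can be pictured as a simple try-in-order loop. The sketch below is a minimal Python illustration of that pattern, not the installers' actual code. The function names `fetch_url` and `download_with_fallback` are hypothetical, and any URLs passed in are placeholders.

```python
import urllib.request
from typing import Callable, Iterable, Tuple


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return its body; failures surface as OSError."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_with_fallback(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_url,
) -> Tuple[str, bytes]:
    """Try each source in order and return (url, data) from the first success.

    Hypothetical helper: it only illustrates the primary-then-backup idea.
    """
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:  # urllib's URLError and timeouts are OSError subclasses
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all download sources failed:\n" + "\n".join(errors))
```

Passing the primary URL first and the backup second gives the fallback order. Making `fetch` injectable lets the retry logic be exercised without network access.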
Performance Boost in FastSpeech2
- PR [#6376]: Implemented shape-bucketing for XPU inference and `torch.compile` support.
  - Added `ESPNET_BUCKET_INFER` environment flag to round tensor sizes to fixed bucket boundaries, enabling efficient use of oneDNN/GEMM primitives.
  - Deferred encoder output trimming to just before the length regulator.
  - Fixed batch inference `olens` computation and added compiler directives to prevent recompilation.
  - Result: TTS inference time reduced from 166 ms to 89 ms for Batch=8 (about 46% lower latency, or roughly 1.9x throughput; reported as a 90% performance improvement).
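
Shape-bucketing rounds each input length up to a fixed boundary, so a compiled model sees only a few distinct shapes. The sketch below is a minimal, framework-free illustration of that rounding, not the code from PR #6376. The helper names and the bucket size of 64 are assumptions, and the parsing of `ESPNET_BUCKET_INFER` shown here is a guess.

```python
import os
from typing import List


def bucket_length(length: int, bucket_size: int = 64) -> int:
    """Round a length up to the next multiple of bucket_size (hypothetical helper)."""
    if length <= 0:
        return bucket_size
    return ((length + bucket_size - 1) // bucket_size) * bucket_size


def pad_to_bucket(tokens: List[int], bucket_size: int = 64, pad_id: int = 0) -> List[int]:
    """Right-pad a token sequence so its length lands on a bucket boundary."""
    target = bucket_length(len(tokens), bucket_size)
    return tokens + [pad_id] * (target - len(tokens))


def bucketing_enabled() -> bool:
    # Assumption: the flag counts as enabled when set to "1";
    # the actual parsing in ESPnet may differ.
    return os.environ.get("ESPNET_BUCKET_INFER", "0") == "1"
```

With a bucket size of 64, inputs of length 37, 50, and 64 all pad to 64, while 65 pads to 128. Four distinct input lengths collapse to two shapes, which is the property that limits recompilation.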
New Recipe Additions
- PR [#6337]: Added a new TTS recipe for Kinyarwanda using Tacotron 2 with character-based tokenization.
- PR [#6325]: Added an ASR recipe for the Tal-zh-adult-teach dataset (Mandarin Chinese educational data).
- PR [#6295]: Added an ASR recipe for the kosp2e dataset (Korean Speech to English Translation Corpus).
- PR [#6291]: Added a TTS recipe for the Emilia dataset using the VITS model.
- PR [#6366]: Added a recipe for MS-SNSD (Speech Enhancement) as part of the ESPnet bootcamp.
Bug Fixes and Stability
- PR [#6391]: Fixed inference artifact output support and added named multi-optimizer training support in espnet3.
- PR [#6356]: Fixed Whisper tokenizer compatibility with Transformers v5 by switching to `extra_special_tokens`.
- PR [#6309]: Added epsilon to standard deviation in normalization to prevent division by zero.
- PR [#6302]: Fixed `CategoryChunkIterFactory` to use actual sample lengths instead of padded lengths, reducing silent chunks.
- PR [#6306]: Fixed visinger2 inference to support phoneme duration inference.
- PR [#6293]: Fixed data processing steps in the POWSM recipe.
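
The epsilon fix listed above guards against a constant input, whose standard deviation is exactly zero. The sketch below is a minimal, standard-library illustration of the idea, not the code from PR #6309. The function name `normalize` and the default `eps` value are assumptions.

```python
import math
from typing import List, Sequence


def normalize(values: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Zero-mean, unit-variance normalization with eps added to the std.

    Hypothetical helper: eps keeps the denominator non-zero for constant input.
    """
    if not values:
        raise ValueError("values must be non-empty")
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    return [(v - mean) / (std + eps) for v in values]
```

Without `eps`, a constant input such as `[3.0, 3.0, 3.0]` would divide zero by zero. With it, the result is a list of zeros, and non-constant inputs change only by a negligible amount.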
Documentation and Refactoring
- PR [#6335] & [#6386]: Updated documentation to reflect ESPnet1 EOL and standardized argument group creation across task modules.
- PR [#6327]: Reorganized the `espnet3` directory structure into `components/`, `systems/`, `parallel/`, and `utils/`.
- PR [#6328] & [#6329]: Added ASR system and inference packages, along with logging utilities for espnet3.
- PR [#6354], [#6353], [#6352]: Updated SpeechLM module with improvements to trainer, processor, model, data loading, and binary files.
Contributors
A total of 20 contributors participated in this release, including:
- Fhrozen, Masao-Someki, jthakurH, whr-a, jctian98, chinjouli, Dahee96, osinkolu, HsunGong, South-Twilight, zheedong, NewGamezzz, LiChenda, HANJionghao, elnaske, sw005320, thecaptain789, popcornell, dependabot[bot], and pre-commit-ci[bot].
Full changelog
What's Changed
Acknowledgements
@Dahee96, @Fhrozen, @HANJionghao, @HsunGong, @LiChenda, @Masao-Someki, @NewGamezzz, @South-Twilight, @chinjouli, @dependabot[bot], @elnaske, @jctian98, @jthakurH, @osinkolu, @popcornell, @pre-commit-ci[bot], @sw005320, @thecaptain789, @whr-a, @zheedong.