| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| README.md | 2026-03-07 | 6.7 kB | |
| v0.4.0 source code.tar.gz | 2026-03-07 | 2.8 MB | |
| v0.4.0 source code.zip | 2026-03-07 | 3.0 MB | |
| Totals: 3 Items | | 5.8 MB | 0 |
What's Changed
- add example of qwen3-asr with forced alignment by @eschmidbauer in https://github.com/Blaizzy/mlx-audio/pull/463
- Restore Qwen3-TTS encoder_config to preserve accents in voice clones. by @orbitalquark in https://github.com/Blaizzy/mlx-audio/pull/461
- Ensure TTS audio player plays a trailing, partially-filled audio frame. by @orbitalquark in https://github.com/Blaizzy/mlx-audio/pull/465
- Fix source separation issue with shape mismatch: noise shape for separate_long by @mnoukhov in https://github.com/Blaizzy/mlx-audio/pull/467
- Enable streaming for Qwen3-TTS when ICL mode is enabled. by @orbitalquark in https://github.com/Blaizzy/mlx-audio/pull/466
- feat(stt): add support for MedASR (Lasr architecture) by @sigjhl in https://github.com/Blaizzy/mlx-audio/pull/376
- Formatting fix and add to pyproject.toml by @mnoukhov in https://github.com/Blaizzy/mlx-audio/pull/475
- Use shared model cache resolution for SAM-Audio by @mnoukhov in https://github.com/Blaizzy/mlx-audio/pull/468
- Fix voice matching for Pocket TTS by @lucasnewman in https://github.com/Blaizzy/mlx-audio/pull/477
- Fix ALMs max tokens and chunking by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/474
- [Soprano] Fix decoder and config loading (v1 and v1.1) by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/480
- Add audio separation UI & Server by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/347
- Add Parakeet v3 multilingual support with language detection by @andimarafioti in https://github.com/Blaizzy/mlx-audio/pull/481
- Revert "Do not discard the last unfilled audio frame. (#465)" by @orbitalquark in https://github.com/Blaizzy/mlx-audio/pull/473
- Fix longform generation for Pocket TTS by @lucasnewman in https://github.com/Blaizzy/mlx-audio/pull/486
- Enable streaming /v1/audio/speech server endpoint with raw/pcm data. by @orbitalquark in https://github.com/Blaizzy/mlx-audio/pull/484
- feat: Add support for Voxtral Mini 4B Realtime by @shreyaskarnik in https://github.com/Blaizzy/mlx-audio/pull/487
- fix(kokoro): Chinese TTS crashes with ValueError in g2p pipeline by @smartchainark in https://github.com/Blaizzy/mlx-audio/pull/489
- fix: update STT transcription parameters and preserve original audio format by @shreyaskarnik in https://github.com/Blaizzy/mlx-audio/pull/488
- fix(docs): update outdated model links by @joaopalmeiro in https://github.com/Blaizzy/mlx-audio/pull/495
- [VAD / Diarization] Add sortformer by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/493
- feat: add streaming support and toggle for realtime STT by @Cold-A-Muse in https://github.com/Blaizzy/mlx-audio/pull/494
- chore: update protobuf dependency to version 6.33.5 by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/497
- fix(ui): update footer to display the current year dynamically and link to the repo by @shreyaskarnik in https://github.com/Blaizzy/mlx-audio/pull/509
- Add Smart Turn v3 semantic VAD by @lucasnewman in https://github.com/Blaizzy/mlx-audio/pull/511
- fix(vibevoice-asr): add audio resampling and normalization to preprocessing by @bellkjtt in https://github.com/Blaizzy/mlx-audio/pull/510
- Refactor(whisper): update model instantiation and loading process by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/514
- fix: replace 4 bare excepts with except Exception by @haosenwang1018 in https://github.com/Blaizzy/mlx-audio/pull/521
- feat(stt): add system_prompt parameter to Qwen3ASR generation methods by @chris-schra in https://github.com/Blaizzy/mlx-audio/pull/522
- fix(medasr): collapse CTC tokens manually to prevent raw output by @sigjhl in https://github.com/Blaizzy/mlx-audio/pull/519
- Add Echo TTS by @lucasnewman in https://github.com/Blaizzy/mlx-audio/pull/525
- Allow printing transcriptions to stdout when output path is "-". by @orbitalquark in https://github.com/Blaizzy/mlx-audio/pull/527
- stt: Keep uploaded file extension to avoid unnecessary conversions. by @orbitalquark in https://github.com/Blaizzy/mlx-audio/pull/528
- feat(lid): Add spoken language identification (MMS-LID) by @beshkenadze in https://github.com/Blaizzy/mlx-audio/pull/529
- Set max tokens to a more reasonable value by default for STT by @lucasnewman in https://github.com/Blaizzy/mlx-audio/pull/533
- [Qwen3-TTS] Improve inference, TTFB and add batch support by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/534
- refactor(codec): extract shared ECAPA-TDNN backbone by @beshkenadze in https://github.com/Blaizzy/mlx-audio/pull/532
- Add KittenTTS support and ONNX parity fixes by @Reza2kn in https://github.com/Blaizzy/mlx-audio/pull/517
- Fix Qwen3-TTS streaming decoder throttling with incremental decoding by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/537
- Add ming omni tts (MoE and Dense) by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/515
- [Qwen3-ASR] Fix auto lang detection by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/547
- Add nvfp4, mxfp4 and mxfp8 quants by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/543
- Fix duplicate audio_samples field in GenerationResult dataclass by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/548
- [Whisper] Fix lang code assignment by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/549
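
The MedASR fix above (#519) refers to collapsing CTC tokens manually. As a general illustration of what greedy CTC collapsing means, here is a minimal sketch; it is not the project's code. The function name `ctc_collapse` and the default `blank_id=0` are assumptions made for this example only.

```python
from itertools import groupby


def ctc_collapse(token_ids, blank_id=0):
    """Greedy CTC collapse: merge consecutive repeats, then drop blanks.

    `blank_id` is a hypothetical default; real models define their own.
    """
    # Step 1: merge runs of identical consecutive tokens into one token.
    merged = [tok for tok, _ in groupby(token_ids)]
    # Step 2: remove the blank token, which only separates repeated symbols.
    return [tok for tok in merged if tok != blank_id]


# Per-frame argmax output; the blank between the two 7s keeps them distinct.
frames = [0, 7, 7, 0, 7, 3, 3, 0, 0]
print(ctc_collapse(frames))  # [7, 7, 3]
```

The order of the two steps matters: dropping blanks first would merge the two separate `7` tokens into one.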
New Contributors
- @eschmidbauer made their first contribution in https://github.com/Blaizzy/mlx-audio/pull/463
- @orbitalquark made their first contribution in https://github.com/Blaizzy/mlx-audio/pull/461
- @mnoukhov made their first contribution in https://github.com/Blaizzy/mlx-audio/pull/467
- @sigjhl made their first contribution in https://github.com/Blaizzy/mlx-audio/pull/376
- @andimarafioti made their first contribution in https://github.com/Blaizzy/mlx-audio/pull/481
- @shreyaskarnik made their first contribution in https://github.com/Blaizzy/mlx-audio/pull/487
- @smartchainark made their first contribution in https://github.com/Blaizzy/mlx-audio/pull/489
- @joaopalmeiro made their first contribution in https://github.com/Blaizzy/mlx-audio/pull/495
- @Cold-A-Muse made their first contribution in https://github.com/Blaizzy/mlx-audio/pull/494
- @bellkjtt made their first contribution in https://github.com/Blaizzy/mlx-audio/pull/510
- @haosenwang1018 made their first contribution in https://github.com/Blaizzy/mlx-audio/pull/521
- @chris-schra made their first contribution in https://github.com/Blaizzy/mlx-audio/pull/522
- @Reza2kn made their first contribution in https://github.com/Blaizzy/mlx-audio/pull/517
Full Changelog: https://github.com/Blaizzy/mlx-audio/compare/v0.3.1...v0.4.0