MLX-Audio - Browse /v0.4.0 at SourceForge.net

Name	Modified	Size	Downloads / Week
README.md	2026-03-07	6.7 kB	0
v0.4.0 source code.tar.gz	2026-03-07	2.8 MB	0
v0.4.0 source code.zip	2026-03-07	3.0 MB	0
Totals: 3 Items		5.8 MB	0

What's Changed

Add example of Qwen3-ASR with forced alignment by @eschmidbauer in https://github.com/Blaizzy/mlx-audio/pull/463
Restore Qwen3-TTS encoder_config to preserve accents in voice clones. by @orbitalquark in https://github.com/Blaizzy/mlx-audio/pull/461
Ensure TTS audio player plays a trailing, partially-filled audio frame. by @orbitalquark in https://github.com/Blaizzy/mlx-audio/pull/465
Fix source separation issue with shape mismatch: noise shape for separate_long by @mnoukhov in https://github.com/Blaizzy/mlx-audio/pull/467
Enable streaming for Qwen3-TTS when ICL mode is enabled. by @orbitalquark in https://github.com/Blaizzy/mlx-audio/pull/466
feat(stt): add support for MedASR (Lasr architecture) by @sigjhl in https://github.com/Blaizzy/mlx-audio/pull/376
Formatting fix and add to pyproject.toml by @mnoukhov in https://github.com/Blaizzy/mlx-audio/pull/475
Use shared model cache resolution for SAM-Audio by @mnoukhov in https://github.com/Blaizzy/mlx-audio/pull/468
Fix voice matching for Pocket TTS by @lucasnewman in https://github.com/Blaizzy/mlx-audio/pull/477
Fix ALMs max tokens and chunking by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/474
[Soprano] Fix decoder and config loading (v1 and v1.1) by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/480
Add audio separation UI & Server by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/347
Add Parakeet v3 multilingual support with language detection by @andimarafioti in https://github.com/Blaizzy/mlx-audio/pull/481
Revert "Do not discard the last unfilled audio frame. (#465)" by @orbitalquark in https://github.com/Blaizzy/mlx-audio/pull/473
Fix longform generation for Pocket TTS by @lucasnewman in https://github.com/Blaizzy/mlx-audio/pull/486
Enable streaming /v1/audio/speech server endpoint with raw/pcm data. by @orbitalquark in https://github.com/Blaizzy/mlx-audio/pull/484
feat: Add support for Voxtral Mini 4B Realtime by @shreyaskarnik in https://github.com/Blaizzy/mlx-audio/pull/487
fix(kokoro): Chinese TTS crashes with ValueError in g2p pipeline by @smartchainark in https://github.com/Blaizzy/mlx-audio/pull/489
fix: update STT transcription parameters and preserve original audio format by @shreyaskarnik in https://github.com/Blaizzy/mlx-audio/pull/488
fix(docs): update outdated model links by @joaopalmeiro in https://github.com/Blaizzy/mlx-audio/pull/495
[VAD / Diarization] Add sortformer by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/493
feat: add streaming support and toggle for realtime STT by @Cold-A-Muse in https://github.com/Blaizzy/mlx-audio/pull/494
chore: update protobuf dependency to version 6.33.5 by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/497
fix(ui): update footer to display the current year dynamically and link to the repo by @shreyaskarnik in https://github.com/Blaizzy/mlx-audio/pull/509
Add Smart Turn v3 semantic VAD by @lucasnewman in https://github.com/Blaizzy/mlx-audio/pull/511
fix(vibevoice-asr): add audio resampling and normalization to preprocessing by @bellkjtt in https://github.com/Blaizzy/mlx-audio/pull/510
Refactor(whisper): update model instantiation and loading process by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/514
fix: replace 4 bare excepts with except Exception by @haosenwang1018 in https://github.com/Blaizzy/mlx-audio/pull/521
feat(stt): add system_prompt parameter to Qwen3ASR generation methods by @chris-schra in https://github.com/Blaizzy/mlx-audio/pull/522
fix(medasr): collapse CTC tokens manually to prevent raw output by @sigjhl in https://github.com/Blaizzy/mlx-audio/pull/519
Add Echo TTS by @lucasnewman in https://github.com/Blaizzy/mlx-audio/pull/525
Allow printing transcriptions to stdout when output path is "-". by @orbitalquark in https://github.com/Blaizzy/mlx-audio/pull/527
stt: Keep uploaded file extension to avoid unnecessary conversions. by @orbitalquark in https://github.com/Blaizzy/mlx-audio/pull/528
feat(lid): Add spoken language identification (MMS-LID) by @beshkenadze in https://github.com/Blaizzy/mlx-audio/pull/529
Set max tokens to a more reasonable value by default for STT by @lucasnewman in https://github.com/Blaizzy/mlx-audio/pull/533
[Qwen3-TTS] Improve inference, TTFB and add batch support by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/534
refactor(codec): extract shared ECAPA-TDNN backbone by @beshkenadze in https://github.com/Blaizzy/mlx-audio/pull/532
Add KittenTTS support and ONNX parity fixes by @Reza2kn in https://github.com/Blaizzy/mlx-audio/pull/517
Fix Qwen3-TTS streaming decoder throttling with incremental decoding by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/537
Add ming omni tts (MoE and Dense) by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/515
[Qwen3-ASR] Fix auto lang detection by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/547
Add nvfp4, mxfp4 and mxfp8 quants by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/543
Fix duplicate audio_samples field in GenerationResult dataclass by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/548
[Whisper] Fix lang code assignment by @Blaizzy in https://github.com/Blaizzy/mlx-audio/pull/549
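
One of the fixes above (pull/519) collapses CTC tokens manually so that MedASR no longer returns raw frame-level output. The release notes do not show how that is implemented, so the following is only a generic sketch of greedy CTC collapsing, not the project's code. The function name `ctc_collapse`, the `blank_id` parameter, and the sample token IDs are all hypothetical.

```python
from itertools import groupby


def ctc_collapse(token_ids, blank_id=0):
    """Greedy CTC collapse: merge consecutive repeats, then drop blanks.

    `blank_id` is an assumed default; real models define their own blank index.
    """
    # Step 1: merge runs of identical consecutive tokens into one token.
    merged = [tok for tok, _ in groupby(token_ids)]
    # Step 2: remove the blank token, which only separates repeated symbols.
    return [tok for tok in merged if tok != blank_id]


# Hypothetical per-frame argmax output: blank, 3, 3, blank, 3, 5, 5, blank, blank
frames = [0, 3, 3, 0, 3, 5, 5, 0, 0]
print(ctc_collapse(frames))  # [3, 3, 5]
```

The blank between the two runs of `3` is what keeps them as two separate tokens. Without the collapse step, the per-frame sequence would be decoded as is, which is the kind of raw output the fix title describes.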