| Name | Modified | Size |
|---|---|---|
| README.md | 2026-03-02 | 4.3 kB |
| v0.16.0 source code.tar.gz | 2026-03-02 | 2.1 MB |
| v0.16.0 source code.zip | 2026-03-02 | 2.3 MB |
Highlights
This release introduces TTSKit - a brand-new optional library that brings high-quality text-to-speech capabilities on-device, using the latest Core ML features such as MLState and MLTensor for optimal inference on the Apple Neural Engine.
With this first release, we're launching the Qwen3-TTS CustomVoice models in 0.6B and 1.7B sizes with instruction control, with more to come in future releases (including voice cloning).
Download, load, generate, and stream playback in 3 lines of code:
:::swift
import TTSKit
let ttsKit = try await TTSKit()
try await ttsKit.play(text: "Hello from TTSKit!")
Key Features
- Real-time adaptive streaming
  - Plays audio while it's still generating for the fastest time from text input to first audio buffer output.
  - The `.auto` mode adapts to the inference speed of the device for consistent, smooth playback.
- 9 built-in voices
- 10 languages
- Style instruction support (1.7B model only)
- Automatic chunking for long form inputs
- Audio file export in WAV/M4A formats with optional metadata.
- Modular protocol-based architecture (6 swappable Core ML components) for easy customization and future model adoption.
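To illustrate what long-form chunking involves, here is a minimal, self-contained sketch of a sentence-based chunker using only Foundation. This is illustrative only - TTSKit performs its own chunking internally, and the heuristic and length limit below are assumptions, not TTSKit's implementation:

:::swift
import Foundation

/// Naive sentence-based chunker for long-form input.
/// Groups whole sentences into chunks of at most `maxLength` characters.
func chunkText(_ text: String, maxLength: Int = 200) -> [String] {
    var chunks: [String] = []
    var current = ""
    // Foundation can enumerate a string sentence by sentence.
    text.enumerateSubstrings(in: text.startIndex..., options: .bySentences) { sentence, _, _, _ in
        guard let sentence = sentence else { return }
        // Start a new chunk when adding this sentence would exceed the limit.
        if !current.isEmpty, current.count + sentence.count > maxLength {
            chunks.append(current)
            current = ""
        }
        current += sentence
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}

Each chunk could then be passed to playback sequentially; in practice you can simply hand the full text to TTSKit and let it chunk for you.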
See the new TTSKit section in the README.md for full API docs, model selection, and advanced usage.
CLI
Try it out with the following command:
:::bash
swift run -c release whisperkit-cli tts --text "Hello from TTSKit" --play
Also available via Homebrew upon release:
:::bash
brew install whisperkit-cli
whisperkit-cli tts --text "Hello from TTSKit" --play
The CLI gives full control over speaker, language, model variant, style, temperature, chunking strategy, compute units, seed for reproducibility, and more.
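As an illustration of a fully specified invocation, a command might look like the following. The flag names below are assumptions mapped from the options listed above, not confirmed CLI flags - run `whisperkit-cli tts --help` for the actual option names:

:::bash
# Flag names are hypothetical -- verify with `whisperkit-cli tts --help`.
swift run -c release whisperkit-cli tts \
    --text "Hello from TTSKit" \
    --model 1.7b \
    --speaker 3 \
    --language en \
    --style "calm and warm" \
    --temperature 0.7 \
    --seed 42 \
    --play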
Example App
Along with the CLI, we're also releasing a new example app for developers to reference when building TTSKit into their apps. It features real-time waveform visualization, model management, persistent audio file history with metadata, and multi-platform support.
More info about running this app is in the example's README.md.
Architecture Changes
- New shared ArgmaxCore target for common utilities
- TTSKit ships as an optional product in the same Swift package (no breaking changes to existing WhisperKit code):
:::swift
.target(
    name: "YourApp",
    dependencies: [
        "WhisperKit", // speech-to-text
        "TTSKit", // text-to-speech
    ]
),
- The repo will be renamed to reflect the new multi-kit architecture in an upcoming release.
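For reference, pulling the package in via Swift Package Manager would look roughly like the following manifest excerpt. The repository URL matches the repo above; the product names and tools version are assumptions based on this release's notes:

:::swift
// swift-tools-version:5.9
// Package.swift (excerpt) -- product names assumed from the release notes.
import PackageDescription

let package = Package(
    name: "YourApp",
    dependencies: [
        .package(url: "https://github.com/argmaxinc/WhisperKit.git", from: "0.16.0"),
    ],
    targets: [
        .target(
            name: "YourApp",
            dependencies: [
                .product(name: "WhisperKit", package: "WhisperKit"), // speech-to-text
                .product(name: "TTSKit", package: "WhisperKit"),     // text-to-speech
            ]
        ),
    ]
)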
Thank you to @naykutguven and @shura-v for the excellent improvements packaged with this release ahead of TTSKit, listed below 🚀
What's Changed
- Update doc for prewarm by @chen-argmax in https://github.com/argmaxinc/WhisperKit/pull/387
- Pin Xcode version as 26 for Github workflows by @naykutguven in https://github.com/argmaxinc/WhisperKit/pull/386
- AudioProcessor: fix teardown to avoid StartIO/thread warnings on some Bluetooth devices by @shura-v in https://github.com/argmaxinc/WhisperKit/pull/402
- Mute-style input suppression without pausing AVAudioEngine by @shura-v in https://github.com/argmaxinc/WhisperKit/pull/401
- Add TTSKit with Qwen3-TTS support by @ZachNagengast in https://github.com/argmaxinc/WhisperKit/pull/425
New Contributors
- @shura-v made their first contribution in https://github.com/argmaxinc/WhisperKit/pull/402
Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.15.0...v0.16.0