
Highlights

We're excited to bring our commercial-grade speaker diarization framework, SpeakerKit, to open source!

With NVIDIA Sortformer now powering real-time speaker diarization in the Argmax Pro SDK, we're open-sourcing our implementation of Pyannote 4 (community-1). Pyannote is well-known for solving the "who spoke when" problem and has shown strong results on datasets such as AMI, DIHARD, and VoxConverse. Read the blog post for architecture details and benchmarks.

Quickstart

Download, load, diarize, and generate RTTM (Rich Transcription Time Marked) output in just a few lines of code:

import SpeakerKit

let speakerKit = try await SpeakerKit()
let result = try await speakerKit.diarize(audioPath: "audio.wav")
let rttm = speakerKit.generateRTTM(result: result)

Key features

  • End-to-end Pyannote-style diarization pipeline
  • Automatically estimate the number of speakers or set it manually
  • Utilities to add speaker information to WhisperKit outputs
  • Standard RTTM export
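Taken together, these features cover the full pipeline shown in the Quickstart. As a minimal end-to-end sketch that also persists the RTTM export (a sketch only: it reuses the Quickstart calls, assumes `generateRTTM` returns a `String`, and the output filename is illustrative):

```swift
import Foundation
import SpeakerKit

// Load the default models and diarize a file, as in the Quickstart above.
let speakerKit = try await SpeakerKit()
let result = try await speakerKit.diarize(audioPath: "audio.wav")

// Serialize to standard RTTM and write it to disk
// (assumes a String result; the path "output.rttm" is illustrative).
let rttm = speakerKit.generateRTTM(result: result)
try rttm.write(toFile: "output.rttm", atomically: true, encoding: .utf8)
```

See the SpeakerKit README for the actual configuration surface, including manual speaker counts.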

Explore the new SpeakerKit README section for API documentation, configuration details, and optimization tips.

CLI

whisperkit-cli now includes a dedicated diarize subcommand:

swift run -c release whisperkit-cli diarize --audio-path audio.wav --rttm-path output.rttm

With Homebrew:

brew install whisperkit-cli
whisperkit-cli diarize --audio-path audio.wav --rttm-path output.rttm

You can also run transcription and diarization together using the new --diarization flag:

whisperkit-cli transcribe --audio-path audio.wav --diarization

Example output:

---- Speaker Diarization Results ----
SPEAKER audio 1 0.220 7.360 What is RLHF? reinforcement learning with human feedback. What was that little magic ingredient to the dish that made it so much more delicious? <NA> A <NA> <NA>
SPEAKER audio 1 7.610 14.850 - So we train these models on a lot of text data. And in that process, they learn something about the underlying representations of what's in here or in there. <NA> B <NA> <NA>
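Each line above follows the standard RTTM column order: type, file ID, channel, turn onset (s), turn duration (s), orthography, speaker type, speaker name, confidence, lookahead. A minimal sketch of consuming this output in plain Swift, assuming single-space-separated fields with the transcript occupying the orthography column as shown:

```swift
import Foundation

// One diarized turn, following the standard RTTM field order:
// Type, File, Channel, Onset, Duration, Ortho, SpeakerType, Name, Conf, Lookahead.
struct RTTMSegment {
    let file: String
    let onset: Double      // turn start, in seconds
    let duration: Double   // turn length, in seconds
    let speaker: String    // speaker label, e.g. "A" or "B"
}

// Parse one RTTM line. Because the orthography field here carries the
// transcript (which contains spaces), the fixed fields are read from
// both ends of the whitespace split.
func parseRTTMLine(_ line: String) -> RTTMSegment? {
    let parts = line.split(separator: " ").map(String.init)
    guard parts.count >= 10, parts[0] == "SPEAKER",
          let onset = Double(parts[3]),
          let duration = Double(parts[4]) else { return nil }
    return RTTMSegment(file: parts[1],
                       onset: onset,
                       duration: duration,
                       speaker: parts[parts.count - 3]) // name is 3rd from the end
}

// Example: a shortened version of the first line above.
let line = "SPEAKER audio 1 0.220 7.360 What is RLHF? <NA> A <NA> <NA>"
if let seg = parseRTTMLine(line) {
    print(seg.speaker, seg.onset, seg.duration) // A 0.22 7.36
}
```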

Additional flags are available for speaker counts, model variants, clustering-algorithm tuning, and more.

WhisperAX example updates

The WhisperAX example app has been updated with SpeakerKit support. It now includes diarization toggles, a flexible pipeline selector, and a Speakers tab for browsing labeled segments.

What's Changed

Full Changelog: https://github.com/argmaxinc/WhisperKit/compare/v0.16.0...v0.17.0

Source: README.md, updated 2026-03-13