WhisperLive is a “nearly live” implementation of OpenAI’s Whisper model focused on real-time transcription. It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, so you can target GPUs and different CPU architectures efficiently.

It can handle microphone input, pre-recorded audio files, and network streams such as RTSP and HLS, making it flexible for live events, monitoring, or accessibility workflows. On the server side, configuration options control the number of concurrent clients, the maximum connection time, and threading behavior, so deployments can be tuned for different environments. On the client side, you can set the source language, whether to translate the transcript into English, the model size, voice activity detection, and output recording behavior.
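As a rough sketch of how those client-side options come together, the snippet below connects the project's Python client to a locally running server. It is based on the usage pattern documented for `whisper_live.client.TranscriptionClient`; exact parameter names can differ between versions, so treat this as an illustration rather than a definitive reference.

```python
# Sketch: connecting WhisperLive's Python client to a local server.
# Assumes the `whisper-live` package is installed and a server is
# already listening on localhost:9090.
from whisper_live.client import TranscriptionClient

client = TranscriptionClient(
    "localhost",
    9090,
    lang="en",                 # source language of the audio
    translate=False,           # set True to translate speech into English
    model="small",             # Whisper model size
    use_vad=True,              # voice activity detection on the client
    save_output_recording=True,
    output_recording_filename="./output_recording.wav",
)

# Stream from the microphone:
client()
# Or transcribe a pre-recorded file instead:
# client("path/to/audio.wav")
```

The same client object can also be pointed at RTSP or HLS URLs, which is what makes it usable for monitoring live streams as well as local audio.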
Features
- Real-time Whisper transcription server with Python client for low-latency speech-to-text
- Multiple backends supported (Faster-Whisper, TensorRT, OpenVINO) for GPU and CPU acceleration
- Handles microphone, local audio files, RTSP streams, and HLS audio sources
- Optional translation mode to translate input speech into English
- Browser extensions for Chrome and Firefox plus an iOS client for direct device integration
- Docker-friendly deployment and tunable concurrency options (max clients, connection time, threads)
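To make the deployment and tuning options above concrete, here is one plausible way to install and launch the server from the command line. The flags follow the project's documented `run_server.py` interface, but they may vary by version, so check `python3 run_server.py --help` before relying on them.

```shell
# Install the package (pulls in the Faster-Whisper backend by default).
pip install whisper-live

# Start the transcription server on port 9090 with the Faster-Whisper backend.
# Assumed flags; consult --help for the options available in your version.
python3 run_server.py --port 9090 --backend faster_whisper
```

For containerized deployments, the project also publishes Docker images, in which case the equivalent step is a `docker run` that forwards port 9090 (and passes `--gpus all` when GPU acceleration is wanted).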