Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Whisper is a set of multilingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot natively predict word timestamps. This repository proposes an implementation that predicts word timestamps and provides a more accurate estimation of speech segments when transcribing with Whisper models. In addition, a confidence score is assigned to each word and each segment.
Features
- Start and end times of speech segments are estimated more accurately than with Whisper alone
- Documentation available
- Confidence scores are assigned to each word and each segment
- If possible (without beam search...), no additional inference steps are required to predict word timestamps (word alignment is done on the fly after each speech segment is decoded)
- Special care has been taken regarding memory usage
- Light installation for CPU
- Plot of word alignment
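The word-level timestamps and confidence scores listed above lend themselves to simple post-processing. The following is a minimal sketch, not the project's own code: it assumes the transcription result is a dictionary with a "segments" list, where each segment holds a "words" list whose entries carry "text", "start", "end" and "confidence" keys. The helper name low_confidence_words, the 0.5 threshold, and the sample data are hypothetical and exist only for illustration.

```python
from typing import Any


def low_confidence_words(result: dict[str, Any], threshold: float = 0.5) -> list[dict[str, Any]]:
    """Return every word whose confidence is below `threshold`.

    Hypothetical helper: `result` is assumed to look like
    {"segments": [{"words": [{"text", "start", "end", "confidence"}, ...]}, ...]}.
    """
    flagged = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            # Words without a confidence value are treated as fully confident.
            if word.get("confidence", 1.0) < threshold:
                flagged.append(word)
    return flagged


# Made-up sample standing in for a real transcription result (illustration only).
sample_result = {
    "segments": [
        {
            "text": " Hello world",
            "start": 0.5,
            "end": 1.4,
            "confidence": 0.8,
            "words": [
                {"text": "Hello", "start": 0.5, "end": 0.9, "confidence": 0.95},
                {"text": "world", "start": 0.9, "end": 1.4, "confidence": 0.42},
            ],
        }
    ]
}

for word in low_confidence_words(sample_result):
    # Print each flagged word with its time span, e.g. for manual review.
    print(f"{word['text']} [{word['start']:.2f}s - {word['end']:.2f}s] confidence={word['confidence']:.2f}")
```

A filter like this could be used to route uncertain words to manual review; the right threshold depends on the model and the audio, so 0.5 should be treated as a placeholder.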
License
Affero GNU Public License