Multilingual Automatic Speech Recognition with word-level timestamps and confidence.
Whisper is a set of robust, multilingual speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps for speech segments (most of the time with 1-second accuracy), but they do not natively predict word timestamps. This repository proposes an implementation that predicts word timestamps and gives a more accurate estimate of speech segment boundaries when transcribing with Whisper models. In addition, a confidence score is assigned to each word and each segment.
Features
- The start/end estimation of speech segments is more accurate
- Documentation available
- Confidence scores are assigned to each word
- Where possible (e.g. when decoding without beam search), no additional inference steps are required to predict word timestamps (word alignment is done on the fly after each speech segment is decoded)
- Special care has been taken regarding memory usage
- Light installation for CPU
- Plot of word alignment
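As an illustration of how per-word timestamps and confidence scores might be consumed, here is a minimal Python sketch. It does not call the library itself: `sample_result` is a hand-written, hypothetical dictionary, and its layout (a `segments` list whose entries carry a `words` list with `text`, `start`, `end` and `confidence` keys) is an assumption about the output shape, not something confirmed on this page. The helper `low_confidence_words` is likewise made up for this example.

```python
from typing import Any

# Hypothetical transcription result. The key names below are an assumed
# layout for illustration only; check the project's documentation for the
# actual output format.
sample_result: dict[str, Any] = {
    "text": " Hello world",
    "segments": [
        {
            "start": 0.0,
            "end": 1.2,
            "confidence": 0.9,
            "words": [
                {"text": "Hello", "start": 0.1, "end": 0.5, "confidence": 0.95},
                {"text": "world", "start": 0.6, "end": 1.1, "confidence": 0.42},
            ],
        }
    ],
}


def low_confidence_words(
    result: dict[str, Any], threshold: float = 0.5
) -> list[tuple[str, float, float]]:
    """Return (text, start, end) for every word scored below `threshold`."""
    flagged = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            if word["confidence"] < threshold:
                flagged.append((word["text"], word["start"], word["end"]))
    return flagged


flagged = low_confidence_words(sample_result)
print(flagged)
```

A filter like this could be used to mark words for manual review in a subtitle editor, since each flagged entry keeps its start and end time.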
License
GNU Affero General Public License