whisper-large-v3
High-accuracy multilingual speech recognition and translation model
... input and better support for Cantonese, achieving up to 20% error reduction over Whisper-large-v2. It handles zero-shot transcription and translation, performs language detection automatically, and supports features like word-level timestamps and long-form audio processing. The model integrates well with Hugging Face Transformers and supports optimizations such as batching, SDPA, and Flash Attention 2.