Robust Speech Recognition via Large-Scale Weak Supervision
Pre-trained Deep Learning models and demos
Speech-AI-Forge is a project developed around TTS generation models
Audio foundation model excelling in audio understanding
Port of OpenAI's Whisper model in C/C++
StreamSpeech is a seamless model for offline speech recognition
Open-source multi-speaker long-form text-to-speech model
Foundational model for human-like, expressive TTS
Multilingual speech recognition and audio understanding model
Faster Whisper transcription with CTranslate2
MARS5 speech model (TTS) from CAMB.AI
The open-source voice synthesis studio powered by Qwen3-TTS
GLM-4-Voice | End-to-End Chinese-English Conversational Model
A lightweight text-to-speech model with zero-shot voice cloning
TTS with Kokoro and ONNX Runtime
State-of-the-art TTS model under 25MB
A high-quality rapid TTS voice cloning model
A generative speech model for daily dialogue
Repo of Qwen2-Audio chat & pretrained large audio language model
Multi-lingual large voice generation model, providing inference
A text-to-speech, speech-to-text and speech-to-speech library
Fast and accurate automatic speech recognition (ASR) for edge devices
Tokenizer-Free TTS for Multilingual Speech Generation
PersonaPlex code
A Conversational Speech Generation Model