A high-quality rapid TTS voice cloning model
Offline text-to-speech synthesis for Python
A robust, efficient, low-latency speech-to-text library
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Industrial-level controllable zero-shot text-to-speech system
kaldi-asr/kaldi is the official location of the Kaldi project
Audio foundation model excelling in audio understanding
Faster Whisper transcription with CTranslate2
Multilingual Automatic Speech Recognition with word-level timestamps
Open-source multi-speaker long-form text-to-speech model
High-Quality Voice Cloning TTS for 600+ Languages
A TTS that fits in your CPU (and pocket)
Foundational model for human-like, expressive TTS
Qwen3-Omni is a natively end-to-end, omni-modal LLM
An open-source text-to-speech system built by inverting Whisper
Towards Human-Sounding Speech
Fast multimodal LLM for real-time voice interaction and AI apps
State-of-the-art TTS model under 25MB
The behavior guidance framework for customer-facing LLM agents
Toolkit for conversational AI
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
Python library and CLI tool to interface with Google Translate
Open-source industrial-grade ASR models
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Synchronized Translation for Videos