Build Vision Agents quickly with any model or video provider
Python library and CLI tool to interface with Google Translate
TTS with Kokoro and ONNX Runtime
Synchronized Translation for Videos
Qwen3-TTS is an open-source series of TTS models
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Tokenizer-Free TTS for Multilingual Speech Generation
Use Microsoft Edge's online text-to-speech service from Python
A nearly-live implementation of OpenAI's Whisper
Code for openai.fm, a demo for the OpenAI Speech API
Offline inference engine for art, real-time voice conversations
Towards Human-Sounding Speech
A TTS that fits in your CPU (and pocket)
Multi-lingual large voice generation model, providing inference
High-Quality Voice Cloning TTS for 600+ Languages
A high-quality rapid TTS voice cloning model
Workflow and speech recognition app
End-to-end speech processing toolkit
A single Gradio + React WebUI with extensions for ACE-Step
Industrial-level controllable zero-shot text-to-speech system
Speech-AI-Forge is a project developed around TTS generation models
Long-form streaming TTS system for multi-speaker dialogue generation
An open-source text-to-speech system built by inverting Whisper
Open-source text-to-speech tool that supports extra-long text
A lightweight text-to-speech model with zero-shot voice cloning