Build Vision Agents quickly with any model or video provider
Qwen3-TTS is an open-source series of TTS models
Tokenizer-Free TTS for Multilingual Speech Generation
Generate audiobooks from e-books, voice cloning & 1107+ languages
TTS with Kokoro and ONNX Runtime
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
State-of-the-art TTS model under 25MB
A high-quality rapid TTS voice cloning model
Synchronized Translation for Videos
SOTA Open Source TTS
Long-form streaming TTS system for multi-speaker dialogue generation
High-Quality Voice Cloning TTS for 600+ Languages
Spark-TTS Inference Code
Multi-lingual large voice generation model, providing inference
Towards Human-Sounding Speech
End-to-end speech processing toolkit
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
An open-source text-to-speech system built by inverting Whisper
Scalable generative AI framework built for researchers and developers
Toolkit for conversational AI
Foundational model for human-like, expressive TTS
Controllable & emotion-expressive zero-shot TTS
Controllable and fast Text-to-Speech for over 7000 languages
A lightweight text-to-speech model with zero-shot voice cloning
Framework for building neural networks