Code for running inference and finetuning with the SAM 3 model
High-Quality Voice Cloning TTS for 600+ Languages
A generative speech model for daily dialogue
A lightweight text-to-speech model with zero-shot voice cloning
A simple native web interface that uses ChatTTS to synthesize text
Video-based AI memory library. Store millions of text chunks in MP4
Official MiniMax Model Context Protocol (MCP) server
Robust Speech Recognition via Large-Scale Weak Supervision
A high-quality rapid TTS voice cloning model
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML
Speech recognition module for Python
Offline inference engine for art, real-time voice conversations
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
State-of-the-art TTS model under 25MB
OCR software, free and offline
Official inference repo for FLUX.2 models
Tokenizer-Free TTS for Multilingual Speech Generation
Qwen3-TTS is an open-source series of TTS models
SOTA Open Source TTS
Python library and CLI tool to interface with Google Translate
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Wan2.1: Open and Advanced Large-Scale Video Generative Model
A text-to-speech, speech-to-text and speech-to-speech library
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX