Label Studio is a multi-type data labeling and annotation tool
A Python tool that uses GPT-4, FFmpeg, and OpenCV
A lightweight text-to-speech model with zero-shot voice cloning
The official Python library for the OpenAI API
Controllable & emotion-expressive zero-shot TTS
Generate audiobooks from e-books
Industrial-level controllable zero-shot text-to-speech system
Python library and CLI tool to interface with Google Translate
Python inference and LoRA trainer package for the LTX-2 audio–video
Build Vision Agents quickly with any model or video provider
Official PyTorch Implementation
Qwen3-TTS is an open-source series of TTS models
Offline Text To Speech synthesis for Python
Build AI-powered semantic search applications
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Converts text to speech in real time
Towards Human-Level Text-to-Speech through Style Diffusion
Document Image Parsing via Heterogeneous Anchor Prompting
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
High-resolution models for human tasks
Easy-to-use Speech Toolkit including Self-Supervised Learning model
HunyuanVideo: A Systematic Framework For Large Video Generation Model
The official Python Library for the Groq API
An Open Source text-to-speech system built by inverting Whisper
The data structure for multimodal data