Label Studio is a multi-type data labeling and annotation tool
Python inference and LoRA trainer package for the LTX-2 audio–video
GenAI Processors is a lightweight Python library
A youtube-dl fork with additional features and fixes
Qwen3-TTS is an open-source series of TTS models
Official PyTorch Implementation
Build AI-powered semantic search applications
Build Vision Agents quickly with any model or video provider
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Converts text to speech in real time
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
Towards Human-Level Text-to-Speech through Style Diffusion
Document Image Parsing via Heterogeneous Anchor Prompting
pyglet is a cross-platform windowing and multimedia library for Python
High-resolution models for human tasks
Easy-to-use Speech Toolkit including Self-Supervised Learning model
The official Python Library for the Groq API
An Open Source text-to-speech system built by inverting Whisper
The data structure for multimodal data
A2M is a desktop app that converts audio to MIDI in one click
StreamSpeech is a seamless model for offline speech recognition
Official repository for LTX-Video
HunyuanVideo: A Systematic Framework For Large Video Generation Model
A fast TTS architecture with conditional flow matching
State-of-the-art diffusion models for image and audio generation