An open-source text-to-speech system built by inverting Whisper
Data Infrastructure providing an approach to multimodal AI workloads
Official repository for LTX-Video
NetEase cloud music command line version
Video editing with Python
Industrial-level controllable zero-shot text-to-speech system
Document Image Parsing via Heterogeneous Anchor Prompting
HunyuanVideo: A Systematic Framework For Large Video Generation Model
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Converts text to speech in realtime
Python library and CLI tool to interface with Google Translate
Python inference and LoRA trainer package for the LTX-2 audio–video
AI-powered tool for generating, optimizing, and translating subtitles
State-of-the-art TTS model under 25MB
A simple native web interface that uses ChatTTS to synthesize text
Music Assistant is a free, open-source media library manager
A 0.1B Omni model trained from scratch
Qwen3-ASR is an open-source series of ASR models
High-resolution models for human tasks
Easy-to-use Speech Toolkit including Self-Supervised Learning model
A fast TTS architecture with conditional flow matching
pyglet is a cross-platform windowing and multimedia library for Python
Qwen3-TTS is an open-source series of TTS models
Build multimodal language agents for fast prototyping and production
A lightweight text-to-speech model with zero-shot voice cloning