A Family of Open Sourced Music Foundation Models
Award-Winning Open Source Video Editing Software
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Open-source infrastructure for Computer-Use Agents. Sandboxes
Automated Music Discovery and Collection Manager
The official Python client for the Hugging Face Hub
Situational Awareness Server compatible with TAK clients
Automagically synchronize subtitles with video
Open-source multi-speaker long-form text-to-speech model
Taming Stable Diffusion for Lip Sync
Extract audio and video content and organize it into a Markdown note
A lightweight audio-to-MIDI converter with pitch bend detection
A speech-text foundation model for real-time dialogue
SOTA discrete acoustic codec models with 40/75 tokens per second
Generate audiobooks from e-books, voice cloning & 1107+ languages
A Python library for audio data augmentation
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Synchronized Translation for Videos
Dumb downloader that scrapes the web
Application for managing recipes, planning meals, building shopping
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Transforming Multimodal Content into Captivating Multilingual Audio
Generate audiobooks from EPUBs, PDFs and text with captions
A nearly-live implementation of OpenAI's Whisper
Clone a voice in 5 seconds to generate arbitrary speech in real-time