Multimodal-Driven Architecture for Customized Video Generation
ComfyUI wrapper nodes for HunyuanVideo
Generating Immersive, Explorable, and Interactive 3D Worlds
Cut videos with a text editor
Handwritten Text Recognition (HTR) system implemented with TensorFlow
State-of-the-art (SoTA) text-to-video pre-trained model
A Family of Open Sourced Music Foundation Models
Generate audiobooks from e-books
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
A fast TTS architecture with conditional flow matching
Video-based AI memory library. Store millions of text chunks in MP4
A high-quality rapid TTS voice cloning model
Faster Whisper transcription with CTranslate2
State-of-the-art TTS model under 25MB
Translate the video from one language to another and embed dubbing
Framework for building realtime multimodal voice AI agents apps
Chat with it via text and voice
Open source healthcare AI
Industrial-level controllable zero-shot text-to-speech system
OCR model for complex documents with layout-aware structured outputs
Stable Diffusion WebUI optimized for AMD GPUs with editing tools
Python module for parsing semi-structured text into python tables
High accuracy RAG for answering questions from scientific documents
Qwen3-omni is a natively end-to-end, omni-modal LLM
Unifying 3D Mesh Generation with Language Models