Marrying Grounding DINO with Segment Anything & Stable Diffusion
Open-Sora: Democratizing Efficient Video Production for All
A single Gradio + React WebUI with extensions for ACE-Step
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
End-to-end speech processing toolkit
Interface for OuteTTS models
Multi-lingual large voice generation model, providing inference
Python library for building agents that leverages Google Antigravity
Framework for building, orchestrating, and deploying AI agents
Large language model and vision-language model based on Linear Attention
Quick illustration of how one can easily read books together with LLMs
Fast multimodal LLM for real-time voice interaction and AI apps
Autoregressive Model Beats Diffusion
Diffusion Transformer with Fine-Grained Chinese Understanding
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Build Vision Agents quickly with any model or video provider
MARS5 speech model (TTS) from CAMB.AI
Qwen3-ASR is an open-source series of ASR models
Real-time voice interactive digital human
Bidirectional token-classification model for identifiable info
Context-aware desktop AI assistant that understands screen content
Multi-tool for semantic search
Synchronized Translation for Videos
Build multimodal language agents for fast prototyping and production
Generate Any 3D Scene in Seconds