Accurate × Fast × Comprehensive
The Multi-Agent Framework
Controllable & emotion-expressive zero-shot TTS
Multilingual sentence & image embeddings with BERT
Python Terminal Toolkit - a Spiced Up TUI Library
"Big Model" trains a visual multimodal VLM with 26M parameters
Implementation of "MobileCLIP" (CVPR 2024)
A speech-text foundation model for real-time dialogue
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Official Python inference and LoRA trainer package
High-Resolution Image Synthesis with Latent Diffusion Models
Multimodal embedding and reranking models built on Qwen3-VL
Transforming Multimodal Content into Captivating Multilingual Audio
A high-quality PDF to Markdown tool based on a large language model
PersonaPlex code
Open-Sora: Democratizing Efficient Video Production for All
High-quality multilingual text-to-speech library by MyShell.ai
A Systematic Framework for Interactive World Modeling
Gemma open-weight LLM library, from Google DeepMind
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Fast-stable-diffusion + DreamBooth
Virtual AI anchor that combines state-of-the-art technology
Turn your website into a GIF
Qwen2.5-VL is the multimodal large language model series