A Multi-Modal World Model for Reconstruction, Generation, and Simulation
Qwen3-Omni is a natively end-to-end, omni-modal LLM
A Family of Open-Source Music Foundation Models
Phi-3.5 for Mac: Locally-run Vision and Language Models
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Moonshot's most powerful AI model
CogView4, CogView3-Plus, and CogView3 (ECCV 2024)
Multimodal embedding and reranking models built on Qwen3-VL
High-resolution models for human tasks
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Qwen2.5-VL is the multimodal large language model series
Foundational Models for State-of-the-Art Speech and Text Translation
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Omnimodal AI model for agents, coding, and long-context tasks
CLIP model fine-tuned for zero-shot fashion product classification