A Multi-Modal World Model for Reconstructing, Generating, and Simulating
Qwen3-omni is a natively end-to-end, omni-modal LLM
A Family of Open-Source Music Foundation Models
Phi-3.5 for Mac: Locally Run Vision and Language Models
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Multimodal embedding and reranking models built on Qwen3-VL
High-resolution models for human tasks
Qwen2.5-VL is a multimodal large language model series
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
GPT4V-level open-source multi-modal model based on Llama3-8B
Multi-modal large language model designed for audio understanding