Python inference and LoRA trainer package for the LTX-2 audio–video model
Fast and Universal 3D reconstruction model for versatile tasks
A multimodal model for brain response prediction
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Qwen2.5-VL, the multimodal large language model series by Alibaba Cloud
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image Generation
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning