Visual Causal Flow
Moonshot's most powerful AI model
A state-of-the-art open visual language model
LTX-Video Support for ComfyUI
Code for running inference and finetuning with SAM 3 model
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Official Python inference and LoRA trainer package
Tiny vision language model
Recovering the Visual Space from Any Views
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Unified Multimodal Understanding and Generation Models
This repository contains the official implementation of FastVLM
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Wan2.1: Open and Advanced Large-Scale Video Generative Model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Let's make video diffusion practical
Python inference and LoRA trainer package for the LTX-2 audio–video
Video Object and Interaction Deletion
Generating Immersive, Explorable, and Interactive 3D Worlds
Multimodal Diffusion with Representation Alignment
Qwen3.5 is the large language model series developed by Qwen team
Reference PyTorch implementation and models for DINOv3