Visual Causal Flow
LTX-Video Support for ComfyUI
Moonshot's most powerful AI model
Tiny vision language model
Code for running inference and finetuning with SAM 3 model
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Unified Multimodal Understanding and Generation Models
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
A state-of-the-art open visual language model
Official Python inference and LoRA trainer package
Video Object and Interaction Deletion
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Generating Immersive, Explorable, and Interactive 3D Worlds
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Multimodal Diffusion with Representation Alignment
This repository contains the official implementation of FastVLM
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Recovering the Visual Space from Any Views
Reference PyTorch implementation and models for DINOv3
Qwen3.5 is the large language model series developed by Qwen team
Python inference and LoRA trainer package for the LTX-2 audio–video
Let's make video diffusion practical
Foundational Models for State-of-the-Art Speech and Text Translation