Visual Causal Flow
Open image model at the forefront of design
LTX-Video Support for ComfyUI
Tiny vision language model
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Code for running inference and finetuning with SAM 3 model
Official Python inference and LoRA trainer package
Recovering the Visual Space from Any Views
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Let's make video diffusion practical
Unified Multimodal Understanding and Generation Models
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Official implementation of Watermark Anything with Localized Messages
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Video Object and Interaction Deletion
Generating Immersive, Explorable, and Interactive 3D Worlds
This repository contains the official implementation of FastVLM
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
A state-of-the-art open visual language model
Multimodal Diffusion with Representation Alignment
Contexts Optical Compression
Reference PyTorch implementation and models for DINOv3
Python inference and LoRA trainer package for the LTX-2 audio–video
VGGSfM: Visual Geometry Grounded Deep Structure From Motion