Visual Causal Flow
LTX-Video Support for ComfyUI
Tiny vision language model
Unified Multimodal Understanding and Generation Models
Code for running inference and finetuning with SAM 3 model
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image
Official Python inference and LoRA trainer package
Video Object and Interaction Deletion
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Recovering the Visual Space from Any Views
Multimodal Diffusion with Representation Alignment
This repository contains the official implementation of FastVLM
Qwen3.5 is the large language model series developed by the Qwen team
Python inference and LoRA trainer package for the LTX-2 audio–video
Reference PyTorch implementation and models for DINOv3
Foundational Models for State-of-the-Art Speech and Text Translation
Foundation model for image generation
Contexts Optical Compression
Phi-3.5 for Mac: Locally-run Vision and Language Models
Towards Real-World Vision-Language Understanding
General-purpose image editing model that delivers high-fidelity
Inference script for Oasis 500M