This repo contains the code for a 1D tokenizer and generator
Code for running inference and finetuning with the SAM 3 model
LTX-Video Support for ComfyUI
Unified Multimodal Understanding and Generation Models
Witness the aha moment of VLM for less than $3
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
A state-of-the-art open visual language model
Visual Instruction Tuning: Large Language-and-Vision Assistant
Chat & pretrained large vision language model
Tiny vision language model
Parse files for optimal RAG
This repository contains the official implementation of FastVLM
A framework to enable multimodal models to operate a computer
A neural network that transforms a design mock-up into static websites
Generating Immersive, Explorable, and Interactive 3D Worlds
VMZ: Model Zoo for Video Modeling
Taming Stable Diffusion for Lip Sync
Towards Real-World Vision-Language Understanding
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
SAPIEN Manipulation Skill Framework
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Refer and Ground Anything Anywhere at Any Granularity
Self-supervised visual learning using momentum contrast in PyTorch
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning