LTX-Video Support for ComfyUI
Reference PyTorch implementation and models for DINOv3
Unified Multimodal Understanding and Generation Models
Official implementation of Watermark Anything with Localized Messages
Generating Immersive, Explorable, and Interactive 3D Worlds
Python inference and LoRA trainer package for the LTX-2 audio–video model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Large language model & vision-language model based on Linear Attention
PyTorch implementation of MAE