A Unified Framework for Text-to-3D and Image-to-3D Generation
A Powerful Native Multimodal Model for Image Generation
Contexts Optical Compression
Inference framework for 1-bit LLMs
Recovering the Visual Space from Any Views
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Reference PyTorch implementation and models for DINOv3
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Official implementation of DreamCraft3D
A SOTA open-source image editing model
Official repository for LTX-Video
A Customizable Image-to-Video Model based on HunyuanVideo
Multimodal model achieving SOTA performance
Multimodal-Driven Architecture for Customized Video Generation
Diffusion Transformer with Fine-Grained Chinese Understanding
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Official implementation of Watermark Anything with Localized Messages
Personalize Any Characters with a Scalable Diffusion Transformer
LTX-Video Support for ComfyUI
Generating Immersive, Explorable, and Interactive 3D Worlds
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
A Systematic Framework for Interactive World Modeling
RGBD video generation model conditioned on camera input
Sharp Monocular Metric Depth in Less Than a Second
Unified Multimodal Understanding and Generation Models