Wan2.1: Open and Advanced Large-Scale Video Generative Model
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Text- and image-to-video generation: CogVideoX and CogVideo (see the inference sketch after this list)
A Customizable Image-to-Video Model based on HunyuanVideo
Multimodal-Driven Architecture for Customized Video Generation
Official Python package for inference and LoRA training
RGBD video generation model conditioned on camera input
Capable of understanding text, audio, vision, and video
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
GPT-4V-level open-source multi-modal model based on Llama3-8B
Let's make video diffusion practical
Tencent Hunyuan multimodal diffusion transformer (MM-DiT) model
Recovering the Visual Space from Any Views
Generating Immersive, Explorable, and Interactive 3D Worlds
Sharp Monocular Metric Depth in Less Than a Second
Qwen3-Omni is a natively end-to-end, omni-modal LLM
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning
Advancing Open-source World Models
Code for running inference and finetuning with the SAM 3 model
Multimodal embedding and reranking models built on Qwen3-VL
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Project Lyra: Open Generative 3D World Models
AI Suite for upscaling, interpolating & restoring images/videos
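For a sense of how the text-to-video models in this list are typically driven, below is a minimal sketch using Hugging Face diffusers' CogVideoXPipeline. The checkpoint ID, prompt, and sampling settings are illustrative assumptions, not an official recipe from the CogVideo repository; other listed models (e.g. Wan, HunyuanVideo) follow a broadly similar pipeline pattern in diffusers.

```python
# Minimal text-to-video sketch (assumes: pip install diffusers
# transformers accelerate imageio[ffmpeg], plus a CUDA GPU).
# Model ID and sampling settings below are illustrative choices.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the smaller public CogVideoX checkpoint in half precision.
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",
    torch_dtype=torch.float16,
)
# Offload idle submodules to CPU to reduce peak VRAM usage.
pipe.enable_model_cpu_offload()

# Generate a short clip from a text prompt.
result = pipe(
    prompt="A panda playing guitar in a bamboo forest, cinematic lighting",
    num_inference_steps=50,
    guidance_scale=6.0,
    num_frames=49,  # roughly six seconds at 8 fps
    generator=torch.Generator("cuda").manual_seed(42),
)

# The pipeline returns a batch of frame sequences; save the first one.
export_to_video(result.frames[0], "output.mp4", fps=8)
```

The same prompt-in, frames-out shape applies to image-to-video variants, which additionally take a conditioning image argument.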