Official repository for LTX-Video (see the usage sketch after this list)
State-of-the-art (SoTA) text-to-video pre-trained model
VMZ: Model Zoo for Video Modeling
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Wan2.1: Open and Advanced Large-Scale Video Generative Model
RGBD video generation model conditioned on camera input
LTX-Video Support for ComfyUI
Video understanding codebase from FAIR for reproducing video models
Large Multimodal Models for Video Understanding and Editing
Capable of understanding text, audio, vision, and video
Repo for SeedVR2 & SeedVR
A Customizable Image-to-Video Model based on HunyuanVideo
Multimodal-Driven Architecture for Customized Video Generation
Let's make video diffusion practical
Python inference and LoRA trainer package for the LTX-2 audio-video model
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Multimodal Diffusion with Representation Alignment
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP and Multimodal Large Language Models
Code for running inference and fine-tuning with the SAM 3 model
GPT4V-level open-source multi-modal model based on Llama3-8B
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud
Qwen3-Omni is a natively end-to-end, omni-modal LLM
OCR expert VLM powered by Hunyuan's native multimodal architecture
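
Several of the generative models above have Hugging Face `diffusers` integrations. As a concrete starting point, the sketch below shows one way to run the LTX-Video entry at the top of this list via the `diffusers` `LTXPipeline`; the prompt and sampling parameters are illustrative assumptions, not a prescription from the repository.

```python
# Minimal sketch (not taken from any repository above) of running a
# text-to-video model such as LTX-Video through Hugging Face diffusers.
# The prompt and generation parameters below are illustrative assumptions.
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

# Load the public LTX-Video checkpoint in bf16 to reduce VRAM usage.
pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
)
pipe.to("cuda")  # requires a GPU large enough to hold the bf16 weights

# Generate a short clip; width/height must be multiples of 32 and
# num_frames of the form 8k + 1 for this pipeline.
frames = pipe(
    prompt="A slow pan across a foggy mountain lake at sunrise",
    width=704,
    height=480,
    num_frames=97,
    num_inference_steps=50,
).frames[0]

export_to_video(frames, "output.mp4", fps=24)
```

The same pattern (load a pretrained pipeline, call it with a prompt, export the frames) applies to other diffusers-integrated video models in this list, with model-specific resolution and frame-count constraints.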