Official repository for LTX-Video
State-of-the-art (SoTA) text-to-video pre-trained model
LTX-Video Support for ComfyUI
RGBD video generation model conditioned on camera input
VMZ: Model Zoo for Video Modeling
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Let's make video diffusion practical
Python inference and LoRA trainer package for the LTX-2 audio–video
GPT-4V-level open-source multi-modal model based on Llama3-8B
Video understanding codebase from FAIR for reproducing video models
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
OCR expert VLM powered by Hunyuan's native multimodal architecture
Large Multimodal Models for Video Understanding and Editing
Capable of understanding text, audio, vision, video
A Customizable Image-to-Video Model based on HunyuanVideo
Repo for SeedVR2 & SeedVR
Multimodal-Driven Architecture for Customized Video Generation
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Multimodal Diffusion with Representation Alignment
The Clay Foundation Model - An open source AI model and interface
Agentic, Reasoning, and Coding (ARC) foundation models
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Inference framework for 1-bit LLMs
VGGSfM: Visual Geometry Grounded Deep Structure From Motion