Video-based AI memory library. Store millions of text chunks in MP4
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
PyTorch code and models for V-JEPA 2 self-supervised learning from video
AV1 Image File Format Specification - ISO-BMFF/HEIF derivative
Taming Stable Diffusion for Lip Sync
PyTorch code and models for V-JEPA self-supervised learning from video
Qwen2.5-VL is the multimodal large language model series
OCR expert VLM powered by Hunyuan's native multimodal architecture
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Simple video encoder
Video Processing and Encoding Tools