Motion-controllable Video Generation via Latent Trajectory Guidance
End-to-end pipeline converting generative videos
State-of-the-art (SoTA) text-to-video pre-trained model
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Video understanding codebase from FAIR for reproducing video models
A GUI tool for extracting hard-coded subtitles (hardsubs) from videos
Taming Stable Diffusion for Lip Sync
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
3D reconstruction software
NVR with realtime local object detection for IP cameras
PyTorch code and models for VJEPA2 self-supervised learning from video
The most powerful and modular diffusion model GUI, API, and backend
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
Overcoming Data Limitations for High-Quality Video Diffusion Models
Open-source MCP server that gives your coding agent
Uncommon Objects in 3D dataset
The data structure for multimodal data
Build AI-powered semantic search applications
AI Suite for upscaling, interpolating & restoring images/videos
The Triton Inference Server provides an optimized cloud
Build cross-modal and multimodal applications on the cloud
An extremely simple tool for separating vocals and background music
AI-powered tool for quickly removing watermarks from videos