Video understanding codebase from FAIR for reproducing video models
Motion-controllable Video Generation via Latent Trajectory Guidance
End-to-end pipeline converting generative videos
State-of-the-art (SoTA) text-to-video pre-trained model
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Wan2.2: Open and Advanced Large-Scale Video Generative Model
Taming Stable Diffusion for Lip Sync
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
NVR with realtime local object detection for IP cameras
The most powerful and modular diffusion model GUI, API and backend
PyTorch code and models for VJEPA2 self-supervised learning from video
Suite with Real-ESRGAN, BSRGAN, RealESRNet, IRCNN, GFPGAN & RIFE.
Overcoming Data Limitations for High-Quality Video Diffusion Models
CLIP + FFT/DWT/RGB = text to image/video
CoTracker is a model for tracking any point (pixel) on a video
A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator
PaddlePaddle GAN library, including lots of interesting applications
The official PyTorch implementation of our paper
A real-time approach for mapping all human pixels of 2D RGB images
We estimate dense, flicker-free, geometrically consistent depth
Efficient 3D human pose estimation in video using 2D keypoint
World's simplest facial recognition API for Python & the command line