Official repository for LTX-Video
Real-time face swap and one-click video deepfake
LTX-Video Support for ComfyUI
AI-powered video clipping and highlight generation
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Video understanding codebase from FAIR for reproducing video models
Large Multimodal Models for Video Understanding and Editing
Build Vision Agents quickly with any model or video provider
Lightweight Python library for adding real-time multi-object tracking
Dealing with all unstructured data, such as reverse image search
An unsupervised and free tool for image and video dataset analysis
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
NVR with realtime local object detection for IP cameras
The Triton Inference Server provides an optimized cloud
Private chat with local GPT with document, images, video, etc.
Sharp Monocular Metric Depth in Less Than a Second
OCR expert VLM powered by Hunyuan's native multimodal architecture
Use Microsoft Edge's online text-to-speech service from Python
Document Image Parsing via Heterogeneous Anchor Prompting
Build AI-powered semantic search applications
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Build cross-modal and multimodal applications on the cloud
Open Source Computer Vision Library
AI-powered tool to quickly remove watermarks from videos flawlessly
Suite with Real-ESRGAN, BSRGAN, RealESRNet, IRCNN, GFPGAN & RIFE.