Official implementation of Watermark Anything with Localized Messages
Code for running inference with the SAM 3D Body Model (3DB)
A Customizable Image-to-Video Model based on HunyuanVideo
Official MiniMax Model Context Protocol (MCP) server
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Let's make video diffusion practical
High-Resolution Image Synthesis with Latent Diffusion Models
Capable of understanding text, audio, vision, video
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Train machine learning models within Docker containers
An unsupervised and free tool for image and video dataset analysis
Implementation of Imagen, Google's Text-to-Image Neural Network
ImageBind: One Embedding Space to Bind Them All
Usable Implementation of "Bootstrap Your Own Latent" self-supervised
Code for running inference and finetuning with SAM 3 model
Contexts Optical Compression
AI Toolkit for Healthcare Imaging
GPT4V-level open-source multi-modal model based on Llama3-8B
YOLOv5 is the world's most loved vision AI
Offline inference engine for art, real-time voice conversations
21 Lessons, Get Started Building with Generative AI
A Unified Framework for Image Customization
Chinese and English multimodal conversational language model
Tensor search for humans
Simplifies the local serving of AI models from any source