RGBD video generation model conditioned on camera input
A Customizable Image-to-Video Model based on HunyuanVideo
High-Resolution Image Synthesis with Latent Diffusion Models
Capable of understanding text, audio, vision, video
Official MiniMax Model Context Protocol (MCP) server
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Train machine learning models within Docker containers
Implementation of Imagen, Google's Text-to-Image Neural Network
Fast image augmentation library and an easy-to-use wrapper
Implementation of a U-Net complete with efficient attention
ImageBind: One Embedding Space to Bind Them All
Official implementation of Watermark Anything with Localized Messages
Code for running inference and finetuning with SAM 3 model
Contexts Optical Compression
AI Toolkit for Healthcare Imaging
GPT-4V-level open-source multimodal model based on Llama3-8B
An unsupervised and free tool for image and video dataset analysis
Offline inference engine for art, real-time voice conversations
YOLOv5 is the world's most loved vision AI
A Unified Framework for Image Customization
Chinese and English multimodal conversational language model
Tensor search for humans
Simplifies the local serving of AI models from any source
Unified Multimodal Understanding and Generation Models
Sharp Monocular Metric Depth in Less Than a Second