Python Computer Vision & Video Analytics Framework With Batteries Included
Vision-language-action model for robot control via images and text
CLIP ViT-bigG/14: Zero-shot image-text model trained on LAION-2B
Small 3B-base multimodal model ideal for custom AI on edge hardware
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
Metric monocular depth estimation (vision model)