Build Vision Agents quickly with any model or video provider
ExDARK dataset is the largest collection of low-light images
Phi-3.5 for Mac: Locally-run Vision and Language Models
Low-Rank and Sparse Tools for Background Modeling and Subtraction
Open Source Differentiable Computer Vision Library
Moonshot's most powerful AI model
"Big Model" trains a visual multimodal VLM with 26M parameters
A Pragmatic VLA Foundation Model
Automatically find issues in image datasets
Multilingual Document Layout Parsing in a Single Vision-Language Model
Turn WiFi signals into real-time human sensing and spatial awareness
Vision AI browser agent for automation, testing, and extraction
A blazing fast AI Gateway with integrated guardrails
Deep learning library
Optimism is Ethereum, scaled
[NeurIPS2025 Spotlight] Quantized Attention
High-performance Inference and Deployment Toolkit for LLMs and VLMs
Cosmos-RL is a flexible and scalable Reinforcement Learning framework
CoreNet: A library for training deep neural networks
Capable of understanding text, audio, vision, video
Official inference repo for FLUX.2 models
Mobile manipulation research tools for roboticists
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Harmonized and Coherent Human Image Animation