Build Vision Agents quickly with any model or video provider
Phi-3.5 for Mac: Locally-run Vision and Language Models
Open Source Differentiable Computer Vision Library
"Big Model" trains a visual multimodal VLM with 26M parameters
Low-latency AI inference engine optimized for mobile devices
Multilingual Document Layout Parsing in a Single Vision-Language Model
A Pragmatic VLA Foundation Model
Deep learning library
Cosmos-RL is a flexible and scalable Reinforcement Learning framework
Capable of understanding text, audio, vision, video
[NeurIPS2025 Spotlight] Quantized Attention
High-performance Inference and Deployment Toolkit for LLMs and VLMs
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Official inference repo for FLUX.2 models
Easy-to-use Speech Toolkit including Self-Supervised Learning model
MMEditing is a low-level vision toolbox based on PyTorch
Database system for building simpler and faster AI-powered applications
Typeface from Ming Dynasty woodblock printed books
Pre-trained and Reproduced Deep Learning Models
Easy-OCR solution and Tesseract trainer for GNU/Linux
Pattern recognition for ADL events
A low-code unified framework for computer vision and deep learning