Build Vision Agents quickly with any model or video provider
Phi-3.5 for Mac: Locally-run Vision and Language Models
Open Source Differentiable Computer Vision Library
Train a 26M-parameter visual multimodal VLM
Multilingual Document Layout Parsing in a Single Vision-Language Model
Automatically find issues in image datasets
Low-latency AI inference engine optimized for mobile devices
A Pragmatic VLA Foundation Model
Deep learning library
Turn WiFi signals into real-time human sensing and spatial awareness
Capable of understanding text, audio, images, and video
Cosmos-RL is a flexible and scalable Reinforcement Learning framework
[NeurIPS 2025 Spotlight] Quantized Attention
CoreNet: A library for training deep neural networks
High-performance Inference and Deployment Toolkit for LLMs and VLMs
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Official inference repo for FLUX.2 models
Mobile manipulation research tools for roboticists
Easy-to-use Speech Toolkit including Self-Supervised Learning models
Harmonized and Coherent Human Image Animation
MMEditing is a low-level vision toolbox based on PyTorch
Database system for building simpler and faster AI-powered applications
Official codebase for I-JEPA
Typeface from Ming Dynasty woodblock printed books
Pre-trained and Reproduced Deep Learning Models