Running a big model on a small laptop
DeepSeek 4 Flash local inference engine for Metal
Unified KV Cache Compression Methods for Auto-Regressive Models
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
A JavaScript HTML screenshot renderer
Vision-language-action model for robot control via images and text