GLM-4.6V/4.5V/4.1V-Thinking: towards versatile multimodal reasoning
Contexts Optical Compression
Unified Multimodal Understanding and Generation Models
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Detect faces in an image
PyTorch implementation of MAE
Lightweight 24B agentic coding model with vision and long context
Small 3B-base multimodal model ideal for custom AI on edge hardware
Compact 3B-param multimodal model for efficient on-device reasoning
Powerful 14B multimodal base model, flexible for fine-tuning