MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Language modeling in a sentence representation space
GPT-4V-level open-source multimodal model based on Llama3-8B
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Advancing Formal Mathematical Reasoning via Reinforcement Learning
Clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashMLA: Efficient Multi-head Latent Attention Kernels
Renderer for the harmony response format to be used with gpt-oss
Implementation of the Surya Foundation Model for Heliophysics
A SOTA open-source image editing model
Safety reasoning models built upon gpt-oss
Diversity-driven optimization and large-model reasoning ability
A state-of-the-art open visual language model
Chinese and English multimodal conversational language model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Multi-modal large language model designed for audio understanding
Open-source framework for intelligent speech interaction
Large Multimodal Models for Video Understanding and Editing
MiniMax-M2, a model built for Max coding & agentic workflows
LLM-based reinforcement learning audio editing model
Open-weight, large-scale hybrid-attention reasoning model
Large language model and vision-language model based on linear attention
Capable of understanding text, audio, vision, video