MiniMax-01 is the official repository for two flagship models: MiniMax-Text-01, a long-context language model, and MiniMax-VL-01, a vision-language model built on top of it.

MiniMax-Text-01 combines Lightning Attention and standard softmax attention in a hybrid architecture, with Mixture-of-Experts (MoE) routing in its feed-forward layers, to achieve both high throughput and long-context reasoning. It has 456 billion total parameters, of which 45.9 billion are activated per token, and is trained with advanced parallelism strategies such as LASP+, varlen ring attention, and Expert Tensor Parallelism, enabling a training context of 1 million tokens and up to 4 million tokens at inference.

MiniMax-VL-01 extends this core with a 303M-parameter Vision Transformer and a two-layer MLP projector in a ViT–MLP–LLM framework, allowing the model to process images at dynamic resolutions up to 2016×2016.
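Lightning Attention is a linear-attention variant, which is what makes million-token contexts tractable: instead of materializing an n×n score matrix, it carries a small d×d state, so cost grows linearly with sequence length, and the hybrid design interleaves full softmax attention layers periodically among the linear ones. The PyTorch sketch below illustrates the generic (non-causal) linear-attention idea only; it is not the repository's optimized, causal Lightning Attention kernel.

```python
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    """Standard attention: the (n x n) score matrix makes this O(n^2) in sequence length."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention: associativity lets us summarize keys/values into a
    (d x d) state instead of an (n x n) score matrix, so cost is linear in n."""
    q, k = F.elu(q) + 1, F.elu(k) + 1                # non-negative feature map
    kv = torch.einsum("...nd,...ne->...de", k, v)    # key/value summary, O(n * d^2)
    z = torch.einsum("...nd,...d->...n", q, k.sum(dim=-2)).clamp_min(eps)
    return torch.einsum("...nd,...de->...ne", q, kv) / z.unsqueeze(-1)

q = k = v = torch.randn(1, 4096, 64)   # (batch, seq_len, head_dim)
out = linear_attention(q, k, v)        # memory stays O(d^2) rather than O(n^2)
```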
## Features
- MiniMax-Text-01 language model with 456B total parameters and 45.9B active per token for high-capacity reasoning (see the MoE routing sketch after this list)
- Hybrid Lightning Attention, softmax attention, and MoE design for efficient long-context processing
- Training context length up to 1M tokens and inference support up to 4M tokens for ultra-long documents
- MiniMax-VL-01 vision-language model using a 303M-parameter ViT and MLP projector on top of MiniMax-Text-01
- Dynamic image-resolution mechanism, patch-based encoding, and fused thumbnail representation for rich visual understanding (sketched after this list)
- Detailed benchmark results, technical report, model cards, and inference scripts for text-only and multimodal use cases (a minimal quickstart follows)
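The 45.9B-active-of-456B figure comes from MoE routing: each token is dispatched to a small number of experts, so only those experts' parameters run per token. Below is a minimal top-k routing sketch; the sizes (8 experts, top-2, d_model=512) are illustrative stand-ins, not the model's actual configuration.

```python
import torch
import torch.nn.functional as F

class TopKMoE(torch.nn.Module):
    """Minimal top-k Mixture-of-Experts layer: each token is routed to k experts,
    so only a fraction of the total parameters are active per token."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.router = torch.nn.Linear(d_model, n_experts)
        self.experts = torch.nn.ModuleList([
            torch.nn.Sequential(
                torch.nn.Linear(d_model, d_ff),
                torch.nn.GELU(),
                torch.nn.Linear(d_ff, d_model),
            )
            for _ in range(n_experts)
        ])
        self.k = k

    def forward(self, x):                        # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):   # loop form for clarity, not speed
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

x = torch.randn(16, 512)
y = TopKMoE()(x)   # each token only touched 2 of the 8 experts
```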
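The dynamic-resolution mechanism can be sketched as resizing an image to the nearest supported tile grid, encoding each tile through the ViT, and fusing in a low-resolution thumbnail for global context. The tile size (336×336) and grid cap below are assumptions chosen for illustration (2016 = 6 × 336), not the repository's exact preprocessing.

```python
from PIL import Image

TILE = 336       # assumed tile size, for illustration only
MAX_TILES = 6    # caps resolution at 2016x2016 (a 6x6 tile grid)

def tile_image(img: Image.Image):
    """Resize to the nearest tile grid, cut into tiles, and append a
    low-resolution thumbnail that preserves a global view of the image."""
    cols = min(MAX_TILES, max(1, round(img.width / TILE)))
    rows = min(MAX_TILES, max(1, round(img.height / TILE)))
    resized = img.resize((cols * TILE, rows * TILE))
    tiles = [
        resized.crop((c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE))
        for r in range(rows)
        for c in range(cols)
    ]
    thumbnail = img.resize((TILE, TILE))   # fused global summary
    return tiles, thumbnail

tiles, thumb = tile_image(Image.new("RGB", (1200, 900)))
print(len(tiles))   # 4x3 grid -> 12 tiles plus the thumbnail
```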
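Finally, a minimal text-only quickstart sketch, assuming the Hugging Face checkpoint id MiniMaxAI/MiniMax-Text-01 and the standard transformers trust_remote_code path; refer to the repository's inference scripts for the supported invocation and quantization options.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint id; the custom architecture loads via trust_remote_code.
model_id = "MiniMaxAI/MiniMax-Text-01"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",   # shard the 456B checkpoint across available GPUs
    torch_dtype="auto",
)

inputs = tokenizer("MiniMax-Text-01 is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```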