MiniMax-M1 is the world’s first open-weight, large-scale hybrid-attention reasoning model, designed for long-context and complex reasoning tasks. Built on a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism, it efficiently supports context lengths of up to 1 million tokens, roughly eight times longer than many contemporary models. MiniMax-M1 also significantly reduces computational overhead at generation time, consuming only about 25% of the FLOPs of comparable models when generating very long sequences. Trained with large-scale reinforcement learning on diverse tasks, it excels on benchmarks for mathematics, software engineering, agentic tool use, and long-context understanding, and it outperforms other open-weight models such as DeepSeek R1 and Qwen3-235B on complex reasoning and coding challenges. MiniMax-M1 is available in two versions with 40K and 80K token thinking budgets, offering scalable performance for different application needs.
Features
- Hybrid Mixture-of-Experts architecture combined with lightning attention for efficient long-context processing
- Supports ultra-long context length up to 1 million tokens
- Reduces test-time compute by roughly 75% compared to similar large models when generating long sequences
- Trained with large-scale reinforcement learning on mathematical reasoning, coding, and software engineering tasks
- Available in versions with 40K and 80K token thinking budgets for flexible usage
- Supports function calling, returning the parameters of external functions as structured output (see the function-calling sketch after this list)
- Optimized for deployment with vLLM, with efficient memory management and batched request processing (see the serving sketch after this list)
- Customizable system prompts tailored for general, web dev, and mathematical scenarios
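
Below is a minimal offline-inference sketch using vLLM. The Hugging Face model ID, tensor-parallel degree, context window, and sampling settings are illustrative assumptions, not documented defaults; adjust them to match the released checkpoints and your hardware.

```python
# Minimal vLLM offline-inference sketch (settings are illustrative, not official defaults).
# Assumes the 80K-thinking-budget checkpoint is published under a Hugging Face ID
# such as "MiniMaxAI/MiniMax-M1-80k"; substitute the actual repository name.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-M1-80k",  # hypothetical model ID
    trust_remote_code=True,            # load the custom hybrid-attention modeling code
    tensor_parallel_size=8,            # shard the MoE weights across 8 GPUs
    max_model_len=1_000_000,           # long-context window; lower this if memory-bound
)

sampling = SamplingParams(
    temperature=1.0,
    top_p=0.95,
    max_tokens=8192,  # cap on generated (thinking + answer) tokens
)

prompts = ["Prove that the sum of the first n odd numbers equals n^2."]
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```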
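
For function calling, one common setup is to serve the model behind vLLM's OpenAI-compatible endpoint and pass tool definitions with the request. The endpoint URL, model ID, and the `get_weather` tool schema below are hypothetical, included only to show the shape of a structured tool call.

```python
# Function-calling sketch against a vLLM OpenAI-compatible endpoint
# (e.g. one started with `vllm serve <model>`); URL, model ID, and tool schema are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical external function
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M1-80k",  # hypothetical model ID
    messages=[{"role": "user", "content": "What's the weather in Shanghai?"}],
    tools=tools,
)

# If the model decides to call the function, the structured arguments
# arrive as JSON in the tool_calls field of the assistant message.
message = response.choices[0].message
if message.tool_calls:
    for call in message.tool_calls:
        print(call.function.name, call.function.arguments)
```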