Qwen3-Next-80B-A3B-Instruct is the flagship release in the Qwen3-Next series, designed as a next-generation foundation model for ultra-long context and efficient inference. With 80B total parameters and only 3B activated per token, it combines hybrid attention (Gated DeltaNet + Gated Attention) with a high-sparsity Mixture-of-Experts architecture to achieve exceptional efficiency. The model natively supports a context length of 262K tokens and can be extended to 1 million tokens via RoPE scaling (YaRN), making it well suited to large documents and extended conversations. Multi-Token Prediction (MTP) improves pretraining and accelerates inference, while stability optimizations such as zero-centered, weight-decayed layernorm keep training robust. Benchmarks show it performing comparably to larger models like Qwen3-235B on reasoning, coding, multilingual, and alignment tasks at a fraction of the training cost.
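For orientation, here is a minimal chat-completion sketch using Hugging Face `transformers` (assuming a recent version with Qwen3-Next support; the `Qwen/Qwen3-Next-80B-A3B-Instruct` repo id, prompt, and generation settings are illustrative, not prescribed by this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repo id for the instruct model (assumed; verify on the model hub).
model_name = "Qwen/Qwen3-Next-80B-A3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" shards the 80B weights across available GPUs;
# only ~3B parameters are active per token at inference time.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the Qwen3-Next architecture."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```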
Features
- 80B-parameter instruct model with only 3B parameters active per token for efficiency
- Hybrid attention (Gated DeltaNet + Gated Attention) for ultra-long context handling
- High-sparsity Mixture-of-Experts with 512 experts; 10 routed experts plus 1 shared expert activated per token
- Supports 262K tokens natively and up to 1M tokens with RoPE scaling (YaRN; see the configuration sketch after this list)
- Multi-Token Prediction (MTP) improves pretraining and accelerates inference
- Strong benchmark results rivaling larger models in reasoning, coding, and alignment
- Optimized for instruct use: supports only non-thinking mode, emitting no `<think></think>` reasoning blocks, and gives stable, direct responses
- Open-source under Apache 2.0, with vLLM and SGLang deployment support (see the serving sketch below)
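To go beyond the native 262K-token window, the YaRN RoPE scaling mentioned above can be enabled through the model config. Below is a minimal sketch using `transformers`; the scaling `factor` of 4.0 (262,144 × 4 ≈ 1M) and the exact field values are assumptions based on the standard YaRN config format, not verified settings for this model:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "Qwen/Qwen3-Next-80B-A3B-Instruct"

config = AutoConfig.from_pretrained(model_name)
# YaRN RoPE scaling: a factor of 4.0 stretches the native 262144-token
# window toward ~1M tokens. These values are illustrative assumptions.
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144,
}

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```

The same `rope_scaling` block can equivalently be placed in the checkpoint's `config.json`. Since static YaRN scaling applies to all inputs regardless of length, it is generally best enabled only when long-context processing is actually needed.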
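For serving, both vLLM and SGLang expose an OpenAI-compatible endpoint. A minimal sketch: launch the server (commands shown in the comments), then query it with the `openai` Python client; the tensor-parallel size and port here are illustrative:

```python
# Launch an OpenAI-compatible server first, e.g. with vLLM:
#   vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct --tensor-parallel-size 4 --port 8000
# or with SGLang:
#   python -m sglang.launch_server --model-path Qwen/Qwen3-Next-80B-A3B-Instruct --tp 4 --port 8000
from openai import OpenAI

# Local server; the API key is a placeholder (not checked by default).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-Next-80B-A3B-Instruct",
    messages=[{"role": "user", "content": "Give a one-paragraph overview of MoE routing."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```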