DeepSeek-V4-Flash is a preview Mixture-of-Experts (MoE) language model built for efficient reasoning over million-token contexts. It has 284B total parameters, of which 13B are activated per token, and supports a 1M-token context window, making it suitable for long-document reasoning, complex coding, agentic workflows, and large-scale information processing.

The model uses a hybrid attention architecture that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to improve long-context efficiency, while Manifold-Constrained Hyper-Connections strengthen signal stability across layers. It is trained on more than 32T tokens and refined through a post-training pipeline of supervised fine-tuning, reinforcement learning, domain-specific expert cultivation, and on-policy distillation. DeepSeek-V4-Flash supports three reasoning modes (non-think, think, and think-max), letting users trade speed against depth. Although smaller than DeepSeek-V4-Pro, it can approach Pro-level reasoning.
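In deployments that expose the model behind an OpenAI-compatible API, the reasoning mode is typically selected per request. The sketch below shows one way such a call could look; the endpoint URL, API key handling, model identifier, and the `reasoning_mode` request field are assumptions for illustration, not the documented interface.

```python
# Minimal sketch of calling an OpenAI-compatible endpoint serving the model.
# The base URL, model identifier, and "reasoning_mode" field are hypothetical;
# consult the official API documentation for the real names.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # hypothetical serving endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",  # hypothetical model identifier
    messages=[
        {"role": "user", "content": "Summarize the attached 800k-token report."}
    ],
    # Assumed switch between the non-think / think / think-max modes.
    extra_body={"reasoning_mode": "think"},
)
print(response.choices[0].message.content)
```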
Features
- 1M-token context window for ultra-long tasks
- 284B total parameters with 13B activated
- Mixture-of-Experts architecture for efficient inference (see the routing sketch after this list)
- Hybrid attention using CSA and HCA mechanisms
- Three reasoning modes: non-think, think, and think-max
- Post-trained with SFT, reinforcement learning (GRPO), and on-policy distillation
- Strong coding, reasoning, and agentic benchmark performance
- MIT-licensed weights for open model deployment
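For readers unfamiliar with why only 13B of the 284B parameters are active for each token, the toy sketch below shows the general top-k routing pattern used by MoE layers. It is a generic illustration with made-up sizes and is not DeepSeek-V4-Flash's actual routing code or expert configuration.

```python
# Toy illustration of sparse MoE activation: a router scores all experts,
# but only the top-k experts run for each token, so only a small fraction
# of the layer's parameters is used per token.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 16, 2        # illustrative sizes, not V4-Flash's
tokens = rng.standard_normal((4, d_model))   # a batch of 4 token embeddings

# Per-expert feed-forward weights; total parameters grow with n_experts,
# but each token only touches top_k of them.
experts = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
           for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)

def moe_layer(x):
    logits = x @ router                              # (tokens, n_experts) scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # indices of the top-k experts
    gates = np.exp(logits - logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)       # softmax gate weights
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in top[t]:                             # only top_k experts run per token
            out[t] += gates[t, e] * (x[t] @ experts[e])
    return out

y = moe_layer(tokens)
print(f"output shape: {y.shape}, experts used per token: {top_k}/{n_experts} "
      f"(~{top_k / n_experts:.0%} of expert parameters)")
```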