MiniCPM4.1 is an enhanced iteration of the MiniCPM4 architecture, introducing improvements in reasoning capabilities, inference speed, and hybrid operation modes that allow dynamic switching between deep reasoning and standard generation. It builds upon the same efficiency-focused philosophy but further optimizes decoding performance, achieving substantial speed gains in reasoning-intensive tasks while maintaining high-quality outputs. One of its key innovations is the hybrid reasoning mode, which allows developers to control whether the model engages in deeper reasoning processes or faster responses depending on the use case. The model also supports both dense and sparse attention mechanisms, enabling more efficient computation depending on the selected inference framework. With improved pretraining on longer sequences and enhanced scaling techniques, MiniCPM4.1 delivers better performance in long-context tasks and complex problem solving.
Features
- Hybrid reasoning mode with controllable deep thinking or fast responses
- Enhanced decoding speed for reasoning-intensive workloads
- Support for both dense and sparse attention inference modes
- Integration with optimized inference engines like SGLang and CPM.cu
- Improved long-context training with extended sequence handling
- Speculative decoding support for accelerated generation