Hy3-preview
Efficient MoE model for reasoning, coding, and AI agent workflows
...It is the first model built on Tencent’s rebuilt training infrastructure and introduces significant improvements in context learning, software engineering, and tool-based task execution. The model features 295B total parameters with only 21B activated during inference, plus a dedicated 3.8B Multi-Token Prediction (MTP) layer that accelerates generation through speculative decoding. Architecturally, it uses 192 routed experts with top-8 activation, a dense-MoE hybrid design, and a native 256K-token context window. Hy3-preview is optimized for efficient deployment while maintaining strong benchmark performance across reasoning, coding, and agent evaluations. ...
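
To make the sparsity figures concrete, here is a minimal sketch of top-8 routing over 192 experts, with the activation ratios those numbers imply. Only the counts (192 routed experts, top-8, 295B total, 21B active) come from the description above. The function names are hypothetical, and the gating rule (a softmax over just the selected logits) is an assumption, since the description does not say how Hy3-preview's router scores or normalizes experts.

```python
import math
import random

# Figures quoted in the description above.
NUM_ROUTED_EXPERTS = 192
TOP_K = 8
TOTAL_PARAMS_B = 295
ACTIVE_PARAMS_B = 21


def top_k_route(logits, k=TOP_K):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs, highest score first.
    ASSUMPTION: weights are a softmax over the selected logits only;
    the real router may normalize differently.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def expert_fraction(k=TOP_K, n=NUM_ROUTED_EXPERTS):
    """Share of routed experts that run for a single token."""
    return k / n


def param_fraction(active=ACTIVE_PARAMS_B, total=TOTAL_PARAMS_B):
    """Share of total parameters activated during inference."""
    return active / total


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in router scores; a real router computes these from the token's hidden state.
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
    for expert, weight in top_k_route(logits):
        print(f"expert {expert:3d}  weight {weight:.3f}")
    print(f"routed experts used per token: {expert_fraction():.1%}")
    print(f"parameters activated: {param_fraction():.1%}")
```

With the quoted figures, each token uses 8 of 192 routed experts (about 4.2%), while about 7.1% of all parameters (21B of 295B) are active. The parameter share being higher than the expert share is what one would expect from a dense-MoE hybrid, where some components run for every token regardless of routing, though the description does not give that breakdown.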