Hunyuan-A13B-Instruct is an instruction-tuned large language model from Tencent built on a fine-grained Mixture-of-Experts (MoE) architecture. Of its 80 billion total parameters, only 13 billion are active per forward pass, which keeps inference cost low while preserving strong benchmark performance. The model supports a context window of up to 256K tokens, chain-of-thought (CoT) reasoning, and agent workflows with tool-call parsing. A dual-mode design lets users trade speed for deeper reasoning: a fast mode answers directly, while a slow mode reasons step by step. It performs strongly on mathematics, science, coding, and multi-turn conversation benchmarks, matching or outperforming larger models in several areas.

Deployment is supported via TensorRT-LLM, vLLM, and SGLang, with official Docker images and integration guides. The weights are openly released under a custom Tencent license, making the model well suited to researchers and developers who need long-context capability with efficient inference.
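The quickest way to try the model is through Hugging Face transformers. The sketch below assumes the `tencent/Hunyuan-A13B-Instruct` checkpoint and an `enable_thinking` flag in the chat template to switch between fast and slow (CoT) modes; that flag is an assumption here, so check the model card for the exact switch your checkpoint version uses.

```python
# Minimal sketch: chat with Hunyuan-A13B-Instruct via transformers.
# The `enable_thinking` flag below is an assumed chat-template option for
# toggling fast vs. slow (CoT) mode; verify against the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Hunyuan-A13B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # only the ~13B active parameters run per token
    device_map="auto",            # shard the 80B total weights across available GPUs
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain why the sky is blue in two sentences."}]

# Slow-thinking (CoT) request; set enable_thinking=False for the fast path (assumed flag).
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```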
Features
- Mixture-of-Experts model with 13B active/80B total parameters
- Supports up to 256K context tokens for long documents
- Dual reasoning modes: fast (no CoT) and slow (step-by-step logic)
- Tool calling and agent workflows with custom parsing support
- Optimized for mathematics, reasoning, and science benchmarks
- Compatible with TensorRT-LLM, vLLM, and SGLang deployment stacks (see the vLLM sketch after this list)
- Fine-tuned for instruction-following and multi-turn chat
- Dockerized for quick deployment with official scripts and images
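For the vLLM path listed above, a minimal offline-inference sketch might look like the following; the parallelism and context-length settings are illustrative only and should be adapted to your hardware and vLLM version.

```python
# Sketch of offline batch inference with vLLM (one of the supported stacks).
# tensor_parallel_size and max_model_len are placeholders, not recommended values.
from vllm import LLM, SamplingParams

llm = LLM(
    model="tencent/Hunyuan-A13B-Instruct",
    trust_remote_code=True,
    tensor_parallel_size=4,     # adjust to your GPU count
    max_model_len=32768,        # raise toward 256K only if memory allows
)

params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)
outputs = llm.chat(
    [{"role": "user", "content": "Summarize the advantages of MoE models."}],
    params,
)
print(outputs[0].outputs[0].text)
```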
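Tool calling is typically exercised through an OpenAI-compatible endpoint once the model is served (for example via vLLM or SGLang with the model's tool-call parser enabled, per the official deployment guides). A client-side sketch using the `openai` package, with a hypothetical `get_weather` function and a locally assumed endpoint:

```python
# Sketch of a tool-calling request against an assumed local OpenAI-compatible server.
# The get_weather tool and the base_url are hypothetical examples.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="tencent/Hunyuan-A13B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Shenzhen right now?"}],
    tools=tools,
)
# If the model decides to call a tool, the parsed call(s) appear here.
print(resp.choices[0].message.tool_calls)
```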