QwQ-32B is a 32.8 billion parameter reasoning-optimized causal language model from the Qwen team, built on the Qwen2.5 family and designed to outperform conventional instruction-tuned models on complex tasks. The architecture uses RoPE positional encoding, SwiGLU activations, RMSNorm, and attention QKV bias, and the model performs strongly in multi-turn conversation and long-form reasoning. It supports an extended context length of up to 131,072 tokens and is post-trained with supervised fine-tuning and reinforcement learning for stronger instruction following. The model produces structured thinking and delivers competitive performance against top reasoning models such as DeepSeek-R1 and o1-mini. Recommended usage includes ensuring generations begin with <think>\n, sampling rather than greedy decoding, and prompting for standardized outputs on math and multiple-choice tasks. For long inputs, it supports YaRN (Yet another RoPE extensioN method) for context scaling.
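The usage recommendations above can be sketched with the Hugging Face transformers API. This is a minimal, illustrative example rather than an official recipe: the repository ID Qwen/QwQ-32B, the sampling values (temperature 0.6, top-p 0.95), and the \boxed{} instruction for math answers are assumptions drawn from common practice with this model family, so verify them against the official model card before relying on them.

```python
# Minimal sketch of the recommended usage with Hugging Face transformers.
# Assumptions: the checkpoint is published as "Qwen/QwQ-32B" and its chat
# template ends the generation prompt with "<think>\n"; the sampling values
# are illustrative non-greedy settings, not guaranteed defaults.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # pick an appropriate dtype for the hardware
    device_map="auto",    # spread the 32B weights across available devices
)

# Standardized-output prompting: ask for the final answer in \boxed{} (math)
# or a JSON "answer" field (multiple choice) so results are easy to parse.
messages = [
    {"role": "user",
     "content": "Solve 12 * 7 + 5. Put your final answer within \\boxed{}."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # the template should leave the prompt ending in "<think>\n"
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Non-greedy sampling: greedy decoding tends to produce repetitive reasoning traces.
output_ids = model.generate(
    **inputs,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

response = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:],
    skip_special_tokens=True,
)
print(response)
```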
Features
- 32.8B parameter causal language model with RoPE and SwiGLU
- Capable of reasoning and multi-step problem solving
- Extended 131k token context with YaRN support (see the configuration sketch after this list)
- Post-trained with supervised fine-tuning and reinforcement learning
- Structured thinking with <think>\n output formatting
- Highly competitive with state-of-the-art models
- JSON-style answer formatting for standardized multiple-choice outputs
- Apache-2.0 licensed, with vLLM recommended for deployment
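For prompts that exceed the native context window, YaRN is typically enabled by adding a rope_scaling block to the checkpoint's config.json. The sketch below patches a local copy of the config; the local path ./QwQ-32B is hypothetical, and the field names, the 4.0 factor, and the 32,768-token base window follow the Qwen2.5-style schema and should be treated as assumptions to confirm against the official documentation.

```python
# Sketch: enable YaRN context scaling by editing a local checkpoint's config.json.
# The directory path is hypothetical; the rope_scaling schema is assumed to match
# the Qwen2.5-style format (factor 4.0 over a 32,768-token native window ~ 131k tokens).
import json
from pathlib import Path

checkpoint_dir = Path("./QwQ-32B")          # hypothetical local checkpoint location
config_path = checkpoint_dir / "config.json"

config = json.loads(config_path.read_text())

config["rope_scaling"] = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}

config_path.write_text(json.dumps(config, indent=2))
print(f"Wrote YaRN rope_scaling settings to {config_path}")
```

Serving frameworks such as vLLM generally implement static YaRN, meaning the scaling factor applies to every request, so it is usually best to enable it only when inputs actually exceed the native window to avoid degrading quality on shorter prompts.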