GLM-4 is a family of open models from ZhipuAI that spans base, chat, and reasoning variants at both 32B and 9B scales, with long-context support and practical local-deployment options. The GLM-4-32B-0414 models are pretrained on ~15T tokens of high-quality data (including substantial synthetic reasoning data), then post-trained with preference alignment, rejection sampling, and reinforcement learning to improve instruction following, coding, function calling, and agent-style behaviors. The GLM-Z1-32B-0414 line adds deeper mathematical, coding, and logical reasoning via extended reinforcement learning and pairwise ranking feedback, while GLM-Z1-Rumination-32B-0414 introduces a “rumination” mode that performs longer, tool-using deep research for complex, open-ended tasks. The lightweight GLM-Z1-9B-0414 brings many of these techniques to a smaller model, targeting strong reasoning under tight resource budgets.
Features
- Model lineup: GLM-4-32B (Base/Chat), GLM-Z1-32B (Reasoning), GLM-Z1-Rumination-32B, plus GLM-4-9B and GLM-Z1-9B variants
- Long context: 32K native, with guidance for YaRN RoPE scaling up to 128K and specific settings for the Z1 models (see the RoPE-scaling sketch after this list)
- Training pipeline: ~15T-token pretraining plus preference alignment, rejection sampling, and RL to boost chat, coding, and tool use
- Reasoning focus: Z1 models strengthened for math/code/logic; Rumination model supports deep, tool-assisted research workflows
- Inference support available or merged upstream for vLLM, transformers, and llama.cpp; OpenAI-style API examples and prompt templates included (see the request sketch after this list)
- Fine-tuning support with example scripts and requirements; guidance for resource-constrained inference and quantization (see the quantized-loading sketch after this list)
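
As a rough illustration of the long-context guidance, the sketch below extends the native 32K window by setting a YaRN `rope_scaling` entry before loading the model with transformers. The repo id, scaling factor, and exact key names are assumptions here; the officially recommended values (including the Z1-specific settings) should be taken from the model cards.

```python
# Minimal sketch: extending GLM-4's native 32K context via YaRN RoPE scaling.
# The rope_scaling keys/values below are assumptions -- check the model card
# for the officially recommended settings before relying on them.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/GLM-4-32B-0414"  # assumed Hugging Face repo id

config = AutoConfig.from_pretrained(model_id)
# YaRN scaling with factor 4.0 stretches the 32K training window toward ~128K.
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```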
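
The OpenAI-style API examples typically target a local vLLM server exposing its OpenAI-compatible endpoint. The following is a minimal sketch of that pattern; the serve command, port, model id, and sampling settings are assumptions, not the repo's exact examples.

```python
# Minimal sketch of the OpenAI-style usage pattern, assuming a local vLLM
# server started with something like:
#   vllm serve THUDM/GLM-4-32B-0414 --tensor-parallel-size 2
# (model id and flags are assumptions; see the repo's deployment docs).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # default vLLM OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM does not require a real key by default
)

response = client.chat.completions.create(
    model="THUDM/GLM-4-32B-0414",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
    temperature=0.6,
    max_tokens=512,
)
print(response.choices[0].message.content)
```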
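
For resource-constrained inference, one common option (not necessarily the repo's exact recipe) is to load the 9B reasoning variant in 4-bit with bitsandbytes via transformers. The sketch below assumes the `THUDM/GLM-Z1-9B-0414` repo id and its bundled chat template; the quantization settings are illustrative.

```python
# Minimal sketch of resource-constrained inference: 4-bit loading of the 9B
# reasoning variant with bitsandbytes. Repo id and generation settings are
# assumptions; the repository's own scripts are the reference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "THUDM/GLM-Z1-9B-0414"  # assumed Hugging Face repo id

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```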