Hermes 4 405B FP8 is a large language model from Nous Research, built on Llama-3.1-405B and post-trained for reasoning and alignment. It offers a hybrid reasoning mode: the model can deliberate inside explicit <think> segments when a task calls for it, or skip them for faster responses. The post-training corpus was expanded to roughly 60B tokens, which improves performance across math, code, STEM, logic, creative writing, and structured outputs. The model is trained for schema adherence: it produces valid JSON and can repair malformed outputs, which makes it well suited to tool use and function calling. Hermes 4 is also designed to be more steerable, with lower refusal rates, so responses follow the user's stated values while preserving assistant quality. On RefusalBench it achieves state-of-the-art results, outperforming both closed and open models in balancing helpfulness with adaptability.
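Because the reasoning mode wraps deliberation in <think> segments, client code usually needs to separate that deliberation from the final answer. The following is a minimal sketch of one way to do so, assuming the segments arrive as literal <think>...</think> pairs in the response text. The helper name split_reasoning is hypothetical and is not part of any Hermes or Nous Research API.

```python
import re

# Assumption: reasoning is delimited by literal <think>...</think> tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[list[str], str]:
    """Return (reasoning segments, final answer) from a raw model response."""
    # Collect the content of every <think> block, trimmed of whitespace.
    thoughts = [segment.strip() for segment in THINK_RE.findall(text)]
    # Whatever remains after removing the blocks is treated as the answer.
    answer = THINK_RE.sub("", text).strip()
    return thoughts, answer


if __name__ == "__main__":
    thoughts, answer = split_reasoning("<think>2 + 2 is 4</think>The answer is 4.")
    print(thoughts)
    print(answer)
```

A response produced without deliberation contains no <think> block, so the same helper returns an empty list of segments and the full text as the answer.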
Features
- Based on Llama-3.1-405B with 406B parameters
- Hybrid reasoning mode with <think> deliberation segments
- Post-trained on a corpus of ~5M samples / ~60B tokens
- State-of-the-art performance on RefusalBench benchmark
- Schema adherence and structured JSON outputs with error repair
- Supports function calling, tool use, and role-based chat formats
- Improved steerability with reduced refusal rates and user-aligned values
- Available in BF16, FP8, and GGUF variants for flexible deployment
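
When relying on the structured JSON output and function calling listed above, it is still prudent to validate each response before acting on it. The sketch below shows one possible client-side check using only the Python standard library. The function name parse_tool_call, the "name"/"arguments" keys, and the get_weather example are illustrative assumptions, not the model's documented tool-call format.

```python
import json
import re


def parse_tool_call(raw: str, required: dict[str, type]) -> dict:
    """Parse model output as a JSON object and check required keys and types."""
    # Strip an optional Markdown code fence (three backticks) around the JSON.
    cleaned = re.sub(r"^`{3}(?:json)?\s*|\s*`{3}$", "", raw.strip())
    # json.JSONDecodeError is a subclass of ValueError, so callers can
    # catch ValueError for both malformed JSON and schema problems.
    data = json.loads(cleaned)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected_type in required.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"wrong type for key: {key}")
    return data


if __name__ == "__main__":
    # Hypothetical tool call; the keys and values are illustrative only.
    raw = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
    call = parse_tool_call(raw, {"name": str, "arguments": dict})
    print(call["name"], call["arguments"])
```

If validation fails, a caller can pass the error message back to the model in a follow-up turn and ask for a corrected object, which is one way to make use of the error-repair behavior described above.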