Mistral Large 3 675B Instruct 2512 NVFP4 is a frontier-scale multimodal Mixture-of-Experts model with 675B total parameters and 41B active parameters, trained from scratch on 3,000 H200 GPUs. This NVFP4 checkpoint is a post-training quantized version of the original instruct model, with both weights and activations quantized, created in a collaboration between Mistral AI, vLLM, and Red Hat using llm-compressor. It retains the instruction-tuned behavior of the FP8 model, making it well suited to production assistants, agentic workflows, scientific tasks, and long-context enterprise systems. The model pairs a 673B-parameter MoE language backbone with a 2.5B-parameter vision encoder, enabling rich multimodal analysis across text and images. Designed for efficient deployment, it runs on a single H100 or A100 node in NVFP4 while delivering performance close to FP8 on short- and mid-context workloads.
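A back-of-envelope calculation illustrates why 4-bit quantization matters for single-node deployment. The figures below are assumptions, not measurements: NVFP4 stores per-block scale factors, so an effective ~4.5 bits per parameter is used here, and real serving additionally needs KV cache and activation memory, so treat this as a lower bound on the footprint.

```python
# Rough weight-memory estimate for the 675B-parameter checkpoint on an
# 8x80 GB H100/A100 node. Bit widths are approximate: NVFP4 is 4-bit data
# plus per-block scales (~4.5 bits/param effective, an assumption here).

GiB = 1024**3

def weight_footprint_gib(total_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GiB for a given per-parameter bit width."""
    return total_params * bits_per_param / 8 / GiB

fp8_weights = weight_footprint_gib(675e9, 8.0)    # original FP8 checkpoint
nvfp4_weights = weight_footprint_gib(675e9, 4.5)  # NVFP4 incl. block scales

node_hbm_gib = 8 * 80  # total HBM on an 8-GPU, 80 GB-per-GPU node

print(f"FP8 weights:   ~{fp8_weights:.0f} GiB")
print(f"NVFP4 weights: ~{nvfp4_weights:.0f} GiB of {node_hbm_gib} GiB available")
```

Under these assumptions the FP8 weights alone nearly exhaust the node's HBM, while NVFP4 leaves a few hundred GiB free for KV cache and long-context serving.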
## Features
- Granular MoE architecture with 675B total and 41B active parameters
- NVFP4 post-training quantization of weights and activations for reduced memory usage
- 2.5B-parameter vision encoder enabling advanced multimodal understanding
- Instruct-tuned behavior ideal for chat, agentic workflows, and enterprise assistants
- Deployable on a single H100 or A100 GPU node; falls back to Marlin FP4 on older hardware
- Supports dozens of languages including English, French, Spanish, Chinese, Japanese, and Arabic
- Strong system-prompt adherence with native function calling and JSON output
- 256k context window for long-document comprehension and retrieval-heavy workflows
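The function-calling and JSON-output features above can be exercised through a standard OpenAI-style chat-completions request, which is how vLLM typically exposes served models. This is a minimal sketch: the model id and the `get_weather` tool are hypothetical placeholders, and the request is only constructed here, not sent.

```python
# Sketch of an OpenAI-compatible chat-completions request body that uses
# the model's native function calling. The model id and tool definition
# below are illustrative placeholders, not confirmed identifiers.
import json

payload = {
    "model": "mistral-large-3-675b-instruct-2512-nvfp4",  # placeholder id
    "messages": [
        {"role": "system", "content": "You are a concise enterprise assistant."},
        {"role": "user", "content": "What's the weather in Paris right now?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

# POST this JSON body to /v1/chat/completions on the serving host; with
# tool_choice="auto" the model decides whether to emit a tool call.
body = json.dumps(payload)
```

With strong system-prompt adherence, the system message above steers tone and format, while the `tools` schema constrains any emitted call to valid JSON arguments.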