Mixtral-8x7B-Instruct-v0.1 is an instruction-tuned large language model developed by Mistral AI. It is built on a Sparse Mixture of Experts (MoE) architecture in which a router selects 2 of 8 expert feed-forward networks for each token at every layer. The model has 46.7 billion parameters in total but uses only about 12.9 billion per token, so its inference cost is closer to that of a much smaller dense model. Fine-tuned for multi-turn conversation, it expects a strict prompt format in which each user instruction is wrapped in [INST] and [/INST] tags, and it outperforms Llama 2 70B on many benchmarks. The model is available through Hugging Face Transformers and can be run with Flash Attention 2 for faster attention and with bitsandbytes for 8-bit or 4-bit quantized inference. It produces coherent, contextually appropriate responses in five languages (English, French, Italian, German, and Spanish) and is suitable for chat-based tasks in both research and production environments. However, it has no built-in moderation mechanism, so safe deployment requires external guardrails.
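To make the [INST] and [/INST] pattern concrete, here is a minimal sketch of a multi-turn prompt builder. The function name `build_prompt` and the exact whitespace around the tags are illustrative assumptions, not part of any library. The `<s>` and `</s>` markers are the beginning-of-sequence and end-of-sequence tokens that frame each completed exchange.

```python
def build_prompt(turns):
    """Build a Mixtral-style prompt from (user, assistant) pairs.

    `turns` is a list of (user_message, assistant_reply) tuples. The last
    tuple may use None as the reply to ask the model for a new answer.
    Hypothetical helper: spacing around the tags is an assumption.
    """
    BOS, EOS = "<s>", "</s>"
    parts = [BOS]
    for user, assistant in turns:
        # Every user instruction is wrapped in [INST] ... [/INST].
        parts.append(f"[INST] {user} [/INST]")
        if assistant is not None:
            # A completed assistant reply is closed with the EOS marker.
            parts.append(f" {assistant}{EOS}")
    return "".join(parts)


prompt = build_prompt([
    ("What is a mixture of experts?",
     "A model that routes each token to a few sub-networks."),
    ("How many experts does Mixtral use per token?", None),
])
print(prompt)
```

In practice it is usually safer to let the tokenizer apply the template, for example with `tokenizer.apply_chat_template` in Hugging Face Transformers, than to assemble the string by hand. If a hand-built string is tokenized later, check that the tokenizer does not add a second beginning-of-sequence token.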
Features
- Sparse Mixture of Experts with 2 of 8 experts active per token
- 46.7B total parameters with 12.9B active per token
- Outperforms Llama 2 70B on many benchmarks
- Instruction-tuned for coherent multi-turn dialogue
- Efficient inference with Flash Attention 2 and bitsandbytes quantization (8-bit or 4-bit)
- Supports Hugging Face Transformers and vLLM integration
- Openly licensed under Apache 2.0
- Conversational output in five languages: English, French, Italian, German, and Spanish
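
The 2-of-8 routing and the gap between total and active parameters can be illustrated with a toy sketch. Everything here is a simplification under stated assumptions: the experts are stand-in scalar functions rather than feed-forward networks, `top2_route` is a hypothetical helper, and the parameter split assumes that every parameter is either shared by all tokens or belongs to exactly one of 8 equally sized experts. The resulting numbers are rough estimates derived only from the 46.7B and 12.9B figures above.

```python
import math


def top2_route(router_logits, experts, x):
    """Send x to the two highest-scoring experts and mix their outputs."""
    # Indices of the two largest router logits (ties keep the lower index first).
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:2]
    # Softmax over the selected logits only, so the two weights sum to 1.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Only the chosen experts are evaluated; the other six are skipped.
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return top, weights, output


# Toy experts: expert i multiplies its input by (i + 1).
experts = [lambda x, k=i + 1: k * x for i in range(8)]
logits = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.3, 1.2]
chosen, weights, y = top2_route(logits, experts, 1.0)
print(chosen, weights, y)

# Rough parameter accounting (in billions), assuming
#   total  = shared + 8 * per_expert
#   active = shared + 2 * per_expert
TOTAL_B, ACTIVE_B = 46.7, 12.9
per_expert_b = (TOTAL_B - ACTIVE_B) / (8 - 2)
shared_b = ACTIVE_B - 2 * per_expert_b
print(f"per expert ~ {per_expert_b:.2f}B, shared ~ {shared_b:.2f}B")
```

Under these assumptions each expert accounts for roughly 5.6B parameters and about 1.6B are shared, which is why the model is much smaller than a naive 8 x 7B = 56B reading of its name would suggest.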