Grok-2.5 is a large-scale AI model developed by xAI in 2024 and released through Hugging Face for research and experimentation. Rather than being hosted by inference providers, the model is distributed as raw weights: users must download roughly 500 GB of files and serve them locally with the SGLang inference engine (v0.5.1 or newer). Running it calls for a multi-GPU setup, with at least 8 GPUs of more than 40 GB memory each, typically in a tensor-parallel (TP=8) configuration; SGLang then handles serving, testing, and chat-style interactions. Because Grok-2.5 is a post-trained model, it requires the correct chat template to produce sensible responses. It is released under the Grok 2 Community License Agreement, which encourages community experimentation and responsible use.
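The download-and-serve workflow described above can be sketched as two commands. This is a hedged sketch, not the official recipe: the Hugging Face repository id (`xai-org/grok-2`), the local directory, and the exact launch flags are assumptions based on a default SGLang setup, and should be checked against the model card.

```shell
# Fetch the weights (~500 GB) from Hugging Face.
# Repository id and target directory are assumptions.
huggingface-cli download xai-org/grok-2 --local-dir /local/grok-2

# Launch a local inference server with SGLang (>= 0.5.1),
# using tensor parallelism across 8 GPUs, FP8 quantization,
# and the Triton attention backend mentioned in the features list.
python3 -m sglang.launch_server \
  --model-path /local/grok-2 \
  --tp 8 \
  --quantization fp8 \
  --attention-backend triton
```

The `--tp 8` flag shards the model across all 8 GPUs, which is why the per-GPU memory requirement can stay near 40 GB despite the model's total size.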
Features
- Trained and deployed by xAI in 2024
- Available for download via Hugging Face with ~500 GB of files
- Requires installation of SGLang inference engine (≥ v0.5.1)
- Supports tensor parallelism (TP=8) across 8 GPUs
- Supports FP8 quantization and a Triton-based attention backend
- Post-trained model that requires the correct chat template to respond properly
- Community license allows research and non-commercial use
- Compatible with custom local inference server setups
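Once a local server is running, SGLang exposes an OpenAI-compatible HTTP API, so chat-style interaction reduces to posting a standard chat-completions payload. The sketch below assumes a default local launch; the base URL (`http://localhost:30000`), the served model name (`grok-2`), and the sampling parameters are assumptions, not values from the source. The chat template itself is applied server-side, so the client only supplies role/content messages.

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "grok-2") -> dict:
    """Build an OpenAI-style chat-completions payload.

    The served model applies its own chat template, so the client
    sends plain role/content messages rather than raw prompt text.
    """
    return {
        "model": model,  # assumed served-model name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }


def send_chat_request(payload: dict,
                      base_url: str = "http://localhost:30000") -> dict:
    """POST the payload to the OpenAI-compatible endpoint SGLang serves."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_chat_request("Hello, Grok!")
    # send_chat_request(payload) would contact the local server.
    print(json.dumps(payload, indent=2))
```

Sending the request requires the server from the setup steps above to be running; `build_chat_request` itself is pure and can be reused with any OpenAI-compatible client library instead of `urllib`.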