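AutoGPTQ is an implementation of GPTQ, a post-training weight quantization method for generative pre-trained transformers. It stores the weights of large language models (LLMs) at lower precision, which reduces their memory footprint and speeds up inference while keeping accuracy close to that of the original model.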
Features
- Efficient quantization for large language models
- Reduces memory usage without a major loss in accuracy
- Supports various precision levels (e.g., 4-bit, 8-bit)
- Compatible with Hugging Face Transformers
- Accelerates inference on GPUs and CPUs
- Helps deploy LLMs on resource-constrained hardware
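
To make the memory savings listed above concrete, here is a back-of-the-envelope sketch of how weight storage scales with bit width. It is plain arithmetic, not AutoGPTQ's API: the function name `weight_memory_gib`, the 7-billion-parameter model size, and the per-group metadata layout (one 16-bit scale plus one 4-bit zero point per group of 128 weights) are assumptions chosen for illustration. It counts weights only and ignores activations, the KV cache, and any layers left unquantized.

```python
def weight_memory_gib(num_params, bits, group_size=0, meta_bits_per_group=0):
    """Estimate weight storage in GiB for a model with `num_params` weights.

    bits                 -- bits used to store each weight (e.g. 16, 8, 4)
    group_size           -- weights sharing one set of quantization metadata;
                            0 means no per-group metadata is counted
    meta_bits_per_group  -- assumed bits of metadata (scale, zero point) per group
    """
    total_bits = num_params * bits
    if group_size > 0:
        num_groups = -(-num_params // group_size)  # ceiling division
        total_bits += num_groups * meta_bits_per_group
    return total_bits / 8 / 2**30


NUM_PARAMS = 7_000_000_000  # hypothetical 7B-parameter model

fp16 = weight_memory_gib(NUM_PARAMS, bits=16)
int8 = weight_memory_gib(NUM_PARAMS, bits=8)
# Assumed layout: 16-bit scale + 4-bit zero point for every 128 weights.
int4 = weight_memory_gib(NUM_PARAMS, bits=4, group_size=128, meta_bits_per_group=20)

for label, gib in [("16-bit", fp16), ("8-bit", int8), ("4-bit, group 128", int4)]:
    print(f"{label:>18}: {gib:6.2f} GiB ({fp16 / gib:.2f}x smaller than 16-bit)")
```

Under these assumptions the per-group metadata adds only a fraction of a bit per weight, so 4-bit storage comes out a little under four times smaller than 16-bit storage. Real checkpoints will differ depending on the group size and which layers are quantized.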
License
MIT License