AutoGPTQ is an implementation of GPTQ, a post-training quantization algorithm that optimizes large language models (LLMs) for faster inference by reducing their memory and computational footprint while maintaining accuracy.

Features

  • Efficient quantization for large language models
  • Reduces memory usage without major loss in model accuracy
  • Supports various precision levels (e.g., 4-bit, 8-bit)
  • Compatible with Hugging Face Transformers (see the usage sketch below)
  • Accelerates inference on GPUs and CPUs
  • Helps deploy LLMs on resource-constrained hardware
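
The following is a minimal sketch of the package's typical quick-start flow: quantize a Hugging Face model to 4-bit, save the quantized weights, and reload them for inference. The model name "facebook/opt-125m", the output directory, and the calibration text are placeholders, and details of the API may vary between AutoGPTQ versions.

    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    pretrained_model = "facebook/opt-125m"   # placeholder base model
    quantized_dir = "opt-125m-4bit-gptq"     # placeholder output directory

    tokenizer = AutoTokenizer.from_pretrained(pretrained_model, use_fast=True)

    # A small set of calibration examples used to compute quantization statistics
    examples = [tokenizer("AutoGPTQ quantizes large language models with the GPTQ algorithm.")]

    # 4-bit quantization with a group size of 128 (commonly used settings)
    quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

    # Load the full-precision model, quantize it, and save the result
    model = AutoGPTQForCausalLM.from_pretrained(pretrained_model, quantize_config)
    model.quantize(examples)
    model.save_quantized(quantized_dir)

    # Reload the quantized model for accelerated inference
    model = AutoGPTQForCausalLM.from_quantized(quantized_dir, device="cuda:0")
    inputs = tokenizer("AutoGPTQ is", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))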

License

MIT License

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Python

Related Categories

Python Natural Language Processing (NLP) Tool, Python LLM Inference Tool

Registered

2025-01-21