AutoGPTQ is an implementation of GPTQ, a post-training weight quantization algorithm, that optimizes large language models (LLMs) for faster inference by reducing their memory and compute footprint while maintaining accuracy.
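To make the footprint reduction concrete, here is a rough back-of-the-envelope sketch; the 7B parameter count is an illustrative assumption, and the figures cover the weight tensors only (activations, KV cache, and quantization metadata add overhead on top):

```python
def weight_footprint_gb(n_params: int, bits_per_weight: float) -> float:
    """Approximate size of the weight tensors alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# A hypothetical 7B-parameter model:
n = 7_000_000_000
fp16 = weight_footprint_gb(n, 16)  # 14.0 GB in half precision
int4 = weight_footprint_gb(n, 4)   # 3.5 GB at 4 bits per weight
print(f"fp16: {fp16:.1f} GB, 4-bit: {int4:.1f} GB")
```

The roughly 4x reduction is what lets a model that would not fit in a consumer GPU's memory at fp16 run after 4-bit quantization.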

Features

  • Efficient quantization for large language models
  • Reduces memory usage with minimal accuracy loss
  • Supports various precision levels (e.g., 4-bit, 8-bit)
  • Compatible with Hugging Face Transformers
  • Accelerates inference on GPUs and CPUs
  • Helps deploy LLMs on resource-constrained hardware
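The quantize-then-load workflow from the features above can be sketched as follows. This is a minimal sketch based on the library's `AutoGPTQForCausalLM` API; the model name, calibration sentence, and output directory are placeholders, and actually running it requires the `auto-gptq` and `transformers` packages, downloaded model weights, and a CUDA GPU, so the imports are kept inside the functions:

```python
def quantize_to_4bit(model_name: str, out_dir: str) -> None:
    """Quantize a Hugging Face causal LM to 4-bit with AutoGPTQ and save it.

    model_name and the calibration sentence below are illustrative
    placeholders; real use needs a larger, representative calibration set.
    """
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # GPTQ calibrates on sample text to decide how to round the weights.
    examples = [
        tokenizer(
            "AutoGPTQ quantizes LLM weights to low precision.",
            return_tensors="pt",
        )
    ]

    config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
    model = AutoGPTQForCausalLM.from_pretrained(model_name, config)
    model.quantize(examples)
    model.save_quantized(out_dir)


def load_quantized(out_dir: str):
    """Load a previously quantized model for inference on the first GPU."""
    from auto_gptq import AutoGPTQForCausalLM

    return AutoGPTQForCausalLM.from_quantized(out_dir, device="cuda:0")
```

The loaded model exposes the usual Transformers `generate` interface, so downstream inference code does not need to change.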

License

MIT License


Additional Project Details

Operating Systems: Linux, Mac, Windows

Programming Language: Python

Related Categories: Python Natural Language Processing (NLP) Tool, Python LLM Inference Tool

Registered: 2025-01-21