Image-GPT is the official research code and models from OpenAI's paper *Generative Pretraining from Pixels*. The project adapts GPT-2 to the image domain, showing that the same transformer architecture can model sequences of pixels without changes to its fundamental structure. It provides scripts for downloading pretrained checkpoints in three sizes (small, medium, and large) trained on large-scale image datasets, along with utilities for color quantization to a 9-bit (512-color) palette. Researchers can use the code to sample new images, evaluate generative loss on datasets such as ImageNet and CIFAR-10, and explore how scaling affects performance. While the repository is archived and provided as-is, it remains a valuable starting point for experimenting with autoregressive transformers applied directly to raw pixel data. By demonstrating GPT's flexibility across modalities, Image-GPT influenced subsequent multimodal generative research.
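The 9-bit palette means every pixel is replaced by the index of the nearest of 512 color clusters, turning an image into a token sequence a GPT can model. Below is a minimal NumPy sketch of that idea, not the repository's actual API: the `clusters` array stands in for the centroid file the repo provides, and it assumes pixels and centroids are stored in the same value range.

```python
import numpy as np

def quantize(image, clusters):
    """Map an (H, W, 3) RGB image to a flat sequence of palette indices.

    `clusters` is assumed to be a (512, 3) array of color centroids in the
    same value range as `image`; each pixel is assigned the index of the
    nearest centroid under squared Euclidean distance.
    """
    pixels = image.reshape(-1, 3).astype(np.float32)
    # (num_pixels, 512) matrix of squared distances to every centroid
    dists = ((pixels[:, None, :] - clusters[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=-1)  # one 9-bit token per pixel

def dequantize(tokens, clusters, height, width):
    """Approximate inverse: replace each token with its centroid color."""
    return clusters[tokens].reshape(height, width, 3)
```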
Features
- GPT-2 adapted for autoregressive image modeling with pixel sequences
- Pretrained checkpoints for small, medium, and large models
- Dataset download utilities for ImageNet and CIFAR-10 in 9-bit color format
- Color clustering tools for quantization and decoding between 9-bit and RGB
- Sampling scripts for generating new images from trained checkpoints (see the sampling sketch after this list)
- Evaluation scripts for reproducing generative-loss benchmarks on the supported datasets (see the loss sketch after this list)
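Because the model is an autoregressive categorical model over 512 palette indices, sampling amounts to ordinary token-by-token decoding. The sketch below is illustrative only: `dummy_logits` is a hypothetical stand-in for the real transformer (which would condition on the prefix of already-sampled tokens), and the repository's own sampling script should be used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

def dummy_logits(prefix):
    """Hypothetical stand-in for the transformer's next-token logits.

    The real model would condition on `prefix`; here we return random
    scores over the 512-entry palette just to keep the loop runnable.
    """
    return rng.normal(size=512)

def sample_tokens(seq_len=32 * 32, temperature=1.0):
    """Draw pixel tokens one at a time, left to right, top to bottom."""
    tokens = []
    for _ in range(seq_len):
        logits = dummy_logits(tokens) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        tokens.append(int(rng.choice(512, p=probs)))
    return np.array(tokens)  # decode to RGB via the palette centroids
```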
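The generative loss being reproduced is the average negative log-likelihood of the pixel tokens under the model. A minimal sketch of that computation, assuming you already have per-position `logits` of shape (T, 512) and integer `targets` of shape (T,) (both names are illustrative, not the repo's API):

```python
import numpy as np

def generative_loss(logits, targets):
    """Mean negative log-likelihood (nats per pixel token).

    logits:  (T, 512) unnormalized scores over the 9-bit palette
    targets: (T,)     ground-truth palette indices
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    nll = -log_probs[np.arange(targets.shape[0]), targets]
    return nll.mean()

# Dividing the result by ln(2) converts nats per token to bits per token.
```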