minGPT is a minimalist, educational re-implementation of the GPT (Generative Pretrained Transformer) architecture, written in PyTorch by Andrej Karpathy to expose the core structure of a transformer-based language model in as few lines of code as possible. It strips away the bells and whistles to show how a sequence of token indices is fed through a stack of transformer blocks and decoded into next-token probabilities, with both training and inference supported. Because the whole model is around 300 lines of code, users can follow each step, from embedding lookup and positional encodings through multi-head attention and feed-forward layers to the output head, and so demystify how GPT-style models work beneath the surface. It also serves as a practical sandbox for experimentation, letting learners tweak the architecture, dataset, or training loop without being overwhelmed by framework abstractions.
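The data flow described above can be sketched in a few dozen lines of PyTorch. This is a simplified illustration, not minGPT's actual code: it reuses `nn.MultiheadAttention` where minGPT writes attention by hand, and all names (`TinyGPT`, `Block`) and hyperparameters are invented for the example.

```python
# Minimal GPT-style forward pass: token indices -> token + positional
# embeddings -> transformer blocks -> logits over the vocabulary.
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-norm transformer block: causal self-attention + MLP, each with a residual."""
    def __init__(self, n_embd, n_head):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(nn.Linear(n_embd, 4 * n_embd), nn.GELU(),
                                 nn.Linear(4 * n_embd, n_embd))

    def forward(self, x):
        T = x.size(1)
        # Boolean mask: True entries are disallowed, i.e. positions in the future.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        return x + self.mlp(self.ln2(x))

class TinyGPT(nn.Module):
    def __init__(self, vocab_size, block_size, n_embd=32, n_head=4, n_layer=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)   # token embeddings
        self.pos_emb = nn.Embedding(block_size, n_embd)   # learned positions
        self.blocks = nn.ModuleList(Block(n_embd, n_head) for _ in range(n_layer))
        self.ln_f = nn.LayerNorm(n_embd)
        self.head = nn.Linear(n_embd, vocab_size)         # logits over vocab

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)         # (B, T, n_embd)
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))                    # (B, T, vocab_size)

logits = TinyGPT(vocab_size=10, block_size=8)(torch.zeros(1, 8, dtype=torch.long))
print(logits.shape)  # torch.Size([1, 8, 10])
```

The causal mask is what makes this a GPT-style (decoder-only) model: each position can attend only to itself and earlier tokens, so the logits at position t are a valid prediction for token t+1.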
## Features
- Compact PyTorch implementation of a GPT-style transformer (~300 LOC)
- Training and inference routines included so you can build and sample from a model
- Clear architecture: embedding, positional encoding, multi-head attention, feed-forward layers
- Easy to modify, extend, or refactor for experimentation
- Transparent code intended for educational use rather than production scale
- Supports toy datasets and proof-of-concept tasks for understanding sequence modeling
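The training and sampling routines mentioned above amount to next-token prediction plus autoregressive decoding. The sketch below illustrates the idea on a toy task with a stand-in bigram model; the function names (`train_step`, `generate`) and the task are invented for the example and are not minGPT's API.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def train_step(model, optimizer, x, y):
    """One optimization step of next-token prediction (targets = inputs shifted by one)."""
    logits = model(x)                                    # (B, T, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size):
    """Greedy decoding: repeatedly append the most likely next token."""
    for _ in range(max_new_tokens):
        logits = model(idx[:, -block_size:])             # crop to the context window
        next_tok = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_tok], dim=1)
    return idx

# Stand-in model: a bigram lookup table mapping each token to logits over the next.
vocab_size = 4
model = torch.nn.Embedding(vocab_size, vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# Toy task: learn the repeating sequence 0, 1, 2, 3, 0, 1, ...
data = torch.tensor([[0, 1, 2, 3, 0, 1, 2, 3]])
for _ in range(300):
    train_step(model, optimizer, data[:, :-1], data[:, 1:])

print(generate(model, torch.tensor([[0]]), max_new_tokens=7, block_size=8).tolist())
```

Swapping the bigram table for a transformer changes nothing in this loop, which is the point: the training objective and the decoding procedure are independent of the model's internals.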