NanoGPT is a minimal yet capable reimplementation of GPT-style transformers created by Andrej Karpathy for educational and research use. It distills the GPT architecture into a few hundred lines of Python, making it far easier to follow than large, production-scale implementations. The repo contains a training pipeline (dataset preprocessing, model definition, optimizer, training loop) and an inference script, so you can train a small GPT on text datasets such as Shakespeare or a custom corpus. The code emphasizes readability and clarity: the training loop is cleanly written and avoids heavy abstractions, letting students follow the architecture step by step. Despite its simplicity, it can train non-trivial models on modern GPUs and generate coherent text, and it has become widely used in tutorials, courses, and experiments by people learning how transformers work under the hood.
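To make the architecture concrete, here is a minimal sketch of the kind of pre-norm transformer block the codebase is built from, written in plain PyTorch. The class name `Block` and the hyperparameters are illustrative, and `nn.MultiheadAttention` is used as a convenience stand-in for the repo's own hand-written causal attention module, so this is not the project's exact code.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm transformer block: causal self-attention followed by an MLP."""
    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        # causal mask: each position may only attend to itself and earlier positions
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out            # residual connection around attention
        x = x + self.mlp(self.ln2(x))  # residual connection around the MLP
        return x

# example: a batch of 4 sequences, 16 tokens long, with 64-dim embeddings
x = torch.randn(4, 16, 64)
print(Block(n_embd=64, n_head=4)(x).shape)  # torch.Size([4, 16, 64])
```

A full GPT stacks several such blocks between a token/position embedding layer and a final linear head that produces next-token logits.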
Features
- Compact GPT transformer implementation in plain Python/PyTorch
- Data preprocessing pipeline for text datasets (e.g. Shakespeare)
- Training loop with clear optimizer and scheduler setup
- Inference script for text generation after training (a sketch of the end-to-end flow follows this list)
- Readable, educational codebase (few hundred lines)
- Supports running on modern GPUs for small to mid-sized models
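As a rough illustration of how these pieces fit together, the sketch below encodes a tiny corpus at the character level, trains a toy language model with AdamW, and then samples from it autoregressively. It is not the repo's actual scripts: `TinyLM` and `get_batch` are hypothetical names, and the model is a deliberately tiny stand-in for the real GPT, kept small so the data-prep / train / generate flow stays visible.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# --- character-level data preparation (in the spirit of the Shakespeare example) ---
text = "All the world's a stage, and all the men and women merely players."
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

block_size = 16  # context length

def get_batch(batch_size=8):
    """Sample random (input, next-character target) windows from the corpus."""
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + 1 + block_size] for i in ix])
    return x, y

# --- toy model: a stand-in for the real GPT (embedding -> linear head) ---
class TinyLM(nn.Module):
    def __init__(self, vocab_size, n_embd=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, n_embd)
        self.head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx):
        return self.head(self.emb(idx))  # (B, T, vocab_size) logits

model = TinyLM(len(chars))
opt = torch.optim.AdamW(model.parameters(), lr=3e-3)

# --- training loop: cross-entropy on next-character prediction ---
for step in range(200):
    xb, yb = get_batch()
    logits = model(xb)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), yb.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# --- autoregressive sampling, as an inference script would do ---
idx = torch.tensor([[stoi["A"]]])
for _ in range(50):
    logits = model(idx[:, -block_size:])          # crop to the context window
    probs = F.softmax(logits[:, -1, :], dim=-1)   # distribution over the next character
    idx = torch.cat([idx, torch.multinomial(probs, 1)], dim=1)
print("".join(itos[int(i)] for i in idx[0]))
```

The real project follows the same shape, with the toy model replaced by a multi-layer GPT, a larger dataset prepared ahead of time, and extras such as learning-rate scheduling, checkpointing, and GPU/mixed-precision support.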