How to Train Your GPT is an interactive textbook that teaches readers how to build, train, and run a modern language model from scratch. It is written for learners with minimal machine-learning background and relies on simple explanations, commented code, and practical examples. The project covers the same broad family of Transformer architectures used by GPT-style, LLaMA-style, Claude-style, and Mistral-style models. It includes chapters and topic explainers on tokenizers, embeddings, attention, RoPE, RMSNorm, SwiGLU, the KV cache, AdamW, mixed precision, training loops, and inference. The guide emphasizes writing every important component by hand rather than only calling high-level APIs. Its purpose is to make the internals of language models understandable through runnable code and step-by-step explanations.
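To give a sense of what writing a component by hand can look like, here is a minimal sketch of RMSNorm, one of the listed topics, in plain Python. It is an illustration only and is not taken from the textbook: the function name `rms_norm`, its parameters, and the default `eps` are assumptions made for this example.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Scale a vector so its root mean square is about 1, then apply a per-element gain.

    Hypothetical helper for illustration; not the textbook's own code.
    """
    # Mean of the squared entries (no mean subtraction, unlike LayerNorm).
    mean_sq = sum(v * v for v in x) / len(x)
    # eps keeps the division stable when the input is close to all zeros.
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Normalize each entry, then multiply by its learned gain.
    return [w * v * inv_rms for v, w in zip(x, weight)]

x = [3.0, 4.0]
out = rms_norm(x, weight=[1.0, 1.0])
print(out)
```

With unit gains, the mean of the squared outputs comes out very close to 1, which is the property the normalization is meant to provide.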
Features
- Interactive language-model training textbook
- Twelve-chapter learning structure
- Fully commented runnable code examples
- Coverage of Transformer and GPT internals
- Standalone explainers for major ML concepts
- Beginner-friendly explanations with engineering depth