llama2.c is a minimalist, end-to-end LLM toolkit: you train a Llama-2–style model in PyTorch and run inference with a single ~700-line C program (run.c). The project emphasizes simplicity and education: the Llama-2 architecture is hard-coded, the C code has no external dependencies, and the full forward pass is readable in plain C. Despite the tiny footprint, it is “full-stack”: you can train small models (e.g., 15M/42M/110M parameters on TinyStories) and then sample tokens directly from...