llama2.c is a minimalist implementation of the Llama 2 language model architecture, written in pure C. Created by Andrej Karpathy, the project offers a lightweight, educational framework for running inference on small Llama 2 models with no external dependencies. It provides a full training and inference pipeline: models are trained in PyTorch and then executed by a single ~700-line C program (run.c).

While run.c can technically load Meta's official Llama 2 models, support is currently limited to fp32 precision, so practical use is capped at models of roughly 7B parameters. The goal of llama2.c is to show that a compact, transparent implementation can still perform meaningful inference even with small models, emphasizing simplicity, clarity, and accessibility. The project builds on lessons from nanoGPT and takes inspiration from llama.cpp, but favors minimalism and educational value over large-scale performance.
Features
- Implements the full Llama 2 architecture for both training and inference
- Provides a compact, 700-line C-based inference engine (run.c)
- Allows training in PyTorch and running models directly in C
- Supports fp32 model precision for smaller, educational-scale LLMs
- Offers a clean, dependency-free implementation for easy study and modification
- Inspired by llama.cpp but designed for simplicity and minimalism