Llama 2 inference in a single file of pure C
Port of Facebook's LLaMA model in C/C++
Run models like Kimi-K2.5, GLM-5, DeepSeek, gpt-oss, Gemma, Qwen, and more
Next-gen AI+IoT framework for T2, T3, T5AI, ESP32, and more
Run a 1-billion-parameter LLM on a $10 board with 256 MB of RAM
Llama 2 Everywhere (L2E)
Python bindings for the Transformer models implemented in C/C++
Locally run an Instruction-Tuned Chat-Style LLM