Run models such as Kimi-K2.5, GLM-5, DeepSeek, gpt-oss, Gemma, Qwen, and more
Port of Facebook's LLaMA model in C/C++
Next-gen AI+IoT framework for T2/T3/T5AI/ESP32 and more
Inference Llama 2 in one file of pure C
Run a 1-billion parameter LLM on a $10 board with 256MB RAM
Llama 2 Everywhere (L2E)
Python bindings for the Transformer models implemented in C/C++
Locally run an Instruction-Tuned Chat-Style LLM