Port of Facebook's LLaMA model in C/C++
Inference Llama 2 in one file of pure C
Run models like Kimi-K2.5, GLM-5, DeepSeek, gpt-oss, Gemma, Qwen, etc.
The media player for language learning, with dual subtitles
Run Local LLMs on Any Device. Open-source
LLM inference in C/C++
Distribute and run LLMs with a single file
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
Open-source large language model family from Tencent Hunyuan
Run a 1-billion parameter LLM on a $10 board with 256MB RAM
Run PyTorch LLMs locally on servers, desktop and mobile
AI-powered bridge connecting LLMs and advanced AI agents
Emscripten: An LLVM-to-WebAssembly Compiler
Fast Multimodal LLM on Mobile Devices
High-speed Large Language Model Serving for Local Deployment
Research project. A memory solution for users, teams, and applications
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Ongoing research training transformer models at scale
Production ready toolkit to run AI locally
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
C#/.NET binding of llama.cpp, including LLaMa/GPT model inference
Integrate cutting-edge LLM technology quickly and easily into your app
LLM training in simple, raw C/CUDA
Alibaba's high-performance LLM inference engine for diverse apps
Mooncake is the serving platform for Kimi