DeepSeek 4 Flash local inference engine for Metal
Collection of various algorithms in mathematics, machine learning
Port of Facebook's LLaMA model in C/C++
High-level, high-performance dynamic language for technical computing
Llama 2 inference in one file of pure C
Run models like Kimi-K2.5, GLM-5, DeepSeek, gpt-oss, Gemma, Qwen, etc.
The media player for language learning, with dual subtitles
Build your own AI friend
LLM inference in C/C++
Speech-to-text, text-to-speech, and speaker recognition
Run Local LLMs on Any Device. Open-source
Distribute and run LLMs with a single file
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
OpenVINO™ Toolkit repository
Open-source large language model family from Tencent Hunyuan
Run a 1-billion parameter LLM on a $10 board with 256MB RAM
Run PyTorch LLMs locally on servers, desktop and mobile
AI-powered bridge connecting LLMs and advanced AI agents
Emscripten: An LLVM-to-WebAssembly Compiler
C#/.NET binding of llama.cpp, including LLaMa/GPT model inference
Official inference framework for 1-bit LLMs
Code for Cicero, an AI agent that plays the game of Diplomacy
High-speed Large Language Model Serving for Local Deployment
Research project. A Memory solution for users, teams, and applications
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model