Port of Facebook's LLaMA model in C/C++
Run Local LLMs on Any Device. Open-source
Optimizing inference proxy for LLMs
AI interface for tinkerers (Ollama, Haystack RAG, Python)
User-friendly AI Interface
A high-throughput and memory-efficient inference and serving engine
The free, Open Source alternative to OpenAI, Claude and others
Easiest and laziest way to build multi-agent LLM applications
C#/.NET binding of llama.cpp, including LLaMa/GPT model inference
FlashInfer: Kernel Library for LLM Serving
A high-performance inference system for large language models
LLMs as Copilots for Theorem Proving in Lean
Framework which allows you to transform your Vector Database
LLM.swift is a simple and readable library
Large Language Model Text Generation Inference
Run local LLMs like llama, deepseek, kokoro, etc. inside your browser
Lightweight, standalone C++ inference engine for Google's Gemma models
20+ high-performance LLMs with recipes to pretrain, finetune at scale
An MLOps framework to package, deploy, monitor and manage models
Simplifies the local serving of AI models from any source
Serving system for machine learning models
AICI: Prompts as (Wasm) Programs
A library to communicate with ChatGPT, Claude, Copilot, Gemini
Framework that is dedicated to making neural data processing