Port of Facebook's LLaMA model in C/C++
Run Local LLMs on Any Device. Open-source
A high-throughput and memory-efficient inference and serving engine for LLMs
FlashInfer: Kernel Library for LLM Serving
Optimizing inference proxy for LLMs
Library for OCR-related tasks powered by Deep Learning
AI interface for tinkerers (Ollama, Haystack RAG, Python)
User-friendly AI Interface
The AI-native (edge and LLM) proxy for agents
LLM.swift is a simple and readable library for interacting with large language models locally on Apple platforms
The free, open-source alternative to OpenAI, Claude, and others
AICI: Prompts as (Wasm) Programs
A high-performance inference system for large language models
LLMs as Copilots for Theorem Proving in Lean
Framework which allows you to transform your Vector Database
C#/.NET binding of llama.cpp, including LLaMa/GPT model inference
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
Large Language Model Text Generation Inference
Run local LLMs such as Llama, DeepSeek, and Kokoro inside your browser
Build your chatbot within minutes on your favorite device
Easiest and laziest way to build multi-agent LLM applications
Lightweight, standalone C++ inference engine for Google's Gemma models
Simplifies the local serving of AI models from any source
20+ high-performance LLMs with recipes to pretrain and finetune at scale
Easy-to-use speech toolkit including self-supervised learning models