Run Local LLMs on Any Device. Open-source
The free, open-source alternative to OpenAI, Claude and others
Universal LLM Deployment Engine with ML Compilation
157 models, 30 providers, one command to find what runs on your hardware
LLM Frontend for Power Users
AirLLM: 70B inference with a single 4GB GPU
TT-NN operator library and TT-Metalium low-level kernel programming
Fast Multimodal LLM on Mobile Devices
An Easy-to-Use and High-Performance AI Deployment Framework
Parallax is a distributed model serving framework
Performance-optimized AI inference on your GPUs
High-performance Inference and Deployment Toolkit for LLMs and VLMs
Accessible large language models via k-bit quantization for PyTorch
Accelerate local LLM inference and finetuning
Phi-3.5 for Mac: Locally-run Vision and Language Models
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
LLM inference in C/C++
Run AI models locally on your machine with Node.js bindings for llama
Fast, flexible LLM inference
Clippy, now with some AI
Find the local LLM that actually runs and performs best
Run a 1-billion parameter LLM on a $10 board with 256MB RAM
A high-performance inference engine for AI models
High-performance inference framework for large language models
High-speed Large Language Model Serving for Local Deployment