Run Local LLMs on Any Device. Open-source
Universal LLM Deployment Engine with ML Compilation
157 models, 30 providers, one command to find what runs on your hardware
AirLLM: 70B inference with a single 4GB GPU
TT-NN operator library and TT-Metalium low-level kernel programming
Fast Multimodal LLM on Mobile Devices
Parallax is a distributed model serving framework
High-performance Inference and Deployment Toolkit for LLMs and VLMs
Phi-3.5 for Mac: Locally-run Vision and Language Models
Clippy, now with some AI
Run a 1-billion-parameter LLM on a $10 board with 256MB RAM
A high-performance inference engine for AI models
High-performance inference framework for large language models
Next-gen AI+IoT framework for T2, T3, T5AI, ESP32, and more
Tools for merging pretrained large language models
Course to get into Large Language Models (LLMs)
ChatGLM-6B: An Open Bilingual Dialogue Language Model
VS Code extension for LLM-assisted code/text completion
A straightforward method for training your LLM
WebAssembly binding for llama.cpp - Enabling on-browser LLM inference
The easiest way to use Ollama in .NET
Language-model investigation agent with a terminal UI
950-line, minimal, extensible LLM inference engine built from scratch
Go manage your Ollama models
LLM Finetuning with peft