Run Local LLMs on Any Device. Open-source
Universal LLM Deployment Engine with ML Compilation
AirLLM: 70B inference with a single 4GB GPU
Parallax is a distributed model serving framework
Performance-optimized AI inference on your GPUs
High-performance Inference and Deployment Toolkit for LLMs and VLMs
Accessible large language models via k-bit quantization for PyTorch
Accelerate local LLM inference and finetuning
Phi-3.5 for Mac: Locally-run Vision and Language Models
Find the local LLM that actually runs and performs best
Unified framework for building enterprise RAG pipelines
High-performance inference framework for large language models
Implement a CPU from scratch and play with large model deployments
Tools for merging pretrained large language models
A straightforward method for training your LLM
OpenDAN is an open source Personal AI OS
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Play ChatGPT and other LLMs with Xiaomi AI Speaker
Language-model investigation agent with a terminal UI
950-line, minimal, extensible LLM inference engine built from scratch
State-of-the-art Parameter-Efficient Fine-Tuning
How to optimize algorithms in CUDA
LLM training in simple, raw C/CUDA
A simple, performant, and scalable JAX LLM
Implementation for MatMul-free LM