A high-throughput and memory-efficient inference and serving engine
High-performance inference framework for large language models
A 950-line, minimal, extensible LLM inference engine built from scratch
A lightweight vLLM implementation built from scratch
Inference Llama 2 in one file of pure C
Universal LLM Deployment Engine with ML Compilation
Parallax is a distributed model serving framework
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework
Tensor search for humans