Port of Facebook's LLaMA model in C/C++
Run Local LLMs on Any Device. Open-source
A @ClickHouse fork that supports high-performance vector search
Alibaba's high-performance LLM inference engine for diverse apps
An Easy-to-Use and High-Performance AI Deployment Framework
High-speed Large Language Model Serving for Local Deployment
Fast Multimodal LLM on Mobile Devices
LLM inference in C/C++
UCCL is an efficient communication library for GPUs
Mooncake is the serving platform for Kimi
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
TT-NN operator library, and TT-Metalium low-level kernel programming