Python bindings for llama.cpp
Run local LLMs on any device. Open source
Llama 2 inference in a single file of pure C
Qwen3 is the large language model series developed by the Qwen team
Performance-optimized AI inference on your GPUs
GLM-4 series: open multilingual, multimodal chat LMs
Run any Llama 2 model locally with a Gradio UI on GPU or CPU, from anywhere
Chinese LLaMA & Alpaca large language models + local CPU/GPU training