Port of Facebook's LLaMA model in C/C++
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
Inference framework for 1-bit LLMs
gpt-oss-120b and gpt-oss-20b are two open-weight language models
Code generation model trained on 80+ programming languages, with fill-in-the-middle (FIM) support