Port of Facebook's LLaMA model in C/C++
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Hackable and optimized Transformers building blocks
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
FAIR Sequence Modeling Toolkit 2
Inference framework for 1-bit LLMs
Open-source large language model family from Tencent Hunyuan
FlashMLA: Efficient Multi-head Latent Attention Kernels