Qwen2.5-VL is a multimodal large language model series
Port of Facebook's LLaMA model in C/C++
The ChatGPT Retrieval Plugin lets you easily find personal documents
Analyze computation-communication overlap in DeepSeek-V3/R1
Access to Anthropic's safety-first language model APIs
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
Clean and efficient FP8 GEMM kernels with fine-grained scaling
Flux 2 image generation model pure C inference
Inference framework for 1-bit LLMs
C#/.NET binding of llama.cpp, including LLaMA/GPT model inference
Real-time behaviour synthesis with MuJoCo, using Predictive Control
FAIR Sequence Modeling Toolkit 2
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Foundational Models for State-of-the-Art Speech and Text Translation
Open-source large language model family from Tencent Hunyuan
FlashMLA: Efficient Multi-head Latent Attention Kernels
Open-source large language model by Alibaba
llama.go is like llama.cpp in pure Golang
Locally run an Instruction-Tuned Chat-Style LLM
ChatGPT integration with Unity Editor
Learning embeddings for classification, retrieval and ranking
Learning Continuous Signed Distance Functions for Shape Representation
Hermes 4 FP8: hybrid reasoning Llama-3.1-405B model by Nous Research
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video