Find the local LLM that actually runs and performs best
ChatGLM2-6B: An Open Bilingual Chat LLM
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
Agentic, Reasoning, and Coding (ARC) foundation models
A.S.E (AICGSecEval) is a repository-level AI-generated code security
Code for the paper "Evaluating Large Language Models Trained on Code"
LongBench v2 and LongBench (ACL'25 & '24)
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Leaderboard Comparing LLM Performance at Producing Hallucinations
Benchmark LLMs by fighting in Street Fighter 3
LLM inference in C/C++
A high-performance ML model serving framework that offers dynamic batching
High-speed Large Language Model Serving for Local Deployment
Implement CPU from scratch and play with large model deployments
Run AI models locally on your machine with Node.js bindings for llama
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
ChatGLM3 series: Open Bilingual Chat LLMs
A Gym environment for web task automation
157 models, 30 providers, one command to find what runs on hardware
Advanced language and coding AI model
Private Open AI on Kubernetes
Open-source model for program synthesis
Fast, flexible LLM inference
OpenCompass is an LLM evaluation platform