Find the local LLM that actually runs and performs best
ChatGLM2-6B: An Open Bilingual Chat LLM
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
Agentic, Reasoning, and Coding (ARC) foundation models
A.S.E (AICGSecEval) is a repository-level AI-generated code security
Code for the paper "Evaluating Large Language Models Trained on Code"
LongBench v2 and LongBench (ACL '25 & '24)
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Leaderboard Comparing LLM Performance at Producing Hallucinations
Benchmark LLMs by fighting in Street Fighter 3
A high-performance ML model serving framework that offers dynamic batching
Implement a CPU from scratch and play with large model deployments
ChatGLM3 series: Open Bilingual Chat LLMs
A Gym environment for web task automation
Advanced language and coding AI model
Open-source model for program synthesis
OpenCompass is an LLM evaluation platform
Capable of understanding text, audio, vision, video
The official repo of Qwen chat & pretrained large language model
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
Open-source evaluation toolkit of large multi-modality models (LMMs)
Training Large Language Model to Reason in a Continuous Latent Space
Gemma open-weight LLM library, from Google DeepMind
AI-Driven Exploration in the Space of Code
Hypernetworks that adapt LLMs for specific benchmark tasks