Agentic, Reasoning, and Coding (ARC) foundation models
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
LongBench v2 and LongBench (ACL'25 & ACL'24)
A.S.E (AICGSecEval) is a repository-level AI-generated code security evaluation benchmark
Code for the paper "Evaluating Large Language Models Trained on Code"
Leaderboard Comparing LLM Performance at Producing Hallucinations
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Designed for text embedding and ranking tasks
Benchmark LLMs by fighting in Street Fighter 3
Advanced language and coding AI model
ChatGLM2-6B: An Open Bilingual Chat LLM
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
The official repo of the Qwen chat and pretrained large language models
Capable of understanding text, audio, vision, and video
Training Large Language Model to Reason in a Continuous Latent Space
Find the local LLM that actually runs and performs best
Unleashing 10,000+ Word Generation from Long Context LLMs
Open-source model for program synthesis
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Test-Time Reinforcement Learning
A Gym environment for web task automation
Open-source evaluation toolkit for large multi-modality models (LMMs)
Knowledge Graph Generation from Any Text
AI-Driven Exploration in the Space of Code
Hypernetworks that adapt LLMs for specific benchmark tasks