Agentic, Reasoning, and Coding (ARC) foundation models
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
LongBench v2 and LongBench (ACL '25 & '24)
A.S.E (AICGSecEval) is a repository-level AI-generated code security
Code for the paper "Evaluating Large Language Models Trained on Code"
Leaderboard Comparing LLM Performance at Producing Hallucinations
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Benchmark LLMs by fighting in Street Fighter 3
Advanced language and coding AI model
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
Capable of understanding text, audio, vision, video
The official repo of Qwen chat & pretrained large language model
ChatGLM2-6B: An Open Bilingual Chat LLM
Find the local LLM that actually runs and performs best
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Unleashing 10,000+ Word Generation from Long Context LLMs
Fast, flexible LLM inference
Open-source evaluation toolkit of large multi-modality models (LMMs)
Open-source model for program synthesis
Training Large Language Models to Reason in a Continuous Latent Space
Qwen-Image is a powerful image generation foundation model
MiniMax M2.1, a SOTA model for real-world dev & agents
Test-Time Reinforcement Learning
A Gym environment for web task automation
Papers integrating knowledge graphs (KGs) and large language models