Agentic, Reasoning, and Coding (ARC) foundation models
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
LongBench v2 and LongBench (ACL '25 & '24)
A.S.E (AICGSecEval) is a repository-level AI-generated code security
Code for the paper "Evaluating Large Language Models Trained on Code"
Leaderboard Comparing LLM Performance at Producing Hallucinations
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Benchmark LLMs by fighting in Street Fighter 3
Designed for text embedding and ranking tasks
Advanced language and coding AI model
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
Capable of understanding text, audio, vision, video
Fast, flexible LLM inference
Open-source evaluation toolkit of large multi-modality models (LMMs)
ChatGLM2-6B: An Open Bilingual Chat LLM
Qwen-Image is a powerful image generation foundation model
MiniMax M2.1, a SOTA model for real-world dev & agents
Test-Time Reinforcement Learning
A Gym environment for web task automation
Papers integrating knowledge graphs (KGs) and large language models
Open-source model for program synthesis
Knowledge Graph Generation from Any Text
AI-Driven Exploration in the Space of Code
Hypernetworks that adapt LLMs for specific benchmark tasks
Driving with Graph Visual Question Answering