Agentic, Reasoning, and Coding (ARC) foundation models
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
LongBench v2 and LongBench (ACL'25 & '24)
A.S.E (AICGSecEval) is a repository-level AI-generated code security
Strong, Economical, and Efficient Mixture-of-Experts Language Model
Meta Agents Research Environments is a comprehensive platform
Visual Causal Flow
Code for running inference and finetuning with the SAM 3 model
Leaderboard Comparing LLM Performance at Producing Hallucinations
Autonomous harness engineering
Collection of reference environments, offline reinforcement learning
Benchmark LLMs by fighting in Street Fighter 3
Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
Provider-agnostic, open-source evaluation infrastructure
Advanced language and coding AI model
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models
Easy-to-understand AI learning resources for beginners
A Modular Simulation Framework and Benchmark for Robot Learning
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
An experimental version of the DeepSeek model
The repository provides code for running inference with SAM 2
The 100-line AI agent that solves GitHub issues
SAPIEN Manipulation Skill Framework
Unleashing 10,000+ Word Generation from Long Context LLMs
The Fastest LLM Gateway with built-in OTel observability