A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
Agentic, Reasoning, and Coding (ARC) foundation models
A.S.E (AICGSecEval) is a repository-level AI-generated code security
LongBench v2 and LongBench (ACL '25 & '24)
Strong, Economical, and Efficient Mixture-of-Experts Language Model
Visual Causal Flow
Meta Agents Research Environments is a comprehensive platform
Code for running inference and finetuning with SAM 3 model
A TTS that fits in your CPU (and pocket)
Leaderboard Comparing LLM Performance at Producing Hallucinations
Benchmark LLMs by fighting in Street Fighter 3
Autonomous harness engineering
Provider-agnostic, open-source evaluation infrastructure
MOSS-TTS-Nano is an open-source multilingual tiny speech generation
A nearly-live implementation of OpenAI's Whisper
A high-quality rapid TTS voice cloning model
Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
A Modular Simulation Framework and Benchmark for Robot Learning
Collection of reference environments, offline reinforcement learning
Easy Docker setup for Stable Diffusion with user-friendly UI
Terminal-native coding agent powered by local LLMs
SAPIEN Manipulation Skill Framework
Clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models
Training neural networks on Apple Neural Engine via APIs