A microbenchmark support library
A command-line benchmarking tool
GPU benchmark testing graphics performance with realistic 3D scenes.
RandomX, KawPow, CryptoNight, AstroBWT and GhostRider unified miner
Agentic, Reasoning, and Coding (ARC) foundation models
A benchmarking framework for the Julia language
A Heterogeneous Benchmark for Information Retrieval
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
Simple disk benchmark software
MTEB: Massive Text Embedding Benchmark
Meta Agents Research Environments is a comprehensive platform
LongBench v2 and LongBench (ACL '25 & '24)
A.S.E (AICGSecEval) is a repository-level AI-generated code security
CrystalMark Retro is comprehensive benchmarking software.
Benchmarking synthetic data generation methods
Checks whether Kubernetes is deployed
A fast and easy-to-use microframework for the web
Code for running inference and finetuning with SAM 3 model
Autonomous harness engineering
Visual Causal Flow
Leaderboard Comparing LLM Performance at Producing Hallucinations
Geometric deep learning extension library for PyTorch
Code for the paper "Evaluating Large Language Models Trained on Code"
Integrates the JMH benchmarking framework with Gradle