Python-based research interface for blackbox
A microbenchmark support library
A list of open LLMs available for commercial use
A command-line benchmarking tool
RandomX, KawPow, CryptoNight, AstroBWT and GhostRider unified miner
A Heterogeneous Benchmark for Information Retrieval
Agentic, Reasoning, and Coding (ARC) foundation models
A benchmarking framework for the Julia language
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
A.S.E (AICGSecEval) is a repository-level AI-generated code security
Meta Agents Research Environments is a comprehensive platform
Code for the paper "Evaluating Large Language Models Trained on Code"
LongBench v2 and LongBench (ACL '25 & '24)
Checks whether Kubernetes is deployed
MTEB: Massive Text Embedding Benchmark
A fast and easy-to-use microframework for the web
Visual Causal Flow
Benchmarking synthetic data generation methods
Integrates the JMH benchmarking framework with Gradle
Geometric deep learning extension library for PyTorch
Drill is an HTTP load testing application written in Rust
Leaderboard Comparing LLM Performance at Producing Hallucinations
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Reference implementations of MLPerf™ training benchmarks
A GPU overclock & undervolt tool for various Snapdragon chips