Python-based research interface for blackbox
A microbenchmark support library
A list of open LLMs available for commercial use
A command-line benchmarking tool
RandomX, KawPow, CryptoNight, AstroBWT and GhostRider unified miner
Agentic, Reasoning, and Coding (ARC) foundation models
A Heterogeneous Benchmark for Information Retrieval
A benchmarking framework for the Julia language
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
A.S.E (AICGSecEval) is a repository-level benchmark for evaluating the security of AI-generated code
Meta Agents Research Environments is a comprehensive platform for evaluating AI agents in dynamic, realistic scenarios
Code for the paper "Evaluating Large Language Models Trained on Code"
LongBench v2 and LongBench (ACL '25 & '24)
MTEB: Massive Text Embedding Benchmark
Checks whether Kubernetes is deployed according to security best practices
A Fast and Easy to use microframework for the web
Visual Causal Flow
Benchmarking synthetic data generation methods
Integrates the JMH benchmarking framework with Gradle
Geometric deep learning extension library for PyTorch
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Drill is an HTTP load testing application written in Rust
Reference implementations of MLPerf™ training benchmarks
Leaderboard Comparing LLM Performance at Producing Hallucinations
A Strong, Economical, and Efficient Mixture-of-Experts Language Model