Python-based research interface for blackbox
A microbenchmark support library
A list of open LLMs available for commercial use
RandomX, KawPow, CryptoNight, AstroBWT and GhostRider unified miner
A command-line benchmarking tool
Agentic, Reasoning, and Coding (ARC) foundation models
A Heterogeneous Benchmark for Information Retrieval
A benchmarking framework for the Julia language
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
A.S.E (AICGSecEval) is a repository-level AI-generated code security
MTEB: Massive Text Embedding Benchmark
Meta Agents Research Environments is a comprehensive platform
Code for the paper "Evaluating Large Language Models Trained on Code"
LongBench v2 and LongBench (ACL '25 & '24)
Checks whether Kubernetes is deployed
A fast and easy-to-use microframework for the web
Benchmarking synthetic data generation methods
Integrates the JMH benchmarking framework with Gradle
Visual Causal Flow
Geometric deep learning extension library for PyTorch
Leaderboard Comparing LLM Performance at Producing Hallucinations
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Strong, Economical, and Efficient Mixture-of-Experts Language Model
A GPU overclock & undervolt tool for various Snapdragon chips
Extremely fast compression algorithm