GPU benchmark testing graphics performance with realistic 3D scenes
MTEB: Massive Text Embedding Benchmark
A simple generic set type for the Go language
LongBench v2 and LongBench (ACL '25 & '24)
A.S.E (AICGSecEval) is a repository-level AI-generated code security
Meta Agents Research Environments is a comprehensive platform
Performance monitoring and benchmarking suite
Test-Time Reinforcement Learning
The first large-scale public benchmark dataset for image harmonization
Drill is an HTTP load testing application written in Rust
GPU stress test and OpenGL/Vulkan graphics benchmark for Windows and Linux
Optimize your code automatically with AI
A simple and powerful proxy
Benchmark LLMs by fighting in Street Fighter 3
Provider-agnostic, open-source evaluation infrastructure
Minimal examples of data structures and algorithms in Python
GoNB, a Go Notebook Kernel for Jupyter
MiniMax-M2, a model built for Max coding & agentic workflows
Code for Cicero, an AI agent that plays the game of Diplomacy
Open-weight, large-scale hybrid-attention reasoning model
Open source codebase for Scale Agentex
Clean and efficient FP8 GEMM kernels with fine-grained scaling
ICLR 2024 Spotlight: curation/training code, metadata, distribution