Showing 900 open source projects for "benchmark"

  • 1
    Benchmark

    A microbenchmark support library

    A library to benchmark code snippets, similar to unit tests.
    Downloads: 0 This Week
  • 2
    hyperfine

    A command-line benchmarking tool

    A command-line benchmarking tool. Statistical analysis across multiple runs. Support for arbitrary shell commands. Constant feedback about the benchmark progress and current estimates. Warmup runs can be executed before the actual benchmark. Cache-clearing commands can be set up before each timing run. Statistical outlier detection to detect interference from other programs and caching effects. Export results to various formats: CSV, JSON, Markdown, AsciiDoc. Parameterized benchmarks (e.g. vary the number of threads). ...
    Downloads: 1 This Week
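The measurement loop described in the hyperfine entry (untimed warmup runs, several timed runs, then summary statistics) can be sketched in a few lines of Python. This is an illustrative sketch only, not hyperfine's implementation or interface: the names `benchmark` and `shell_command` are hypothetical, and outlier detection, cache clearing, and result export are left out.

```python
import statistics
import subprocess
import time


def shell_command(command):
    """Return a zero-argument callable that runs `command` in a shell."""
    def run():
        subprocess.run(
            command,
            shell=True,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return run


def benchmark(run, warmup=2, runs=5):
    """Call `run` untimed `warmup` times, then time it `runs` times."""
    # Warmup runs fill caches so that the timed runs are more comparable.
    for _ in range(warmup):
        run()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)

    return {
        "runs": len(samples),
        "mean": statistics.mean(samples),
        # The sample standard deviation needs at least two data points.
        "stddev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
        "min": min(samples),
        "max": max(samples),
    }
```

Under these assumptions, `benchmark(shell_command("sleep 0.1"), warmup=1, runs=10)` would return a dictionary of timings in seconds. Taking a callable rather than a command string keeps the timing loop independent of how the workload is launched.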
  • 3
    Superposition Benchmark (Unigine)

    GPU benchmark testing graphics performance with realistic 3D scenes.

    Superposition Benchmark by Unigine is a powerful GPU stress-testing and benchmarking tool designed to evaluate graphics performance using the Unigine 2 Engine. It features advanced visuals, real-time lighting, and physics simulations to test DirectX and OpenGL performance. Superposition provides detailed results, including frame rates, GPU temperatures, and stability data.
    Downloads: 240 This Week
  • 4
    XMRig

    RandomX, KawPow, CryptoNight, AstroBWT and GhostRider unified miner

    XMRig is a high-performance, open-source, cross-platform RandomX, KawPow, CryptoNight, and AstroBWT unified CPU/GPU miner, RandomX benchmark, and stratum proxy. Official binaries are available for Windows, Linux, macOS, and FreeBSD. The preferred way to configure the miner is the JSON config file, as it is more flexible and human-friendly.
    Downloads: 19 This Week
  • 5
    BenchmarkTools.jl

    A benchmarking framework for the Julia language

    ...Under the hood, BenchmarkTrackers relied on Benchmarks for actual benchmark execution.
    Downloads: 0 This Week
  • 6
    BEIR

    A Heterogeneous Benchmark for Information Retrieval

    BEIR is a benchmark framework for evaluating information retrieval models across various datasets and tasks, including document ranking and question answering.
    Downloads: 0 This Week
  • 7
    GLM-4.6

    Agentic, Reasoning, and Coding (ARC) foundation models

    ...Its reasoning capabilities have been strengthened, including improved tool usage during inference and more effective integration within agent frameworks. GLM-4.6 also enhances writing quality, producing outputs that better align with human preferences and role-playing scenarios. Benchmark evaluations demonstrate that it not only outperforms GLM-4.5 but also rivals leading global models such as DeepSeek-V3.1-Terminus and Claude Sonnet 4.
    Downloads: 39 This Week
  • 8
    CrystalDiskMark

    A simple disk benchmark software

    A simple disk benchmark software.
    Downloads: 119,283 This Week
  • 9
    AgentBench

    A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

    AgentBench is an open-source benchmark designed to evaluate the capabilities of large language models when used as autonomous agents. Unlike traditional language model benchmarks that focus on static text tasks, AgentBench measures how models perform in interactive environments that require planning, reasoning, and decision-making. The benchmark includes multiple environments that simulate realistic scenarios such as web interaction, database querying, and problem solving tasks. ...
    Downloads: 0 This Week
  • 10
    LongBench

    LongBench v2 and LongBench (ACL '25 & '24)

    ...LongBench addresses this gap by providing datasets that require models to process and reason over long sequences of text across multiple tasks. The benchmark includes multiple categories such as single-document question answering, multi-document reasoning, summarization, long dialogue understanding, and code analysis. It supports bilingual evaluation in English and Chinese to assess multilingual capabilities across extended contexts. Newer versions of the benchmark introduce extremely long context windows ranging from thousands to millions of tokens, enabling researchers to test the limits of modern long-context models.
    Downloads: 0 This Week
  • 11
    AICGSecEval

    A.S.E (AICGSecEval) is a repository-level AI-generated code security

    AICGSecEval is an open-source benchmark framework designed to evaluate the security of code generated by artificial intelligence systems. The project was developed to address concerns that AI-assisted programming tools may produce insecure code containing vulnerabilities such as injection flaws or unsafe logic. The framework constructs evaluation tasks based on real-world software repositories and known vulnerability cases derived from CVE records.
    Downloads: 0 This Week
  • 12
    Drill

    Drill is an HTTP load testing application written in Rust

    Drill is an HTTP load-testing application written in Rust. The main goal of this project is to build a really lightweight tool as an alternative to others that require the JVM and other dependencies. You can write benchmark files in YAML format, describing everything you want to test. The syntax was inspired by Ansible because it is really easy to use and extend. Interpolations can be used in different ways, which lets you specify a benchmark with different requests and dependencies between them. Right now, the easiest way to get drill is to go to the latest release page and download the binary file for your platform. ...
    Downloads: 5 This Week
  • 13
    DeepSeek V2

    Strong, Economical, and Efficient Mixture-of-Experts Language Model

    ...The V2 model is expected to support more advanced features like better context window handling, more efficient inference, better performance on challenging tasks, and stronger alignment with human feedback. Because DeepSeek is pushing open-weight competition, this V2 iteration is meant to solidify its position in benchmark rankings and in developer adoption. The code in the repository may include description files, support for tool use or plug-in architectures, and artifacts showing fine-tuning or prompt templates.
    Downloads: 36 This Week
  • 14
    kube-bench

    Checks whether Kubernetes is deployed

    kube-bench is a tool that checks whether Kubernetes is deployed securely by running the checks documented in the CIS Kubernetes Benchmark. Trivy, the all-in-one cloud-native security scanner, can be deployed as a Kubernetes Operator inside a cluster. Both the Trivy CLI and the Trivy Operator support CIS Kubernetes Benchmark scanning, among several other features. There are multiple ways to run kube-bench. You can run kube-bench inside a pod, but it will need access to the host's PID namespace in order to check the running processes, as well as access to some directories on the host where config files and other files are stored.
    Downloads: 0 This Week
  • 15
    HumanEval

    Code for the paper "Evaluating Large Language Models Trained on Code"

    ...The benchmark has become a standard for evaluating code generation models, including those in the Codex and GPT families. Researchers can use the dataset to run reproducible comparisons across models and track improvements in functional code synthesis. By focusing on correctness through execution, human-eval provides a rigorous and practical way to evaluate programming capabilities in AI systems.
    Downloads: 1 This Week
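The idea of judging generated code by executing it, as the HumanEval entry describes, can be illustrated with a small Python sketch. It is not the human-eval harness: the function name `passes_tests` is hypothetical, and the layout (a test snippet that defines `check(candidate)`, plus an entry-point function name) is assumed here for illustration. The sketch also has no sandbox or timeout, so it should only ever be run on trusted code.

```python
def passes_tests(candidate_source, test_source, entry_point):
    """Return True if the candidate code runs and passes every assertion.

    candidate_source: source code that defines the function under test
    test_source: source code that defines check(candidate) using asserts
    entry_point: name of the function defined by candidate_source
    """
    namespace = {}
    try:
        # Define the candidate function, then the test function.
        exec(candidate_source, namespace)
        exec(test_source, namespace)
        # Run the tests; a failed assert raises AssertionError.
        namespace["check"](namespace[entry_point])
    except Exception:
        # Syntax errors, runtime errors, and failed asserts all count as failure.
        return False
    return True
```

For example, a candidate `def add(a, b): return a + b` checked against a test that asserts `candidate(1, 2) == 3` would count as a pass, while a candidate that fails to parse would count as a failure.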
  • 16
    CrystalMark Retro

    CrystalMark Retro is a comprehensive benchmarking software.

    ...It supports 32-bit (x86), 64-bit (x64/ARM64), many-core, and multilingual (48+ languages) systems, and can measure CPU, Disk, 2D graphics (GDI), and 3D graphics (OpenGL) performance with a single click. Benchmark results can be registered in CrystalMarkDB for centralized management of past results (account required: free of charge) and comparison with data registered by users around the world.
    Downloads: 17,587 This Week
  • 17
    SmallCode

    AI coding agent optimized for small LLMs. 87% benchmark

    ...Its workflow is built around terminal usage, which makes it suitable for developers who prefer command-line control and local project context. smallcode emphasizes efficient agent behavior, careful tool use, and benchmark-driven improvements for constrained models. Its main value is giving developers a compact coding-agent environment that treats small local models as first-class tools.
    Downloads: 0 This Week
  • 18
    Meta Agents Research Environments (ARE)

    Meta Agents Research Environments is a comprehensive platform

    ...Unlike static benchmarks, ARE supports environments where agents must adapt to changes over time, reason over sequences of actions, interact with applications, and face uncertainty. The included Gaia2 benchmark offers 800 scenarios across multiple “universes” and can test reasoning, memory, tool use, and adaptability. Other features include integration with simulated applications and agent APIs (email, file system, etc.) and support for multiple AI model backends and providers.
    Downloads: 0 This Week
  • 19
    DeepSeek-OCR 2

    Visual Causal Flow

    ...It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents with rich spatial structure. The repository provides model code and inference scripts that let researchers and developers run and benchmark the system on both images and PDFs, with support for batch evaluation and optimized pipelines leveraging vLLM and transformers.
    Downloads: 8 This Week
  • 20
    MTEB

    MTEB: Massive Text Embedding Benchmark

    ...This makes progress in the field difficult to track, as various models are constantly being proposed without proper evaluation. To solve this problem, we introduce the Massive Text Embedding Benchmark (MTEB). MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages. Through the benchmarking of 33 models on MTEB, we establish the most comprehensive benchmark of text embeddings to date. We find that no particular text embedding method dominates across all tasks. This suggests that the field has yet to converge on a universal text embedding method and scale it up sufficiently to provide state-of-the-art results on all embedding tasks.
    Downloads: 1 This Week
  • 21
    SAM 3

    Code for running inference and finetuning with SAM 3 model

    ...This capability is grounded in a new data engine that automatically annotated over four million unique concepts, producing a massive open-vocabulary segmentation dataset and enabling the model to achieve 75–80% of human performance on the SA-CO benchmark, which itself spans 270K unique concepts.
    Downloads: 27 This Week
  • 22
    SDGym

    Benchmarking synthetic data generation methods

    The Synthetic Data Gym (SDGym) is a benchmarking framework for modeling and generating synthetic data. Measure performance and memory usage across different synthetic data modeling techniques: classical statistics, deep learning, and more! The SDGym library integrates with the Synthetic Data Vault ecosystem. You can use any of its synthesizers, datasets, or metrics for benchmarking. You can also customize the process to include your own work. Select any of the publicly available datasets from the...
    Downloads: 0 This Week
  • 23
    JMH Gradle Plugin

    Integrates the JMH benchmarking framework with Gradle

    The JMH Gradle Plugin provides integration of the Java Microbenchmark Harness (JMH) into Gradle builds, enabling developers to write and run performance benchmarks directly in their projects. JMH is the de facto standard for writing accurate and reliable Java microbenchmarks, and this plugin automates tasks like generating benchmark sources, compiling them with the required JMH support classes, and packaging runnable benchmark jars. It simplifies the workflow by handling classpath setup and wiring Gradle tasks for running benchmarks. Developers can run benchmarks via Gradle commands, produce reports, and compare performance over time. This reduces the manual effort of setting up JMH, making performance testing a natural part of the development cycle. ...
    Downloads: 0 This Week
  • 24
    golang-set

    A simple generic set type for the Go language

    ...One common interface to both implementations: a non-thread-safe implementation favoring performance and a thread-safe implementation favoring concurrent use. Feature-complete set implementation modeled after Python's set implementation. Exhaustive unit-test and benchmark suite. This package is trusted by many companies and thousands of open-source packages. It now fully supports generic syntax, so you can instantiate a collection for any comparable type.
    Downloads: 4 This Week
  • 25
    Crow Framework

    A Fast and Easy to use microframework for the web

    ...Get started by installing Crow and building your first application. Or go through the guides if you're stuck somewhere. Easy routing (similar to Flask). Type-safe handlers. Blazingly fast (see this benchmark and this benchmark). Built-in JSON support. Mustache-based templating library (crow::mustache). Header-only library (single header file available). Middleware support for extensions. HTTP/1.1 and WebSocket support. Multi-part request and response support. Uses modern C++ (11/14).
    Downloads: 7 This Week