benchmark test free download

Showing 154 open source projects for "benchmark test"

1

Superposition Benchmark (Unigine)

GPU benchmark testing graphics performance with realistic 3D scenes.

Superposition Benchmark by Unigine is a powerful GPU stress-testing and benchmarking tool designed to evaluate graphics performance using the Unigine 2 Engine. It features advanced visuals, real-time lighting, and physics simulations to test DirectX and OpenGL performance. Superposition provides detailed results, including frame rates, GPU temperatures, and stability data.

Downloads: 101 This Week

Last Update: 2025-10-07
See Project
2

MTEB

MTEB: Massive Text Embedding Benchmark

Text embeddings are commonly evaluated on a small set of datasets from a single task not covering their possible applications to other tasks. It is unclear whether state-of-the-art embeddings on semantic textual similarity (STS) can be equally well applied to other tasks like clustering or reranking. This makes progress in the field difficult to track, as various models are constantly being proposed without proper evaluation. To solve this problem, we introduce the Massive Text Embedding...

Downloads: 5 This Week

Last Update: 2 days ago
See Project
3

Benchmark for EasyOS / Puppy

...Structure of the Benchmark: The benchmark contains 8 sections, but only 3 of them actually measure performance. The real performance tests are the CPU / Compression test, the Filesystem test, and the RAM write test. All other sections provide system information, not performance measurements. Section 0, GENERAL SYSTEM INFO, is informational (no performance measurement) and collects basic system information.

Downloads: 0 This Week

Last Update: 2026-03-27
See Project
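
To illustrate what a CPU / compression test of this kind measures, here is a minimal sketch in Python. It is not the project's script: the function name `compression_benchmark`, the payload, and the choice of zlib are all assumptions made for illustration only.

```python
import time
import zlib


def compression_benchmark(size_bytes=4_000_000, level=6):
    """Time one zlib compression pass over a synthetic, repetitive payload.

    Hypothetical helper: the payload and the zlib level are illustrative
    choices, not taken from the EasyOS / Puppy benchmark.
    """
    # Build a deterministic payload so repeated runs compress the same data.
    data = bytes(range(256)) * (size_bytes // 256)

    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start

    # Sanity check: the round trip must reproduce the input exactly.
    assert zlib.decompress(compressed) == data

    return {
        "input_bytes": len(data),
        "compressed_bytes": len(compressed),
        "seconds": elapsed,
    }


if __name__ == "__main__":
    result = compression_benchmark()
    print(result)
```

A single pass is noisy; running the function several times and reporting the best or median time would likely give a more stable figure.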
4

golang-set

A simple generic set type for the Go language

...One common interface to both implementations: a non-thread-safe implementation favoring performance and a thread-safe implementation favoring concurrent use. Feature-complete set implementation modeled after Python's set implementation. Exhaustive unit-test and benchmark suite. This package is trusted by many companies and thousands of open-source packages. It now fully supports generic syntax, so you can instantiate a set for any comparable type.

Downloads: 0 This Week

Last Update: 3 days ago
See Project
5

LongBench

LongBench v2 and LongBench (ACL 25'&24')

...LongBench addresses this gap by providing datasets that require models to process and reason over long sequences of text across multiple tasks. The benchmark includes multiple categories such as single-document question answering, multi-document reasoning, summarization, long dialogue understanding, and code analysis. It supports bilingual evaluation in English and Chinese to assess multilingual capabilities across extended contexts. Newer versions of the benchmark introduce extremely long context windows ranging from thousands to millions of tokens, enabling researchers to test the limits of modern long-context models.

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
6

AICGSecEval

A.S.E (AICGSecEval) is a repository-level AI-generated code security

...By simulating realistic development scenarios, the benchmark assesses how well AI code generation systems handle security-sensitive programming tasks. AICGSecEval combines static and dynamic evaluation techniques to analyze generated code for vulnerabilities and functional correctness. The framework includes datasets, test cases, and evaluation metrics that measure how AI programming tools perform across multiple programming languages and vulnerability categories.

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
7

Meta Agents Research Environments (ARE)

Meta Agents Research Environments is a comprehensive platform

...Unlike static benchmarks, ARE supports environments where agents must adapt to changes over time and reason over sequences of actions. Agents interact with applications and face uncertainty. The included Gaia2 benchmark offers 800 scenarios across multiple “universes” and can test reasoning, memory, tool use, and adaptability. ARE integrates with simulated applications and agent APIs (email, file system, etc.) and supports multiple AI model backends and providers.

Downloads: 0 This Week

Last Update: 4 days ago
See Project
8

Likwid

Performance monitoring and benchmarking suite

Likwid is an easy-to-install and easy-to-use tool suite of command-line applications and a library for performance-oriented programmers. It works on Intel, AMD, ARMv8 and POWER9 processors under the Linux operating system, with additional support for Nvidia and AMD GPUs. ARMv7 and POWER8/9 are also supported, but the developers currently have no test machines on which to verify them properly.

Downloads: 4 This Week

Last Update: 2025-12-23
See Project
9

TTRL

Test-Time Reinforcement Learning

TTRL is an open-source framework for test-time reinforcement learning in large language models, with a particular focus on reasoning tasks where ground-truth labels are not available during inference. The project addresses the problem of how to generate useful reward signals from unlabeled test-time data, and its central insight is that common test-time scaling practices such as majority voting can be repurposed into reward estimates for online reinforcement learning. This makes the...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
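
The idea of turning majority voting into a reward estimate can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TTRL's actual code: the function name `majority_vote_rewards` and the binary 1.0 / 0.0 reward are hypothetical choices.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Return (pseudo_label, rewards) for a list of sampled final answers.

    Hypothetical helper: the most frequent answer is treated as a pseudo-label,
    and each sample is rewarded 1.0 if it agrees with it, else 0.0.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")

    counts = Counter(answers)
    # most_common(1) returns the answer with the highest count.
    pseudo_label, _ = counts.most_common(1)[0]

    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    return pseudo_label, rewards


if __name__ == "__main__":
    # Five answers sampled for the same unlabeled question (made-up values).
    samples = ["42", "41", "42", "42", "7"]
    label, rewards = majority_vote_rewards(samples)
    print(label, rewards)  # 42 [1.0, 0.0, 1.0, 1.0, 0.0]
```

The rewards could then be passed to whichever reinforcement learning update is in use; no ground-truth label is needed at any point.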
10

Image Harmonization Dataset iHarmony4

The first large-scale public benchmark dataset for image harmonization

This repository provides the iHarmony4 dataset, which is a large-scale dataset designed for image harmonization tasks. Image harmonization involves adjusting the appearance of a foreground in a composite image so that it is consistent with the background (in color, tone, illumination, etc.). The iHarmony4 dataset comprises four sub-datasets (HCOCO, HAdobe5k, HFlickr, Hday2night), each making composite images by combining a foreground from one image with a background from another, along with...

Downloads: 6 This Week

Last Update: 2026-02-24
See Project
11

File Read Test

File Read Test is a tool that reads disk files or directories and stops on the first read error. A second tool, Quick Disk Test, fills a disk with test data and verifies that it can be read back without errors.

4 Reviews

Downloads: 41 This Week

Last Update: 2026-01-03
See Project
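
The "read everything and stop on the first read error" behaviour can be sketched in Python as follows. This is an illustrative approximation, not the tool's implementation: the function name `read_until_error`, the chunk size, and the return format are assumptions.

```python
import os


def read_until_error(root, chunk_size=1 << 20):
    """Read every file under `root`; stop at the first read error.

    Hypothetical helper. Returns (files_ok, failed_path, error), where
    failed_path and error are None if every file was read successfully.
    """
    if os.path.isfile(root):
        paths = [root]
    else:
        paths = [
            os.path.join(dirpath, name)
            for dirpath, _, names in os.walk(root)
            for name in sorted(names)
        ]

    files_ok = 0
    for path in paths:
        try:
            with open(path, "rb") as fh:
                # Read in chunks so large files do not need to fit in memory.
                while fh.read(chunk_size):
                    pass
        except OSError as exc:
            # Stop immediately and report which file failed.
            return files_ok, path, exc
        files_ok += 1
    return files_ok, None, None


if __name__ == "__main__":
    ok, failed, err = read_until_error(".")
    print(f"files read: {ok}, first failure: {failed} ({err})")
```

Note that reads served from the operating system's page cache may not touch the disk at all, so a sketch like this only approximates a real surface check.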
12

Drill

Drill is an HTTP load testing application written in Rust

Drill is an HTTP load-testing application written in Rust. The main goal of the project is to provide a really lightweight tool as an alternative to others that require the JVM and other dependencies. You write benchmark files in YAML format describing everything you want to test. The syntax was inspired by Ansible because it is easy to use and extend. Interpolations can be used in different ways, which lets you specify a benchmark with different requests and dependencies between them. Right now, the easiest way to get drill is to go to the latest release page and download the binary file for your platform. ...

Downloads: 0 This Week

Last Update: 2025-12-29
See Project
13

FurMark

GPU stress test OpenGL and Vulkan graphics benchmark Windows/Linux

...This tool is particularly effective in generating high workloads that can significantly increase the temperature of the GPU, making it a useful utility for testing the stability and stress tolerance of graphics cards. By simulating demanding rendering tasks, FurMark serves as a comprehensive test for assessing the robustness and thermal performance of GPUs under extreme conditions. FurMark from Geeks3D is a free OpenGL benchmark tool for Windows that can also be used to check the stability of a graphics card thanks to a built-in stress test. FurMark rendering is designed to overheat the GPU making it a viral-like stability and stress test tool (also called GPU burner) for the graphics card.

Downloads: 330 This Week

Last Update: 2024-10-28
See Project
14

Codeflash

Optimize your code automatically with AI

Codeflash is a general-purpose optimizer for Python that uses advanced large language models (LLMs) to automatically generate, test, and benchmark multiple optimization ideas, then creates merge-ready pull requests with the best improvements for your code. Optimize an entire existing codebase by running codeflash --all. Automate optimizing all future code you will write by installing Codeflash as a GitHub action. Optimize a Python workflow python myscript.py end-to-end by running codeflash optimize myscript.py. ...

Downloads: 3 This Week

Last Update: 2026-04-02
See Project
15

spp

A simple and powerful proxy

Supported protocols: TCP, UDP, RUDP (Reliable UDP), RICMP (Reliable ICMP), RHTTP (Reliable HTTP), KCP, QUIC. Supported types: forward proxy, reverse proxy, SOCKS5 forward proxy, SOCKS5 reverse proxy. Protocols and types can be freely combined, as can the external proxy protocol and the internal forwarding protocol. Supports the Shadowsocks plug-ins spp-shadowsocks-plugin and spp-shadowsocks-plugin-android.

Downloads: 0 This Week

Last Update: 2025-12-30
See Project
16

LLM Colosseum

Benchmark LLMs by fighting in Street Fighter 3

LLM-Colosseum is an experimental benchmarking framework designed to evaluate the capabilities of large language models through gameplay interactions rather than traditional text-based benchmarks. The system places language models inside the environment of the classic video game Street Fighter III, where they must interpret the game state and decide which actions to perform during combat. This setup creates a dynamic environment that tests reasoning, situational awareness, and decision-making...

Downloads: 0 This Week

Last Update: 2026-03-07
See Project
17

openbench

Provider-agnostic, open-source evaluation infrastructure

...It bundles dozens of evaluation suites — covering knowledge, reasoning, math, code, science, reading comprehension, long-context recall, graph reasoning, and more — so users don’t need to assemble disparate datasets themselves. With a simple CLI interface (e.g. bench eval <benchmark> --model <model-id>), you can quickly evaluate any model supported by Groq or other providers (OpenAI, Anthropic, HuggingFace, local models, etc.). openbench also supports private/local evaluations: you can integrate your own custom benchmarks or data (e.g. internal test suites, domain-specific tasks) to evaluate models in a privacy-preserving way.

Downloads: 0 This Week

Last Update: 2025-12-09
See Project
18

Pythonic Data Structures and Algorithms

Minimal examples of data structures and algorithms in Python

The Pythonic Data Structures and Algorithms repository by keon is a hands-on collection of implementations of classical data structures and algorithms written in Python. It offers working, often well-commented code for many standard algorithmic problems — from sorting/searching to graph algorithms, dynamic programming, data structures, and more — making it a valuable resource for learning and reference. For students preparing for technical interviews, self-learners brushing up on...

Downloads: 0 This Week

Last Update: 2026-02-18
See Project
19

GoNB

GoNB, a Go Notebook Kernel for Jupyter

Go is a compiled language, but its very fast compilation allows it to be used in a REPL (Read-Eval-Print-Loop) fashion by inserting a "Compile" step in the middle of the loop, making it a Read-Compile-Run-Print-Loop that still feels very interactive. GoNB leverages that compilation speed to implement a full-featured (at least it's getting there) Jupyter notebook kernel. As a side benefit, it works with packages that use CGO, although it won't parse C code in the cells, so it...

Downloads: 0 This Week

Last Update: 2025-12-15
See Project
20

MiniMax-M2

MiniMax-M2, a model built for Max coding & agentic workflows

MiniMax-M2 is an open-weight large language model designed specifically for high-end coding and agentic workflows while staying compact and efficient. It uses a Mixture-of-Experts (MoE) architecture with 230 billion total parameters but only 10 billion activated per token, giving it the behavior of a very large model at a fraction of the runtime cost. The model is tuned for end-to-end developer flows such as multi-file edits, compile–run–fix loops, and test-validated repairs across real...

Downloads: 0 This Week

Last Update: 2025-12-01
See Project
21

Diplomacy Cicero

Code for Cicero, an AI agent that plays the game of Diplomacy

The project is the codebase for an AI agent named Cicero developed by Facebook Research. It is designed to play the board game Diplomacy by combining open-domain natural language negotiation with strategic planning. The repository includes training code, model checkpoints, and infrastructure for both language modelling (via the ParlAI framework) and reinforcement learning for strategy agents. It supports two variants: Cicero (which handles full “press” negotiation) and Diplodocus (a variant...

Downloads: 3 This Week

Last Update: 2 days ago
See Project
22

MiniMax-M1

Open-weight, large-scale hybrid-attention reasoning model

MiniMax-M1 is presented as the world’s first open-weight, large-scale hybrid-attention reasoning model, designed to push the frontier of long-context, tool-using, and deeply “thinking” language models. It is built on the MiniMax-Text-01 foundation and keeps the same massive parameter budget, but reworks the attention and training setup for better reasoning and test-time compute scaling. Architecturally, it combines Mixture-of-Experts layers with lightning attention, enabling the model to...

Downloads: 0 This Week

Last Update: 2025-12-01
See Project
23

Agentex

Open source codebase for Scale Agentex

AgentEX is an open framework from Scale for building, running, and evaluating agentic workflows, with an emphasis on reproducibility and measurable outcomes rather than ad-hoc demos. It treats an “agent” as a composition of a policy (the LLM), tools, memory, and an execution runtime so you can test the whole loop, not just prompting. The repo focuses on structured experiments: standardized tasks, canonical tool interfaces, and logs that make it possible to compare models, prompts, and tool...

Downloads: 1 This Week

Last Update: 6 days ago
See Project
24

DeepGEMM

Clean and efficient FP8 GEMM kernels with fine-grained scaling

DeepGEMM is a specialized CUDA library for efficient, high-performance general matrix multiplication (GEMM) operations, with particular focus on low-precision formats such as FP8 (and experimental support for BF16). The library is designed to work cleanly and simply, avoiding overly templated or heavily abstracted code, while still delivering performance that rivals expert-tuned libraries. It supports both standard and “grouped” GEMMs, which is useful for architectures like Mixture of...

Downloads: 1 This Week

Last Update: 2 days ago
See Project
25

MetaCLIP

ICLR2024 Spotlight: curation/training code, metadata, distribution

MetaCLIP (Metadata-Curated Language-Image Pre-training) is a research codebase from Meta that accompanies the paper "Demystifying CLIP Data". Rather than changing the CLIP model or training recipe, it focuses on how the training data is curated: starting from a raw pool of image-text pairs and a set of metadata entries, it selects a subset that is balanced over the metadata distribution. The repository provides the curation and training code together with the metadata and its distribution.

Downloads: 0 This Week

Last Update: 2025-10-07
See Project