performance testing free download

mistral.rs

Fast, flexible LLM inference

mistral.rs is a fast and flexible LLM inference engine implemented in Rust, designed to run and serve modern language models with an emphasis on performance and practical deployment. It provides multiple entry points for developers, including a CLI for running models locally and an HTTP server that exposes an OpenAI-compatible API surface for easy integration with existing clients. The project includes hardware-aware tooling that can benchmark a system and choose sensible quantization and device-mapping strategies, helping users get strong performance without manual tuning. ...

Downloads: 4 This Week

Last Update: 2026-04-02

See Project

AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

...These environments require agents to interpret instructions, take actions, and adapt their strategies based on feedback from the environment. AgentBench also includes an evaluation framework that measures success rates, rewards, and task completion performance across different agent implementations. By testing models across diverse scenarios, the benchmark highlights strengths and weaknesses in reasoning, long-term planning, and tool usage.

Downloads: 0 This Week

Last Update: 2026-03-05

See Project

Easy DataSet

A powerful tool for creating datasets for LLM fine-tuning

...The system includes automated question-generation capabilities, hierarchical label trees, and answer generation pipelines that use LLM APIs to produce coherent paired data with customizable templates. Beyond dataset creation, Easy-dataset also provides a built-in evaluation system with model testing and blind-test features, helping teams validate model performance using curated test sets.

Downloads: 9 This Week

Last Update: 2026-04-10

See Project

langrocks

Tools like web browser, computer access and code runner for LLMs

Langrocks is a programming language experimentation toolkit that enables developers to create, test, and optimize custom programming languages.

Downloads: 1 This Week

Last Update: 2024-11-21

See Project

Agent Behavior Monitoring

The open source post-building layer for agents

Agent Behavior Monitoring is an open-source framework designed to monitor, evaluate, and improve the behavior of AI agents operating in real or simulated environments. The system focuses on agent behavior monitoring by collecting interaction data and analyzing how agents perform across different scenarios and tasks. Developers can use the framework to observe agent actions in both online production environments and offline evaluation settings, making it useful for debugging and performance...

Downloads: 5 This Week

Last Update: 2026-04-09

See Project

LangWatch

The platform for LLM evaluations and AI agent testing

LangWatch is an open-source observability and monitoring platform designed to help developers evaluate and improve applications built with large language models. The platform provides tools for tracking model interactions, analyzing prompt behavior, and identifying issues such as hallucinations, latency problems, or unexpected responses. By collecting telemetry data from AI applications, LangWatch allows developers to understand how their systems perform in real-world usage scenarios. The...

Downloads: 2 This Week

Last Update: 5 days ago

See Project

Paddler

Open-source LLM load balancer and serving platform for hosting LLMs

Paddler is an open-source LLM infrastructure platform designed to deploy, manage, and scale large language models on private infrastructure. The system acts as a specialized load balancer and serving layer for language models, enabling organizations to run inference workloads without relying on external API providers. It supports running models locally through engines such as llama.cpp while distributing requests across multiple compute nodes to improve performance and reliability. The...

Downloads: 0 This Week

Last Update: 2026-04-30

See Project

Mosec

A high-performance ML model serving framework, offers dynamic batching

Mosec is a high-performance and flexible model-serving framework for building ML model-enabled backend and microservices. It bridges the gap between any machine learning models you just trained and the efficient online service API.

Downloads: 1 This Week

Last Update: 2026-04-15

See Project

MiniMax-M2.5

State of the art LLM and coding model

MiniMax-M2.5 is a state-of-the-art foundation model extensively trained with reinforcement learning across hundreds of thousands of real-world environments. It delivers leading performance in coding, agentic tool use, search, and complex office workflows, achieving top benchmark scores such as 80.2% on SWE-Bench Verified and 76.3% on BrowseComp. Designed to reason efficiently and decompose tasks like an experienced architect, M2.5 plans features, structure, and system design before generating code. The model supports full-stack development across web, mobile, and desktop platforms, covering the entire lifecycle from system design to testing and code review. ...

Downloads: 11 This Week

Last Update: 2026-03-09

See Project

Hallucination Leaderboard

Leaderboard Comparing LLM Performance at Producing Hallucinations

Hallucination Leaderboard is an open research project that tracks and compares the tendency of large language models to produce hallucinated or inaccurate information when generating summaries. The project provides a standardized benchmark that evaluates different models using a dedicated hallucination detection system known as the Hallucination Evaluation Model. Each model is tested on document summarization tasks to measure how often generated responses introduce information that is not...

Downloads: 1 This Week

Last Update: 2026-04-29

See Project

$Grade School Math$

Grade School Math

8.5K high quality grade school math problems

The grade-school-math repository (sometimes called GSM8K) is a curated dataset of 8,500 high-quality grade school math word problems intended for evaluating mathematical reasoning capabilities of language models. It is structured into 7,500 training problems and 1,000 test problems. These aren’t trivial exercises — many require multi-step reasoning, combining arithmetic operations, and handling intermediate steps (e.g. “If she sold half as many in May… how many in total?”). The problems are...

Downloads: 0 This Week

Last Update: 2025-10-03

See Project

Search Results for "performance testing"

Showing 11 open source projects for "performance testing"

mistral.rs

AgentBench

Easy DataSet

langrocks

Agent Behavior Monitoring

LangWatch

Paddler

Mosec

MiniMax-M2.5

Hallucination Leaderboard

Grade School Math

Search Results for "performance testing"

Showing 11 open source projects for "performance testing"

mistral.rs

AgentBench

Easy DataSet

langrocks

Agent Behavior Monitoring

LangWatch

Paddler

Mosec

MiniMax-M2.5

Hallucination Leaderboard

Grade School Math

Related Categories