123 projects for "model based testing tool" with 2 filters applied:

  • 1
    Cactus Needle

    26M-parameter function-calling model that runs on incredibly small devices

    Needle is an experimental 26-million-parameter function-calling model designed to run on extremely small devices such as phones, watches, glasses, and low-power personal AI hardware. It is based on a Simple Attention Network architecture and was distilled from a much larger model to focus on fast, compact tool-use behavior. The project provides open weights, training details, dataset generation resources, and a playground for testing the model with custom tools. ...
    Downloads: 1 This Week
  • 2
    FuzzyAI Fuzzer

    A powerful tool for automated LLM fuzzing

    FuzzyAI is an open-source fuzzing framework designed to test the security and reliability of large language model applications. The tool automates the process of generating adversarial prompts and input variations to identify vulnerabilities such as jailbreaks, prompt injections, or unsafe model responses. It allows developers and security researchers to systematically evaluate the robustness of LLM-based systems by simulating a wide range of malicious or unexpected inputs. ...
    Downloads: 0 This Week
  • 3
    PentestGPT

    Automated Penetration Testing Agentic Framework Powered by LLMs

    PentestGPT is an AI-powered autonomous penetration testing agent designed to perform intelligent, end-to-end security assessments using large language models. Published at USENIX Security 2024, it combines advanced reasoning with an agentic workflow to automate tasks traditionally handled by human pentesters. The platform supports multiple penetration testing categories, including web security, cryptography, reversing, forensics, privilege escalation, and binary exploitation. PentestGPT runs...
    Downloads: 543 This Week
  • 4
    CyberStrikeAI

    CyberStrikeAI is an AI-native security testing platform built in Go

    ...It supports role-based testing, letting teams define security roles with tailored tool access and prompts, and includes a skills system that encapsulates specialized testing strategies that the AI can incorporate into its planning. Through comprehensive lifecycle management, results are tracked, aggregated, and visualized, with support for versioned persistence, search, and risk severity scoring.
    Downloads: 6 This Week
  • 5
    MiMo-V2-Flash

    MiMo-V2-Flash: Efficient Reasoning, Coding, and Agentic Foundation

    MiMo-V2-Flash is a large Mixture-of-Experts language model designed to deliver strong reasoning, coding, and agentic-task performance while keeping inference fast and cost-efficient. It uses an MoE setup where a very large total parameter count is available, but only a smaller subset is activated per token, which helps balance capability with runtime efficiency. The project positions the model for workflows that require tool use, multi-step planning, and higher throughput, rather than only...
    Downloads: 4 This Week
  • 6
    SERA CLI

    A tool to use the Ai2 Open Coding Agents Soft-Verified Agents

    SERA CLI is a command-line tool created by AllenAI to enable developers to interact with the SERA (Soft-Verified Efficient Repository Agents) model family using Claude Code as the execution front end. It provides a convenient interface for deploying, testing, and using SERA models without needing to write scaffold code from scratch, acting as both a proxy and utility wrapper to simplify workflows that involve large agent models.
    Downloads: 0 This Week
  • 7
    kMCP

    Kubernetes Controller for building, testing and deploying MCP servers

    KMCP is a companion toolchain for building, testing, and deploying MCP servers with a workflow that spans local development through Kubernetes production deployments. It includes a CLI for day-to-day development tasks like scaffolding new MCP projects, managing tools, building container images, and running an MCP server locally for validation. For cluster operations, it includes a Kubernetes controller that manages MCP server lifecycles using a dedicated Custom Resource Definition (CRD),...
    Downloads: 0 This Week
  • 8
    BruteForceAI

    Advanced LLM-powered brute-force tool combining AI intelligence

    BruteForceAI is an open-source security testing tool that applies large language models to the analysis of login forms and authentication flows in web applications. At a high level, the project uses AI to inspect HTML content, identify the relevant form elements, and automate selector discovery so that a tester does not need to hand-map every field before evaluation. It combines that analysis layer with automated credential testing workflows, framing itself as a more adaptive alternative to...
    Downloads: 131 This Week
  • 9
    Synthetic Data Generator

    SDG is a specialized framework

    ...The platform enables developers and data scientists to create artificial datasets that preserve important relationships between variables without containing sensitive personal information. This makes the generated data suitable for tasks such as machine learning model training, testing software systems, sharing datasets across organizations, and conducting research without violating privacy regulations. The system supports multiple generation methods, including statistical models, generative adversarial networks, and large language model-based synthesis. It also includes a data processing module capable of handling different data types, preprocessing columns, managing missing values, and converting formats automatically before model training.
    Downloads: 1 This Week
  • 10
    serve-sim

    The `npx serve` of Apple Simulators

    ...It can run locally, over a LAN, or through a remote Mac with tunneling. The web UI streams the simulator and forwards clicks, enabling browser-based end-to-end testing and debugging. serve-sim is best suited for iOS development, agent testing, remote simulator access, and mobile UI automation workflows.
    Downloads: 3 This Week
  • 11
    VIPER

    AI-powered red team platform for adversary simulation toolkit

    Viper is a comprehensive red teaming and adversary simulation platform designed to support cybersecurity professionals in conducting advanced security assessments. It integrates a wide range of tools and capabilities required for penetration testing, post-exploitation, and attack simulation workflows into a unified environment. Viper emphasizes ease of use through a graphical interface, allowing users to manage complex operations without relying solely on command-line tools. It includes a...
    Downloads: 2 This Week
  • 12
    Codex MCP Server

    MCP server wrapper for OpenAI Codex CLI

    Codex MCP Server is an open-source integration tool that allows AI development environments to access the capabilities of the OpenAI Codex command-line interface through the Model Context Protocol. The project acts as a bridge between AI assistants such as Claude Code and the Codex CLI, enabling those assistants to perform advanced coding operations using Codex as a backend engine. Through this architecture, developers can request tasks such as code explanation, refactoring, or analysis...
    Downloads: 2 This Week
  • 13
    Heretic

    Fully automatic censorship removal for language models

    Heretic is an open-source Python tool that automatically removes the built-in censorship or “safety alignment” from transformer-based language models so they respond to a broader range of prompts with fewer refusals. It works by applying directional ablation techniques and a parameter optimization strategy to adjust internal model behaviors without expensive post-training or altering the core capabilities.
    Downloads: 8 This Week
  • 14
    AgentBench

    A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

    ...AgentBench also includes an evaluation framework that measures success rates, rewards, and task completion performance across different agent implementations. By testing models across diverse scenarios, the benchmark highlights strengths and weaknesses in reasoning, long-term planning, and tool usage.
    Downloads: 1 This Week
  • 15
    firerpa LAMDA

    The most powerful Android RPA agent framework

    lamda is an Android RPA agent framework that provides visual remote desktop control and automation at scale, geared toward testing, automation validation, and device management. It exposes a clean UI to monitor and interact with connected devices and includes tooling to script actions reliably across apps and OS versions. The project emphasizes low-friction setup and powerful control primitives so teams can move from interactive validation to repeatable automation. A public wiki, releases,...
    Downloads: 1 This Week
  • 16
    GLM-4.7

    Advanced language and coding AI model

    GLM-4.7 is an advanced agent-oriented large language model designed as a high-performance coding and reasoning partner. It delivers significant gains over GLM-4.6 in multilingual agentic coding, terminal-based workflows, and real-world developer benchmarks such as SWE-bench and Terminal Bench 2.0. The model introduces stronger “thinking before acting” behavior, improving stability and accuracy in complex agent frameworks like Claude Code, Cline, and Roo Code. ...
    Downloads: 74 This Week
  • 17
    L1B3RT45

    Harmless liberation prompts

    L1B3RT4S is a large prompt collection project focused on adversarial and “liberation-style” prompt engineering experiments for large language models. The repository gathers creative prompt patterns intended to explore model behavior boundaries, roleplay scenarios, and red-teaming techniques. It is positioned more as a prompt experimentation archive than a traditional software library, emphasizing the study of how instruction phrasing can influence AI outputs. The project reflects the growing interest in prompt security, jailbreak testing, and model alignment research within the AI community. ...
    Downloads: 0 This Week
  • 18
    GLM-4.6

    Agentic, Reasoning, and Coding (ARC) foundation models

    GLM-4.6 is an iteration of Zhipu AI’s foundation model that delivers significant advancements over GLM-4.5. It introduces an extended 200K-token context window, enabling more sophisticated long-context reasoning and agentic workflows. The model achieves superior coding performance, excelling in benchmarks and practical coding assistants such as Claude Code, Cline, Roo Code, and Kilo Code. Its reasoning capabilities have been strengthened, including improved tool usage during inference...
    Downloads: 73 This Week
  • 19
    G0DM0D3

    LIBERATED AI CHAT

    G0DM0D3 is an experimental AI interaction framework designed to enable unrestricted or “liberated” conversational behavior in language models by altering how prompts and system instructions are structured. It is part of a broader ecosystem of projects that explore the boundaries of AI alignment, control, and prompt engineering. The tool provides a collection of prompt templates and interaction patterns that attempt to override or bypass built-in behavioral constraints in language models. It...
    Downloads: 7 This Week
  • 20
    ACI.dev

    Open platform connecting AI agents to tools via unified MCP server

    ACI is an open source platform designed to enable AI agents to interact with external tools through a unified and structured interface. It focuses on simplifying tool integration by connecting hundreds of pre-built services into agentic environments, allowing developers to avoid building custom API clients and authentication flows for each service. ACI provides intent-aware tool access, meaning agents can dynamically discover and use tools based on context rather than rigid configurations. It supports both direct function calling and a unified Model Context Protocol (MCP) server, offering flexibility in how integrations are exposed to AI systems. ...
    Downloads: 6 This Week
  • 21
    gpt-prompt-engineer

    Experimental prompt optimization toolkit built around notebooks

    gpt-prompt-engineer is an experimental prompt optimization toolkit built around notebooks and LLM-assisted evaluation. It lets users describe a task, provide test cases, and generate many candidate prompts automatically. The system then tests those prompts against the examples and ranks their performance through an Elo-style scoring process. The repository includes versions for general prompt generation, classification tasks, Claude-based workflows, and model-to-model prompt conversion. It...
    Downloads: 0 This Week
  • 22
    ToolUniverse

    Democratizing AI scientists with ToolUniverse

    ToolUniverse is a comprehensive open-source ecosystem designed to transform any large language model into an autonomous “AI scientist” capable of performing real scientific research tasks through structured tool interaction. It standardizes how AI systems discover, select, and execute tools by introducing a unified AI-Tool Interaction Protocol that allows models to seamlessly connect with hundreds of scientific resources, including machine learning models, datasets, APIs, and analytical packages. ...
    Downloads: 0 This Week
  • 23
    Responsible AI Toolbox

    Responsible AI Toolbox is a suite of tools providing model

    Responsible AI Toolbox is a software framework designed to help developers evaluate and improve the reliability, fairness, and transparency of machine learning systems. The project provides tools that assist in analyzing model behavior, detecting bias, improving robustness, and explaining predictions produced by AI systems. It is designed to integrate with common machine learning frameworks, especially PyTorch, allowing developers to apply responsible AI techniques within existing workflows....
    Downloads: 0 This Week
  • 24
    Hallucination Leaderboard

    Leaderboard Comparing LLM Performance at Producing Hallucinations

    Hallucination Leaderboard is an open research project that tracks and compares the tendency of large language models to produce hallucinated or inaccurate information when generating summaries. The project provides a standardized benchmark that evaluates different models using a dedicated hallucination detection system known as the Hallucination Evaluation Model. Each model is tested on document summarization tasks to measure how often generated responses introduce information that is not...
    Downloads: 0 This Week
  • 25
    agents-cli

    CLI to turn coding assistants into experts at deploying AI agents

    agents-cli is a command-line tool developed to simplify the creation, management, and execution of AI agents directly from the terminal. It provides developers with a structured interface for defining agent behavior, configuring tools, and running workflows. The tool integrates with agent frameworks and supports modular extensions for adding new capabilities. It emphasizes productivity by enabling rapid iteration and testing of agent logic without complex setup. agents-cli is designed to fit into modern developer workflows, particularly those that rely on automation and scripting. ...
    Downloads: 1 This Week