Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Artificial Intelligence
Large Language Models (LLM)
Search Results

Search Results for "source testing unit testing"

x

Sort By:

Relevance

Clear All Filters

OS

BSD 21
ChromeOS 21
Linux 21
More...
Mac 21
Windows 21

Category

Artificial Intelligence 21
- Large Language Models (LLM) 21

License

OSI-Approved Open Source 21

Programming Language

Python 15
TypeScript 2
Java 1
JavaScript 1
More...
Rust 1

21 projects for "source testing unit testing" with 2 filters applied:

Large Language Models (LLM) BSD Clear Filters & Widen Search

Application Monitoring That Won't Slow Your App Down
AppSignal's Rust-based agent is lightweight and stable. Already running in thousands of production apps.

Full APM with errors, performance, logs, and uptime monitoring. 99.999% uptime SLA on the platform itself.

Start Free
Try Google Cloud Risk-Free With $300 in Credit
No hidden charges. No surprise bills. Cancel anytime.

Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.

Start Free
1

Strix

Open-source AI hackers to find and fix your app’s vulnerabilities

Strix is an open source agent-driven security platform that uses autonomous AI agents to identify, investigate, and validate vulnerabilities in software applications. The system is designed to mimic the behavior of real attackers by executing dynamic testing and verifying findings through proof-of-concept exploitation. Unlike traditional vulnerability scanners that rely heavily on static analysis, Strix agents actively run code, probe systems, and attempt exploitation to confirm whether vulnerabilities are genuinely exploitable. ...

Downloads: 10 This Week

Last Update: 2026-03-23
See Project
2

FuzzyAI Fuzzer

A powerful tool for automated LLM fuzzing

...The framework can be integrated into development pipelines to continuously test AI APIs and detect weaknesses before deployment. FuzzyAI provides testing tools, datasets, and evaluation workflows that help researchers measure how well models resist harmful instructions or attempts to bypass safety mechanisms.

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
3

Rogue

AI Agent Evaluator & Red Team Platform

...The system allows developers to define specific scenarios, expected outcomes, and business rules so that the framework can verify whether an agent behaves according to required policies. During testing, Rogue records conversations and produces detailed reports that explain whether the agent passed or failed each scenario. These reports include reasoning and evidence, helping developers understand why a particular failure occurred.

Downloads: 4 This Week

Last Update: 2026-04-29
See Project
4

Synthetic Data Generator

SDG is a specialized framework

...This makes the generated data suitable for tasks such as machine learning model training, testing software systems, sharing datasets across organizations, and conducting research without violating privacy regulations. The system supports multiple generation methods including statistical models, generative adversarial networks, and large language model–based synthesis. It also includes a data processing module capable of handling different data types, preprocessing columns, managing missing values, and converting formats automatically before model training.

Downloads: 8 This Week

Last Update: 2026-03-06
See Project
Gemini 3 and 200+ AI Models on One Platform
Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.

Start Free
5

Claude Code Skills & Plugins Hub

270+ Claude Code plugins with 739 agent skills

Claude Code Plugins Plus Skills is a large open-source ecosystem of plugins and AI “skills” designed to extend the capabilities of Claude Code development agents. The repository functions as a marketplace-style collection of hundreds of plugins and specialized skills that enable Claude Code to perform complex development, automation, and operational tasks. These plugins cover a wide range of domains including DevOps automation, security testing, API debugging, infrastructure management, and AI workflow orchestration. ...

Downloads: 6 This Week

Last Update: 2 days ago
See Project
6

BruteForceAI

Advanced LLM-powered brute-force tool combining AI intelligence

BruteForceAI is an open-source security testing tool that applies large language models to the analysis of login forms and authentication flows in web applications. At a high level, the project uses AI to inspect HTML content, identify the relevant form elements, and automate selector discovery so that a tester does not need to hand-map every field before evaluation.

Downloads: 5 This Week

Last Update: 2026-03-09
See Project
7

LangWatch

The platform for LLM evaluations and AI agent testing

LangWatch is an open-source observability and monitoring platform designed to help developers evaluate and improve applications built with large language models. The platform provides tools for tracking model interactions, analyzing prompt behavior, and identifying issues such as hallucinations, latency problems, or unexpected responses. By collecting telemetry data from AI applications, LangWatch allows developers to understand how their systems perform in real-world usage scenarios. The...

Downloads: 2 This Week

Last Update: 5 days ago
See Project
8

AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

...These environments require agents to interpret instructions, take actions, and adapt their strategies based on feedback from the environment. AgentBench also includes an evaluation framework that measures success rates, rewards, and task completion performance across different agent implementations. By testing models across diverse scenarios, the benchmark highlights strengths and weaknesses in reasoning, long-term planning, and tool usage.

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
9

promptmap2

A security scanner for custom LLM applications

promptmap is an automated security scanner for custom LLM applications that focuses on prompt injection and related attack classes. The project supports both white-box and black-box testing, which means it can either run tests directly against a known model and system prompt configuration or attack an external HTTP endpoint without internal access. Its scanning workflow uses a dual-LLM architecture in which one model acts as the target being tested and another acts as a controller that evaluates whether an attack succeeded. ...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
Compliant and Reliable File Transfers Backed by Top Security Certifications
Cerberus FTP Server delivers SOC 2 Type II certified security and FIPS 140-2 validated encryption.

Stop relying on non-certified, legacy file transfer tools that creak under the weight of modern security demands. Get full audit trails, advanced access controls and more supported by an award-winning team of experts. Start your free 25-day trial today.

Start Free Trial
10

super-agent-party

All-in-one AI companion! Desktop girlfriend + virtual streamer

Super Agent Party is an open-source experimental framework designed to demonstrate collaborative multi-agent AI systems interacting within a shared environment. The project explores how multiple specialized AI agents can coordinate to solve complex tasks by communicating with each other and sharing information. Instead of relying on a single monolithic model, the framework organizes agents with different roles or capabilities that cooperate to achieve goals. Each agent may handle different...

Downloads: 11 This Week

Last Update: 2026-05-01
See Project
11

Agent Development Kit (ADK) for Java

An open-source, code-first Java toolkit

Google’s Agent Development Kit for Java is an open-source toolkit that helps developers design, evaluate, and deploy advanced AI agents using the Java programming language. The framework follows a code-first approach that treats agent development as a structured software engineering task rather than a collection of prompt scripts. It provides abstractions and tools that allow developers to create agents capable of executing complex workflows, calling tools, and interacting with external...

Downloads: 6 This Week

Last Update: 2026-04-27
See Project
12

Agent Behavior Monitoring

The open source post-building layer for agents

Agent Behavior Monitoring is an open-source framework designed to monitor, evaluate, and improve the behavior of AI agents operating in real or simulated environments. The system focuses on agent behavior monitoring by collecting interaction data and analyzing how agents perform across different scenarios and tasks. Developers can use the framework to observe agent actions in both online production environments and offline evaluation settings, making it useful for debugging and performance...

Downloads: 5 This Week

Last Update: 2026-04-09
See Project
13

Prometheus-Eval

Evaluate your LLM's response with Prometheus and GPT4

Prometheus-Eval is an open-source framework designed to evaluate the outputs of large language models using specialized evaluator models known as Prometheus. The project provides tools, datasets, and scripts that allow developers and researchers to measure the quality of LLM responses through automated scoring rather than relying solely on human evaluators. It implements an “LLM-as-a-judge” approach in which a dedicated language model analyzes instruction–response pairs and assigns scores or...

Downloads: 3 This Week

Last Update: 2026-03-09
See Project
14

Bedrock Chat

AWS-native chatbot using Bedrock

Bedrock Chat is a mirrored version of an open-source project that provides a conversational interface for interacting with large language models and AI services through a chat-style application. The project typically focuses on delivering a user interface that allows individuals or teams to communicate with AI models, manage conversations, and experiment with prompts and responses. Implementations like Bedrock Chat often integrate with model hosting platforms or APIs that provide access to...

Downloads: 3 This Week

Last Update: 2026-04-09
See Project
15

Hallucination Leaderboard

Leaderboard Comparing LLM Performance at Producing Hallucinations

Hallucination Leaderboard is an open research project that tracks and compares the tendency of large language models to produce hallucinated or inaccurate information when generating summaries. The project provides a standardized benchmark that evaluates different models using a dedicated hallucination detection system known as the Hallucination Evaluation Model. Each model is tested on document summarization tasks to measure how often generated responses introduce information that is not...

Downloads: 1 This Week

Last Update: 2026-04-29
See Project
16

Hephaestus

Semi-Structured Agentic Framework. Workflows build themselves

Hephaestus is an open-source semi-structured agentic framework designed to orchestrate multiple AI agents working together on complex tasks. Instead of relying entirely on predefined workflows, the framework allows agents to dynamically create tasks as they explore a problem space. Developers define high-level phases such as analysis, implementation, and testing, while agents generate specific subtasks within those phases.

Downloads: 0 This Week

Last Update: 2026-03-15
See Project
17

Paddler

Open-source LLM load balancer and serving platform for hosting LLMs

Paddler is an open-source LLM infrastructure platform designed to deploy, manage, and scale large language models on private infrastructure. The system acts as a specialized load balancer and serving layer for language models, enabling organizations to run inference workloads without relying on external API providers. It supports running models locally through engines such as llama.cpp while distributing requests across multiple compute nodes to improve performance and reliability. The...

Downloads: 0 This Week

Last Update: 2026-04-30
See Project
18

AICGSecEval

A.S.E (AICGSecEval) is a repository-level AI-generated code security

AICGSecEval is an open-source benchmark framework designed to evaluate the security of code generated by artificial intelligence systems. The project was developed to address concerns that AI-assisted programming tools may produce insecure code containing vulnerabilities such as injection flaws or unsafe logic. The framework constructs evaluation tasks based on real-world software repositories and known vulnerability cases derived from CVE records. By simulating realistic development...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
19

Chinese-LLaMA-Alpaca-3

Chinese Llama-3 LLMs) developed from Meta Llama 3

...It includes scripts and tooling that let researchers or developers run training, fine-tuning, quantization, and deployment on local machines (CPU or GPU), making experimentation and testing accessible without requiring large clusters.

Downloads: 0 This Week

Last Update: 2026-01-15
See Project
20

Canopy

Retrieval Augmented Generation (RAG) framework

Canopy is an open-source retrieval-augmented generation (RAG) framework developed by Pinecone to simplify the process of building applications that combine large language models with external knowledge sources. The system provides a complete pipeline for transforming raw text data into searchable embeddings, storing them in a vector database, and retrieving relevant context for language model responses. It is designed to handle many of the complex components required for a RAG workflow,...

Downloads: 2 This Week

Last Update: 2026-03-10
See Project
21

Safety-Prompts

Chinese safety prompts for evaluating and improving the safety of LLMs

Safety-Prompts is an open-source repository that provides a curated collection of prompts designed to evaluate and improve the safety behavior of large language models. The project focuses primarily on safety testing scenarios relevant to Chinese language models, though the concepts can be applied to other languages and systems. The prompts are structured to test whether models generate outputs that align with human values and safety guidelines when faced with potentially harmful or sensitive requests. ...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project

Previous
You're on page 1
Next

Related Searches

wifi brute force

wordpress brute force tool

selinux

password brute force

bruteforce

brute force wifi

brute force seed

brute force android

automation testing

Related Categories

Artificial Intelligence

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Opt Out Advertise