safety free download - SourceForge

52 projects for "safety" with 2 filters applied:

Filters: Artificial Intelligence, BSD

1

gpt-oss-safeguard

Safety reasoning models built upon gpt-oss

gpt-oss-safeguard is an open-weight reasoning model family released by OpenAI designed specifically for content safety and moderation tasks. Rather than just outputting a numeric “safety score,” it is trained to reason about content with respect to a user-provided policy, allowing flexible, customizable moderation definitions rather than fixed rules — ideal when different platforms have different safety standards. The model comes in at least two variants: a large 120B-parameter version for heavy-duty, high-accuracy reasoning, and a 20B-parameter version optimized for lower latency or smaller compute resources. ...

Downloads: 0 This Week

Last Update: 2026-01-14
2

Purple Llama

Set of tools to assess and improve LLM security

Purple Llama is an umbrella safety initiative that aggregates tools, benchmarks, and mitigations to help developers build responsibly with open generative AI. Its scope spans input and output safeguards, cybersecurity-focused evaluations, and reference shields that can be inserted at inference time. The project evolves as a hub for safety research artifacts like Llama Guard and Code Shield, along with dataset specs and how-to guides for integrating checks into applications. ...

Downloads: 0 This Week

Last Update: 2026-06-03
3

PKU Beaver

Constrained Value Alignment via Safe Reinforcement Learning

PKU Beaver is an open-source research project focused on improving the safety alignment of large language models through reinforcement learning from human feedback under explicit safety constraints. The framework introduces techniques that separate helpfulness and harmlessness signals during training, allowing models to optimize for useful responses while minimizing harmful behavior. To support this process, the project provides datasets containing human-labeled examples that encode both performance preferences and safety constraints across multiple dimensions. ...

Downloads: 0 This Week

Last Update: 2026-03-06
4

FuzzyAI Fuzzer

A powerful tool for automated LLM fuzzing

...FuzzyAI provides testing tools, datasets, and evaluation workflows that help researchers measure how well models resist harmful instructions or attempts to bypass safety mechanisms.

Downloads: 1 This Week

Last Update: 2026-03-09
5

System Prompts Leaks

Collection of extracted System Prompts from popular chatbots

System Prompts Leaks is a curated repository that collects known leaked or publicly exposed system prompts used by large language models, organized so researchers, developers, and AI safety advocates can analyze them in one place. The project highlights how system prompts — instructions that strongly influence model behavior — have been inadvertently shared in forums, datasets, and open repositories, illustrating common patterns and potential vulnerabilities in prompt design and deployment. By aggregating these prompts, the repository serves as a valuable resource for understanding how widely different models are being guided in the wild, which helps with comparative analysis across architectures and service providers. ...

Downloads: 6 This Week

Last Update: 1 day ago
6

In-The-Wild Jailbreak Prompts on LLMs

A dataset consisting of 15,140 ChatGPT prompts from Reddit

In-The-Wild Jailbreak Prompts on LLMs is an open-source research repository that provides datasets and analytical tools for studying jailbreak prompts used to bypass safety restrictions in large language models. The project is part of a research effort to understand how users attempt to circumvent alignment and safety mechanisms built into modern AI systems. The repository includes a large collection of prompts gathered from real-world platforms such as Reddit, Discord, prompt-sharing communities, and other public sources. ...

Downloads: 0 This Week

Last Update: 2026-03-05
7

Anthropic SDK TypeScript

Access to Anthropic's safety-first language model APIs

...Example usage shows how to instantiate the Anthropic client, call client.messages.create(...), and obtain responses. It supports streaming endpoints as well. Because TypeScript provides type safety, it helps avoid common errors in JSON interplay. The repo also includes documentation (API spec in api.md) and examples (e.g. streaming examples).

Downloads: 4 This Week

Last Update: 4 days ago
8

Swift Concurrency Agent Skill

Add expert Swift Concurrency guidance to your AI coding tool

...Rather than teaching basic Swift, it targets the nuanced behaviors of concurrency primitives, actor isolation, and safety annotations like @MainActor and Sendable. It also clarifies how to reason about structured tasks, cancellation, and performance trade-offs.

Downloads: 0 This Week

Last Update: 2026-05-04
9

Safety-Prompts

Chinese safety prompts for evaluating and improving the safety of LLMs

Safety-Prompts is an open-source repository that provides a curated collection of prompts designed to evaluate and improve the safety behavior of large language models. The project focuses primarily on safety testing scenarios relevant to Chinese language models, though the concepts can be applied to other languages and systems. The prompts are structured to test whether models generate outputs that align with human values and safety guidelines when faced with potentially harmful or sensitive requests. ...

Downloads: 0 This Week

Last Update: 2026-03-09
10

Eino

LLM application development framework for Go with agents and flows

...Eino also offers orchestration capabilities that allow components to be connected into chains, graphs, or workflows for complex AI pipelines. These orchestration features handle concerns such as concurrency, streaming responses, and type safety so developers can focus on application logic.

Downloads: 3 This Week

Last Update: 5 days ago
11

Claude Code Tools

Practical productivity tools for Claude Code, Codex-CLI

...Some components enable Claude Code to interact with terminal multiplexers such as tmux so that it can run programs, debug applications, and interact with scripts that require user input. The toolkit also provides safety mechanisms that prevent potentially dangerous shell commands from being executed automatically by AI agents.
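
The kind of safety check described above can be sketched roughly as follows. This is a minimal illustration, not the toolkit's actual implementation: the deny-list `BLOCKED_COMMANDS` and the function `is_blocked` are hypothetical names, and a real guard would need a far more careful policy.

```python
import os
import shlex

# Hypothetical deny-list, chosen only for illustration.
BLOCKED_COMMANDS = {"rm", "dd", "mkfs", "shutdown", "reboot"}
# Characters that shlex treats as shell punctuation when punctuation_chars=True.
SHELL_PUNCTUATION = set("();<>|&")


def is_blocked(command: str) -> bool:
    """Return True if the command line should not be run automatically."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError:
        # Unbalanced quotes and similar parse errors: refuse rather than guess.
        return True

    at_command_start = True
    for token in tokens:
        if all(ch in SHELL_PUNCTUATION for ch in token):
            # Conservative: treat the word after any operator as a command name.
            at_command_start = True
            continue
        if at_command_start and token != "sudo":
            # Compare the program name only, so "/bin/rm" matches "rm".
            if os.path.basename(token) in BLOCKED_COMMANDS:
                return True
            at_command_start = False
    return False
```

A caller would run `is_blocked(cmd)` before handing `cmd` to a shell and ask the user for confirmation when it returns `True`. The check errs on the side of blocking: a word that follows a redirect operator is also treated as a command name.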

Downloads: 4 This Week

Last Update: 5 days ago
12

ZAPI

ZAPI by Adopt AI is an open-source Python library

ZAPI is a developer-centric API framework that streamlines building, testing, and deploying APIs with strong type safety and minimal boilerplate, helping teams deliver backend services faster with fewer errors. It emphasizes a declarative router and schema model that uses types to define request and response formats, providing clear contracts for frontend and backend teams while automatically generating documentation. ZAPI abstracts many repetitive tasks such as validation, authentication flows, and error handling so developers can focus on business logic instead of infrastructure plumbing. ...

Downloads: 1 This Week

Last Update: 2026-02-05
13

GitHub Agentic Workflows

GitHub Agentic Workflows

...By writing intent in markdown files, a developer can quickly generate .yml Actions workflows that perform tasks such as summarizing issues, automating triage, generating reports, or maintaining documentation, all without manually crafting YAML logic from scratch. The system emphasizes safety and guardrails, running agents in sandboxed environments with minimal permissions by default, and using “safe outputs” to constrain what the workflow can write back into the repository. It includes tooling for compiling, testing, and iterating on agentic workflows locally and integrates with GitHub’s existing Actions ecosystem.

Downloads: 2 This Week

Last Update: 16 hours ago
14

vLLM Semantic Router

System Level Intelligent Router for Mixture-of-Models at Cloud

...The router operates as an intelligent layer between users and model infrastructure, capturing signals from prompts, responses, and contextual data to improve decision-making. It can also integrate safety and monitoring mechanisms that detect issues such as jailbreak attempts, hallucinations, or sensitive information exposure.

Downloads: 0 This Week

Last Update: 2026-06-05
15

WorkAny

Desktop Agent for Any Task

...It acts as a unified environment where users can ask the AI to generate documents, presentations, websites, spreadsheets, organize files, or write code — all with real-time streaming outputs directly in the app, so you see results as the AI produces them. Powered by a combination of Claude Code as the primary runtime agent and a sandbox execution environment for safety, WorkAny integrates an agent SDK, MCP (Model Context Protocol) support, and custom skills to handle diverse tasks with contextual understanding. Users can connect multiple model providers, including OpenAI, OpenRouter, or custom endpoints, and WorkAny supports parallel task execution with asynchronous result viewing, enhancing productivity.

Downloads: 0 This Week

Last Update: 2026-03-07
16

Heretic

Fully automatic censorship removal for language models

Heretic is an open-source Python tool that automatically removes the built-in censorship or “safety alignment” from transformer-based language models so they respond to a broader range of prompts with fewer refusals. It works by applying directional ablation techniques and a parameter optimization strategy to adjust internal model behaviors without expensive post-training or altering the core capabilities. Designed for researchers and advanced users, Heretic makes it possible to study and experiment with uncensored model responses in a reproducible, automated way. ...

Downloads: 12 This Week

Last Update: 2026-06-14
17

Ralph AI Agent

AI agent loop that runs repeatedly until all PRD items are complete

...It provides a reactive loop where agents can repeatedly assess the current context, reason about the next best action using large language models, and execute actions across integrated tools and services. The runtime emphasizes safety boundaries by sandboxing operations, enforcing time and token limits, and isolating execution layers to prevent unpredictable side effects. Ralph also includes a built-in plugin system that lets developers attach custom tools, environment connectors, or monitoring hooks without modifying core logic. Designed for extensibility, the framework supports multi-model providers so agents can switch between models or fall back based on task needs. ...
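
As a rough illustration of a loop that repeats until the work is done while enforcing time and token limits, here is a minimal sketch. It is not Ralph's actual code: `run_agent_loop`, the `step` callback, and the returned status strings are all assumptions made for this example.

```python
import time
from typing import Callable, Tuple


def run_agent_loop(
    step: Callable[[], Tuple[bool, int]],
    max_seconds: float,
    max_tokens: int,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, int]:
    """Call step() until it reports completion or a budget runs out.

    step() is a hypothetical callback returning (done, tokens_used).
    Returns (reason, total_tokens), where reason is one of
    "done", "time_limit", or "token_limit".
    """
    start = clock()
    total_tokens = 0
    while True:
        # Budgets are checked before each step, so a step that is
        # already running is never interrupted halfway.
        if clock() - start >= max_seconds:
            return "time_limit", total_tokens
        if total_tokens >= max_tokens:
            return "token_limit", total_tokens
        done, used = step()
        total_tokens += used
        if done:
            return "done", total_tokens
```

Passing the clock as a parameter keeps the loop testable with a fake clock. Because limits are checked between steps, the final token count can overshoot `max_tokens` by at most one step's usage.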

Downloads: 0 This Week

Last Update: 2026-02-10
18

Inspect Petri

An alignment auditing agent capable of exploring alignment hypotheses

Inspect Petri is an open-source alignment auditing agent that lets researchers rapidly test concrete safety hypotheses against target models using realistic, multi-turn scenarios. Instead of building bespoke evals, Inspect Petri automatically generates audit environments from seed “special instructions,” orchestrates an auditor model to probe a target model, and simulates tool use and rollbacks to surface risky behaviors. Each interaction transcript is then scored by a judge model using a consistent rubric so results are comparable across runs and models. ...

Downloads: 0 This Week

Last Update: 2026-04-25
19

Starter Applets

Google AI Studio Starter Apps

...The applets are structured with a focus on simplicity: each presents a prompt input, minimal UI logic, and inline display of the resulting output or widget (e.g. generated text, images). They are built to illustrate best practices (e.g. safety guards, prompt templates, streaming UI updates) rather than production feature sets. The repo supplies a CLI or script to scaffold new applet templates, letting developers spin up small Gemini-powered components quickly. Each applet includes configuration parameters (API keys, model selection, prompt parameters) in a secure but flexible format. ...

Downloads: 0 This Week

Last Update: 2025-10-07
20

Groq AppGen

Project showcasing Llama 3.3 70B HTML codegen abilities

...For developers or non-coding designers alike, groq-appgen lowers the barrier to building full web interfaces or small apps by leveraging LLM-driven code generation rather than writing boilerplate by hand. It integrates safety/content-checking via LlamaGuard to catch undesirable outputs, and includes session management, export/share functionality, and history tracking so you can iterate on designs or revert as needed.

Downloads: 0 This Week

Last Update: 2025-12-12
21

Claude Code Action

Claude Code action for GitHub PRs

Claude Code Action is a general-purpose GitHub Action that brings Anthropic’s Claude Code into pull requests and issues to answer questions, review changes, and even implement code edits. It can wake up automatically when someone mentions @claude, when a PR or issue meets certain conditions, or when a workflow step provides an explicit prompt. The action is designed to understand diffs and surrounding context, so its comments and suggestions are grounded in what actually changed rather than...

Downloads: 11 This Week

Last Update: 2 days ago
22

Plano

Delivery infrastructure for agentic apps

...Built on modern proxy technology and compatible with any language or AI framework, Plano enables developers to focus on core agent logic instead of infrastructure complexity. The system provides intelligent LLM routing APIs that support model agility, along with filter chains for safety, moderation, and memory hooks. It also exposes rich traces, metrics, and logs to support continuous improvement of agent behavior in production. Overall, Plano functions as delivery infrastructure for scalable, maintainable AI agent systems.

Downloads: 3 This Week

Last Update: 2026-06-15
23

VibeVoice

Open-source multi-speaker long-form text-to-speech model

...The model integrates a Qwen2.5-based large language model with a diffusion head to produce realistic acoustic details and capture conversational context. Training involved curriculum learning with increasing sequence lengths up to 65K tokens, allowing VibeVoice to handle very long dialogues effectively. Safety mechanisms include an audible disclaimer and imperceptible watermarking in all generated audio to mitigate misuse risks.

Downloads: 9 This Week

Last Update: 2026-05-06
24

Poco Claw

A more beautiful and easier-to-use alternative to OpenClaw

Poco Claw is an AI agent platform designed as a more user-friendly and visually polished alternative to traditional OpenClaw implementations. It focuses on improving usability by providing a modern web interface combined with enhanced interaction capabilities such as built-in messaging and project organization tools. The system operates on a sandboxed runtime, ensuring that tasks executed by the agent are isolated from the host environment, which improves security and reliability. It extends...

Downloads: 4 This Week

Last Update: 7 days ago
25

L1B3RT4S

Harmless liberation prompts

...The project reflects the growing interest in prompt security, jailbreak testing, and model alignment research within the AI community. Its materials are often used by researchers and enthusiasts studying robustness, safety, and adversarial prompting dynamics. Because of its unconventional focus, it functions primarily as a research and exploration resource rather than a production tool. Overall, L1B3RT4S serves as a niche but widely referenced collection for studying advanced prompt manipulation patterns.

Downloads: 1 This Week

Last Update: 2026-03-02