52 projects for "safety" with 2 filters applied:

  • 1
    gpt-oss-safeguard

    Safety reasoning models built upon gpt-oss

    gpt-oss-safeguard is an open-weight reasoning model family released by OpenAI designed specifically for content safety and moderation tasks. Rather than just outputting a numeric “safety score,” it is trained to reason about content with respect to a user-provided policy, allowing flexible, customizable moderation definitions rather than fixed rules — ideal when different platforms have different safety standards. The model comes in at least two variants: a large 120B-parameter version for heavy-duty, high-accuracy reasoning, and a 20B-parameter version optimized for lower latency or smaller compute resources. ...
    Downloads: 0 This Week
  • 2
    Purple Llama

    Set of tools to assess and improve LLM security

    Purple Llama is an umbrella safety initiative that aggregates tools, benchmarks, and mitigations to help developers build responsibly with open generative AI. Its scope spans input and output safeguards, cybersecurity-focused evaluations, and reference shields that can be inserted at inference time. The project evolves as a hub for safety research artifacts like Llama Guard and Code Shield, along with dataset specs and how-to guides for integrating checks into applications. ...
    Downloads: 0 This Week
  • 3
    PKU Beaver

    Constrained Value Alignment via Safe Reinforcement Learning

    PKU Beaver is an open-source research project focused on improving the safety alignment of large language models through reinforcement learning from human feedback under explicit safety constraints. The framework introduces techniques that separate helpfulness and harmlessness signals during training, allowing models to optimize for useful responses while minimizing harmful behavior. To support this process, the project provides datasets containing human-labeled examples that encode both performance preferences and safety constraints across multiple dimensions. ...
    Downloads: 0 This Week
  • 4
    FuzzyAI Fuzzer

    A powerful tool for automated LLM fuzzing

    ...FuzzyAI provides testing tools, datasets, and evaluation workflows that help researchers measure how well models resist harmful instructions or attempts to bypass safety mechanisms.
    Downloads: 1 This Week
  • 5
    System Prompts Leaks

    Collection of extracted System Prompts from popular chatbots

    System Prompts Leaks is a curated repository that collects known leaked or publicly exposed system prompts used by large language models, organized so researchers, developers, and AI safety advocates can analyze them in one place. The project highlights how system prompts — instructions that strongly influence model behavior — have been inadvertently shared in forums, datasets, and open repositories, illustrating common patterns and potential vulnerabilities in prompt design and deployment. By aggregating these prompts, the repository serves as a valuable resource for understanding how widely different models are being guided in the wild, which helps with comparative analysis across architectures and service providers. ...
    Downloads: 6 This Week
  • 6
    In-The-Wild Jailbreak Prompts on LLMs

    A dataset consisting of 15,140 ChatGPT prompts from Reddit

    In-The-Wild Jailbreak Prompts on LLMs is an open-source research repository that provides datasets and analytical tools for studying jailbreak prompts used to bypass safety restrictions in large language models. The project is part of a research effort to understand how users attempt to circumvent alignment and safety mechanisms built into modern AI systems. The repository includes a large collection of prompts gathered from real-world platforms such as Reddit, Discord, prompt-sharing communities, and other public sources. ...
    Downloads: 0 This Week
  • 7
    Anthropic SDK TypeScript

    Access to Anthropic's safety-first language model APIs

    ...Example usage shows how to instantiate the Anthropic client, call client.messages.create(...), and obtain responses. It supports streaming endpoints as well. Because TypeScript provides type safety, it helps avoid common errors when handling JSON request and response payloads. The repo also includes documentation (API spec in api.md) and examples (e.g. streaming examples).
    Downloads: 4 This Week
  • 8
    Swift Concurrency Agent Skill

    Add expert Swift Concurrency guidance to your AI coding tool

    ...Rather than teaching basic Swift, it targets the nuanced behaviors of concurrency primitives, actor isolation, and safety annotations like @MainActor and Sendable. It also clarifies how to reason about structured tasks, cancellation, and performance trade-offs.
    Downloads: 0 This Week
  • 9
    Safety-Prompts

    Chinese safety prompts for evaluating and improving the safety of LLMs

    Safety-Prompts is an open-source repository that provides a curated collection of prompts designed to evaluate and improve the safety behavior of large language models. The project focuses primarily on safety testing scenarios relevant to Chinese language models, though the concepts can be applied to other languages and systems. The prompts are structured to test whether models generate outputs that align with human values and safety guidelines when faced with potentially harmful or sensitive requests. ...
    Downloads: 0 This Week
  • 10
    Eino

    LLM application development framework for Go with agents and flows

    ...Eino also offers orchestration capabilities that allow components to be connected into chains, graphs, or workflows for complex AI pipelines. These orchestration features handle concerns such as concurrency, streaming responses, and type safety so developers can focus on application logic.
    Downloads: 3 This Week
  • 11
    Claude Code Tools

    Practical productivity tools for Claude Code, Codex-CLI

    ...Some components enable Claude Code to interact with terminal multiplexers such as tmux so that it can run programs, debug applications, and interact with scripts that require user input. The toolkit also provides safety mechanisms that prevent potentially dangerous shell commands from being executed automatically by AI agents.
    Downloads: 4 This Week
  • 12
    ZAPI

    ZAPI by Adopt AI is an open-source Python library

    ZAPI is a developer-centric API framework that streamlines building, testing, and deploying APIs with strong type safety and minimal boilerplate, helping teams deliver backend services faster with fewer errors. It emphasizes a declarative router and schema model that uses types to define request and response formats, providing clear contracts for frontend and backend teams while automatically generating documentation. ZAPI abstracts many repetitive tasks such as validation, authentication flows, and error handling so developers can focus on business logic instead of infrastructure plumbing. ...
    Downloads: 1 This Week
  • 13
    GitHub Agentic Workflows

    GitHub Agentic Workflows

    ...By writing intent in markdown files, a developer can quickly generate .yml Actions workflows that perform tasks such as summarizing issues, automating triage, generating reports, or maintaining documentation, all without manually crafting YAML logic from scratch. The system emphasizes safety and guardrails, running agents in sandboxed environments with minimal permissions by default, and using “safe outputs” to constrain what the workflow can write back into the repository. It includes tooling for compiling, testing, and iterating on agentic workflows locally and integrates with GitHub’s existing Actions ecosystem.
    Downloads: 2 This Week
  • 14
    vLLM Semantic Router

    System Level Intelligent Router for Mixture-of-Models at Cloud

    ...The router operates as an intelligent layer between users and model infrastructure, capturing signals from prompts, responses, and contextual data to improve decision-making. It can also integrate safety and monitoring mechanisms that detect issues such as jailbreak attempts, hallucinations, or sensitive information exposure.
    Downloads: 0 This Week
  • 15
    WorkAny

    Desktop Agent for Any Task

    ...It acts as a unified environment where users can ask the AI to generate documents, presentations, websites, spreadsheets, organize files, or write code — all with real-time streaming outputs directly in the app, so you see results as the AI produces them. Powered by a combination of Claude Code as the primary runtime agent and a sandbox execution environment for safety, WorkAny integrates an agent SDK, MCP (Model Context Protocol) support, and custom skills to handle diverse tasks with contextual understanding. Users can connect multiple model providers, including OpenAI, OpenRouter, or custom endpoints, and WorkAny supports parallel task execution with asynchronous result viewing, enhancing productivity.
    Downloads: 0 This Week
  • 16
    Heretic

    Fully automatic censorship removal for language models

    Heretic is an open-source Python tool that automatically removes the built-in censorship or “safety alignment” from transformer-based language models so they respond to a broader range of prompts with fewer refusals. It works by applying directional ablation techniques and a parameter optimization strategy to adjust internal model behaviors without expensive post-training or altering the core capabilities. Designed for researchers and advanced users, Heretic makes it possible to study and experiment with uncensored model responses in a reproducible, automated way. ...
    Downloads: 12 This Week
  • 17
    Ralph AI Agent

    AI agent loop that runs repeatedly until all PRD items are complete

    ...It provides a reactive loop where agents can repeatedly assess the current context, reason about the next best action using large language models, and execute actions across integrated tools and services. The runtime emphasizes safety boundaries by sandboxing operations, enforcing time and token limits, and isolating execution layers to prevent unpredictable side effects. Ralph also includes a built-in plugin system that lets developers attach custom tools, environment connectors, or monitoring hooks without modifying core logic. Designed for extensibility, the framework supports multi-model providers so agents can switch between models or fall back based on task needs. ...
    Downloads: 0 This Week
  • 18
    Inspect Petri

    An alignment auditing agent capable of exploring alignment hypotheses

    Inspect Petri is an open-source alignment auditing agent that lets researchers rapidly test concrete safety hypotheses against target models using realistic, multi-turn scenarios. Instead of building bespoke evals, Inspect Petri automatically generates audit environments from seed “special instructions,” orchestrates an auditor model to probe a target model, and simulates tool use and rollbacks to surface risky behaviors. Each interaction transcript is then scored by a judge model using a consistent rubric so results are comparable across runs and models. ...
    Downloads: 0 This Week
  • 19
    Starter Applets

    Google AI Studio Starter Apps

    ...The applets are structured with a focus on simplicity: each presents a prompt input, minimal UI logic, and inline display of the resulting output or widget (e.g. generated text, images). They are built to illustrate best practices (e.g. safety guards, prompt templates, streaming UI updates) rather than production feature sets. The repo supplies a CLI or script to scaffold new applet templates, letting developers spin up small Gemini-powered components quickly. Each applet includes configuration parameters (API keys, model selection, prompt parameters) in a secure but flexible format. ...
    Downloads: 0 This Week
  • 20
    Groq AppGen

    Project showcasing Llama 3.3 70B HTML codegen abilities

    ...For developers or non-coding designers alike, groq-appgen lowers the barrier to building full web interfaces or small apps by leveraging LLM-driven code generation rather than writing boilerplate by hand. It integrates safety/content-checking via LlamaGuard to catch undesirable outputs, and includes session management, export/share functionality, and history tracking so you can iterate on designs or revert as needed.
    Downloads: 0 This Week
  • 21
    Claude Code Action

    Claude Code action for GitHub PRs

    Claude Code Action is a general-purpose GitHub Action that brings Anthropic’s Claude Code into pull requests and issues to answer questions, review changes, and even implement code edits. It can wake up automatically when someone mentions @claude, when a PR or issue meets certain conditions, or when a workflow step provides an explicit prompt. The action is designed to understand diffs and surrounding context, so its comments and suggestions are grounded in what actually changed rather than...
    Downloads: 11 This Week
  • 22
    Plano

    Delivery infrastructure for agentic apps

    ...Built on modern proxy technology and compatible with any language or AI framework, Plano enables developers to focus on core agent logic instead of infrastructure complexity. The system provides intelligent LLM routing APIs that support model agility, along with filter chains for safety, moderation, and memory hooks. It also exposes rich traces, metrics, and logs to support continuous improvement of agent behavior in production. Overall, Plano functions as delivery infrastructure for scalable, maintainable AI agent systems.
    Downloads: 3 This Week
  • 23
    VibeVoice

    Open-source multi-speaker long-form text-to-speech model

    ...The model integrates a Qwen2.5-based large language model with a diffusion head to produce realistic acoustic details and capture conversational context. Training involved curriculum learning with increasing sequence lengths up to 65K tokens, allowing VibeVoice to handle very long dialogues effectively. Safety mechanisms include an audible disclaimer and imperceptible watermarking in all generated audio to mitigate misuse risks.
    Downloads: 9 This Week
  • 24
    Poco Claw

    A more beautiful and easier-to-use alternative to OpenClaw

    Poco Claw is an AI agent platform designed as a more user-friendly and visually polished alternative to traditional OpenClaw implementations. It focuses on improving usability by providing a modern web interface combined with enhanced interaction capabilities such as built-in messaging and project organization tools. The system operates on a sandboxed runtime, ensuring that tasks executed by the agent are isolated from the host environment, which improves security and reliability. It extends...
    Downloads: 4 This Week
  • 25
    L1B3RT45

    Harmless liberation prompts

    ...The project reflects the growing interest in prompt security, jailbreak testing, and model alignment research within the AI community. Its materials are often used by researchers and enthusiasts studying robustness, safety, and adversarial prompting dynamics. Because of its unconventional focus, it functions primarily as a research and exploration resource rather than a production tool. Overall, L1B3RT4S serves as a niche but widely referenced collection for studying advanced prompt manipulation patterns.
    Downloads: 1 This Week