Showing 97 open source projects for "reliability"

  • 1
    FuzzyAI Fuzzer

    A powerful tool for automated LLM fuzzing

    FuzzyAI is an open-source fuzzing framework designed to test the security and reliability of large language model applications. The tool automates the process of generating adversarial prompts and input variations to identify vulnerabilities such as jailbreaks, prompt injections, or unsafe model responses. It allows developers and security researchers to systematically evaluate the robustness of LLM-based systems by simulating a wide range of malicious or unexpected inputs.
    Downloads: 0 This Week
  • 2
    LangWatch

    The platform for LLM evaluations and AI agent testing

    ...By collecting telemetry data from AI applications, LangWatch allows developers to understand how their systems perform in real-world usage scenarios. The platform includes dashboards that visualize model behavior, enabling teams to monitor trends in response quality and reliability over time. It also provides evaluation tools that allow developers to test prompts and compare outputs across different models or configurations. Through integration with popular AI development frameworks, LangWatch can be embedded directly into AI pipelines to provide continuous monitoring and evaluation.
    Downloads: 0 This Week
  • 3
    LMOps

    General technology for enabling AI capabilities w/ LLMs and MLLMs

    ...It includes experimental tools and frameworks that help developers optimize prompts, design workflows for generative models, and manage the lifecycle of LLM-based systems. The initiative also investigates techniques for improving the reliability, scalability, and maintainability of applications powered by large models. By addressing challenges such as prompt engineering, evaluation strategies, and deployment infrastructure, LMOps aims to establish best practices for operating large language model systems in real-world environments.
    Downloads: 0 This Week
  • 4
    12-Factor Agents

    What are the principles we can use to build LLM-powered software

    12-Factor Agents is a conceptual engineering guide that defines a set of principles for building reliable, scalable, and maintainable LLM-powered applications. Inspired by the original Twelve-Factor App methodology, the project reframes best practices specifically for agentic systems and AI software. It outlines patterns such as treating prompts as first-class assets, owning the context window, and converting natural language into structured tool calls. The repository emphasizes operational...
    Downloads: 0 This Week
  • 5
    TorchRL

    A modular, primitive-first, python-first PyTorch library

    TorchRL is an open-source Reinforcement Learning (RL) library for PyTorch. It provides PyTorch-first and Python-first abstractions for RL, at both low and high levels, that are intended to be efficient, modular, documented, and properly tested. The code is aimed at supporting research in RL. Most of it is written in Python in a highly modular way, so that researchers can swap components, transform them, or write new ones with little effort.
    Downloads: 0 This Week
  • 6
    Universal Commerce Protocol (UCP)

    The common language for platforms, agents and businesses.

    ...Its modular, capability-based architecture allows businesses to expose only what they support while remaining flexible and extensible. By leveraging existing industry standards for payments, identity, and security, UCP avoids reinventing the wheel while ensuring reliability and trust. The result is a developer-friendly, future-ready protocol that simplifies commerce integration at global scale.
    Downloads: 1 This Week
  • 7
    Prompt Engineering Techniques

    Collection of tutorials for Prompt Engineering techniques

    ...It is intended for a wide audience, from beginners learning how to structure their first prompts to advanced practitioners optimizing stability, controllability, and reliability in production systems.
    Downloads: 1 This Week
  • 8
    Fli

    Google Flights MCP and Python Library

    Fli is a powerful Python library and command-line tool that provides direct programmatic access to Google Flights data through reverse-engineered API interactions rather than traditional web scraping. This approach enables faster, more reliable, and more stable access to flight information, avoiding the fragility associated with HTML parsing and UI changes. The library supports a wide range of flight search capabilities, including filtering by airline, departure time, number of stops, cabin...
    Downloads: 0 This Week
  • 9
    Poco Claw

    A more beautiful and easier-to-use alternative to OpenClaw

    ...It focuses on improving usability by providing a modern web interface combined with enhanced interaction capabilities such as built-in messaging and project organization tools. The system operates on a sandboxed runtime, ensuring that tasks executed by the agent are isolated from the host environment, which improves security and reliability. It extends beyond simple chatbot functionality by supporting structured workflows, task planning modes, and multi-step execution pipelines. The platform also allows users to manage files and contexts directly within the interface, enabling more complex interactions with data and projects. It is built to make AI agent systems accessible to a broader audience, including users who may not be comfortable with command-line environments.
    Downloads: 0 This Week
  • 10
    Open Gauss

    Project-scoped Lean workflow orchestrator from Math, Inc.

    Open Gauss is an enterprise-grade open-source relational database management system designed to handle large-scale data processing with high performance, reliability, and security. It is based on the PostgreSQL ecosystem but significantly extends its capabilities through architectural optimizations, AI-driven features, and enterprise-level enhancements. The database organizes data using the relational model, storing structured information in tables composed of rows and columns while supporting standard SQL for querying and management. ...
    Downloads: 0 This Week
  • 11
    HelixDB

    Graph-vector database for building unified AI backends fast

    ...HelixDB also supports additional data formats such as key-value, document, and relational data, making it flexible for a wide range of backend architectures. A central feature of the project is its custom query language, HelixQL, which is fully type-safe and compiled to ensure reliability and correctness in production environments. HelixDB includes built-in capabilities for embeddings, vector search, keyword search, and graph traversal, which are particularly useful for retrieval-augmented generation and agent-based systems.
    Downloads: 0 This Week
  • 12
    Guardrails

    Framework for validating and controlling LLM outputs in AI apps

    ...Guardrails also supports generating structured data from language models, allowing developers to enforce schemas or type constraints on responses. A companion ecosystem known as a hub provides reusable validators that can be combined into input and output guards to address different reliability and safety concerns.
    Downloads: 0 This Week
  • 13
    Advanced AI explainability for PyTorch

    Advanced AI Explainability for computer vision

    ...The library supports a wide variety of tasks including image classification, object detection, semantic segmentation, and similarity analysis. It also provides metrics and evaluation tools that help measure the reliability and quality of the generated explanations. By integrating easily with PyTorch models, the library allows developers to diagnose model errors, detect biases in datasets, and improve model transparency.
    Downloads: 0 This Week
  • 14
    Rogue

    AI Agent Evaluator & Red Team Platform

    Rogue is an open-source evaluation and red-team framework designed to test the reliability, safety, and policy compliance of AI agents. The platform automatically interacts with an AI agent by generating dynamic scenarios and multi-turn conversations that simulate real-world interactions. Instead of relying solely on static test scripts, Rogue uses an agent-as-a-judge architecture where one agent probes another agent to detect failures or unexpected behaviors.
    Downloads: 0 This Week
  • 15
    MiroFlow

    Agent framework that enables tool-use agent tasks

    ...This architecture allows agents to perform advanced reasoning tasks such as deep research, future event prediction, and multi-step knowledge analysis. The framework emphasizes reliability and scalability by incorporating robust workflow execution, concurrency management, and fault-tolerant design to handle unstable APIs or network conditions.
    Downloads: 0 This Week
  • 16
    WFGY 3.0

    A tension reasoning engine over 131 S-class problems

    WFGY is an experimental open-source reasoning framework designed to improve the reliability and interpretability of large language model outputs through structured reasoning layers. The project introduces a conceptual reasoning engine that analyzes complex problems by identifying semantic compression errors and residual assumptions within a system’s reasoning process. Its architecture treats reasoning failures as measurable signals that can be detected and analyzed rather than simply observed as incorrect answers. ...
    Downloads: 0 This Week
  • 17
    WebGLM

    An Efficient Web-enhanced Question Answering System

    ...WebGLM introduces several components that coordinate this process, including a retrieval module that selects relevant web documents, a generator that produces answers, and a scoring system that evaluates the quality of generated responses. The architecture aims to improve the reliability and usefulness of AI systems that answer questions about current or external knowledge sources.
    Downloads: 0 This Week
  • 18
    LLMCompiler

    An LLM Compiler for Parallel Function Calling

    LLMCompiler is an open-source framework designed to optimize how large language models orchestrate multiple external tool or function calls during complex reasoning tasks. Traditional LLM agent systems typically execute tool calls sequentially, which can create latency, higher costs, and reduced reliability when solving multi-step problems. LLMCompiler addresses this limitation by applying principles from classical compilers to analyze a task and construct an execution plan that allows multiple functions to run in parallel whenever possible. The framework builds a dependency graph of required operations, identifying which tasks must run sequentially and which can be executed simultaneously. ...
    Downloads: 0 This Week
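    The dependency-graph idea in the LLMCompiler description can be illustrated with a minimal sketch that uses only the Python standard library. This is not LLMCompiler's API: the `run_plan` function, the task names, and the stub tools are all hypothetical, and the scheduler is a simplified wave-by-wave version (each batch of ready tasks finishes before the next batch starts).

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+


def run_plan(tasks, deps, max_workers=4):
    """Run tasks in dependency order; independent tasks run in parallel.

    tasks: name -> callable that receives a dict of its dependencies' results
    deps:  name -> set of task names that must finish first
    (Hypothetical helper for illustration, not part of any library.)
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the plan is cyclic
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Every task returned here has all of its dependencies done,
            # so the whole batch can be submitted at once.
            ready = sorter.get_ready()
            futures = {
                name: pool.submit(
                    tasks[name], {d: results[d] for d in deps.get(name, ())}
                )
                for name in ready
            }
            # Wait for the batch, record results, and unlock dependents.
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results


# Stub "tools" standing in for external function calls (assumed names).
def fetch_weather(_inputs):
    return "sunny"


def fetch_calendar(_inputs):
    return ["standup"]


def summarize(inputs):
    return f"{inputs['weather']}, {len(inputs['calendar'])} event(s)"


tasks = {"weather": fetch_weather, "calendar": fetch_calendar, "summary": summarize}
deps = {"summary": {"weather", "calendar"}}
results = run_plan(tasks, deps)
```

    Here `weather` and `calendar` have no dependencies and are submitted together, while `summary` runs only after both have finished.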
  • 19
    Magicoder

    Empowering Code Generation with OSS-Instruct

    ...This technique uses open-source code repositories as a foundation for generating more realistic and diverse instruction datasets for training language models. By grounding training data in real open-source examples, Magicoder aims to reduce bias and improve the reliability of code generation results compared to models trained solely on synthetic instructions. The project includes model implementations, training resources, and evaluation benchmarks that demonstrate how the approach improves instruction-following and code synthesis capabilities. Magicoder models are intended for tasks such as programming assistance, code explanation, automated debugging, and software documentation generation.
    Downloads: 0 This Week
  • 20
    Hallucination Leaderboard

    Leaderboard Comparing LLM Performance at Producing Hallucinations

    ...Each model is tested on document summarization tasks to measure how often generated responses introduce information that is not supported by the original source material. The results are published as a leaderboard that allows researchers and developers to compare model reliability and factual consistency. By focusing on hallucination rates rather than traditional metrics such as accuracy or fluency, the benchmark highlights an important aspect of AI system safety and trustworthiness. The leaderboard is regularly updated as new models are released and evaluation methods evolve.
    Downloads: 0 This Week
  • 21
    Huatuo-Llama-Med-Chinese

    Instruction-tuning LLM with Chinese Medical Knowledge

    ...These datasets are constructed from medical knowledge graphs, academic literature, and question-answer pairs designed to teach models how to respond accurately to healthcare-related queries. The goal of the project is to improve the reliability and domain expertise of language models when answering medical questions or assisting with healthcare-related tasks. By combining domain-specific training data with instruction-tuning techniques, the project produces models capable of generating more accurate medical responses than general-purpose models.
    Downloads: 0 This Week
  • 22
    ArXiv MCP Server

    A Model Context Protocol server for searching and analyzing arXiv

    ...With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and improving markdown conversion, reflecting active community use in research flows. It’s designed to be drop-in for MCP clients, giving them typed inputs/outputs and predictable errors around a well-known academic corpus. For developers building research copilots, it removes the glue work of wiring arXiv APIs into an agent toolchain.
    Downloads: 0 This Week
  • 23
    Instructor

    Structured outputs for LLMs

    ...Its customizable nature permits the definition of validators and custom error messages, enhancing data validation processes. Instructor is trusted by engineers from platforms like Langflow, underscoring its reliability and effectiveness in managing structured outputs powered by LLMs. Instructor is powered by Pydantic, which is powered by type hints. Schema validation and prompting are controlled by type annotations, so there is less to learn and less code to write, and it integrates with your IDE.
    Downloads: 0 This Week
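    The idea of letting type annotations drive schema validation can be sketched with the standard library alone. This is not Instructor's or Pydantic's API: `UserInfo`, `parse_structured`, and the sample JSON strings are hypothetical, and the check only handles flat schemas whose annotations are plain classes such as `str` and `int`.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class UserInfo:
    # Hypothetical schema: the annotations below are the only "schema" there is.
    name: str
    age: int


def parse_structured(raw, schema):
    """Parse a JSON string and check each field against the schema's annotations.

    Raises ValueError listing every missing or mistyped field, which a caller
    could feed back to the model as a retry message.
    """
    data = json.loads(raw)
    errors = []
    for f in fields(schema):
        if f.name not in data:
            errors.append(f"missing field: {f.name}")
        elif not isinstance(data[f.name], f.type):
            errors.append(
                f"{f.name}: expected {f.type.__name__}, "
                f"got {type(data[f.name]).__name__}"
            )
    if errors:
        raise ValueError("; ".join(errors))
    return schema(**{f.name: data[f.name] for f in fields(schema)})


# A well-formed response parses into a typed object.
user = parse_structured('{"name": "Ada", "age": 36}', UserInfo)

# A malformed response produces a readable error instead of bad data.
try:
    parse_structured('{"name": "Ada", "age": "thirty-six"}', UserInfo)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

    A real validation library covers nested models, optional fields, and coercion, none of which this sketch attempts.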
  • 24
    OpenClaw Opik Observability Plugin

    Official plugin for OpenClaw that exports agent traces to Opik

    ...The goal of the project is to provide transparency into the internal reasoning and operational pipeline of agent systems so developers can diagnose failures, control costs, and improve reliability.
    Downloads: 0 This Week
  • 25
    Farfalle

    AI search engine - self-host with local or cloud LLMs

    ...Farfalle also includes an agent-based search workflow that plans queries and executes multiple search steps to produce more accurate results than traditional keyword searches. The system supports multiple external search providers and integrates caching and rate-limiting mechanisms to maintain reliability during heavy usage.
    Downloads: 0 This Week