Showing 7 open source projects for "benchmark"

  • 1
    AutoAgent AI

    Autonomous harness engineering

    ...Instead of manually tuning prompts or workflows, developers define high-level goals in a configuration file, and the system continuously modifies its own tools, orchestration, and logic based on benchmark performance. It operates through a loop of testing, analyzing failures, and refining the agent’s configuration to maximize a scoring metric. The framework uses a single-file agent harness combined with structured tasks and evaluation suites to guide optimization. It runs inside Docker for safe execution and reproducibility. This approach shifts agent development from manual design to automated optimization. ...
    Downloads: 2 This Week
  • 2
    CUDA Agent

    Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

    ...The system operates in a ReAct-style loop in which the agent profiles baseline implementations, writes CUDA code, compiles it in a sandbox, and iteratively refines the kernel for performance. CUDA Agent reports strong benchmark results, including high pass rates and significant speedups over compiler baselines such as torch.compile.
    Downloads: 0 This Week
  • 3
    Diplomacy Cicero

    Code for Cicero, an AI agent that plays the game of Diplomacy

    ...The codebase is implemented primarily in Python, with performance-critical components in C++ (exposed through pybind11 bindings), and is configured to run on a cluster environment with many GPUs. Configuration is managed through protobuf files that define tasks such as self-play, benchmark agent comparisons, and RL training. The repository is now archived and read-only: it is no longer actively developed but remains publicly available for research use.
    Downloads: 2 This Week
  • 4
    Agentex

    Open source codebase for Scale Agentex

    Agentex is an open framework from Scale for building, running, and evaluating agentic workflows, with an emphasis on reproducibility and measurable outcomes rather than ad hoc demos. It treats an “agent” as a composition of a policy (the LLM), tools, memory, and an execution runtime, so you can test the whole loop rather than just the prompts. The repo focuses on structured experiments: standardized tasks, canonical tool interfaces, and logs that make it possible to compare models, prompts, and tool...
    Downloads: 0 This Week
  • 5
    Habitat-Lab

    A modular high-level library to train embodied AI agents

    ...It provides algorithms for single- and multi-agent training (via imitation learning, reinforcement learning, or no learning at all, as in sense-plan-act pipelines), along with tools to benchmark their performance on the defined tasks using standard metrics.
    Downloads: 0 This Week
  • 6
    Agent S

    Agent S: an open agentic framework that uses computers like a human

    ...Built to operate graphical user interfaces like a human, it lets AI agents perceive screens, reason about tasks, and execute actions on macOS, Windows, and Linux. The latest version, Agent S3, is reported to surpass human-level performance on the OSWorld benchmark, with state-of-the-art results on complex multi-step computer tasks. Agent S combines foundation models (such as GPT-5) with grounding models such as UI-TARS to translate visual inputs into precise, executable actions. It supports flexible deployment via CLI, SDK, or cloud, and integrates with multiple model providers, including OpenAI, Anthropic, Gemini, Azure, and Hugging Face endpoints. ...
    Downloads: 0 This Week
  • 7
    OWL

    Optimized Workforce Learning for General Multi-Agent Assistance

    OWL (Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation) is a framework designed to enhance multi-agent collaboration and improve task automation across domains. Through dynamic agent interactions, OWL aims to streamline complex workflows and make AI collaboration more natural, efficient, and adaptable. It is built on...
    Downloads: 0 This Week
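The AutoAgent AI entry describes a loop of testing, analyzing failures, and refining an agent's configuration to maximize a benchmark score. The sketch below illustrates that general pattern as greedy hill climbing; it is not AutoAgent's code. The configuration keys (`max_steps`, `temperature`), the task suite, and the helpers `evaluate`, `propose`, and `optimize` are hypothetical, and the assumption that each task exercises exactly one numeric setting is a simplification (a real system would have an LLM rewrite the harness itself).

```python
import random
from collections.abc import Callable
from dataclasses import dataclass

# Hypothetical representation: an agent "configuration" is a dict of numeric
# settings, and each benchmark task checks exactly one of those settings.
Config = dict[str, float]
Task = tuple[str, Callable[[float], bool]]  # (config key it exercises, pass/fail check)


@dataclass
class EvalResult:
    score: float         # fraction of tasks that pass, in [0, 1]
    failures: list[str]  # names of the failing tasks


def evaluate(config: Config, tasks: dict[str, Task]) -> EvalResult:
    """Run the evaluation suite and collect the score plus the failing tasks."""
    failures = [name for name, (key, check) in tasks.items() if not check(config[key])]
    return EvalResult(score=1 - len(failures) / len(tasks), failures=failures)


def propose(config: Config, failures: list[str], tasks: dict[str, Task],
            rng: random.Random) -> Config:
    """'Analyze' a failure by picking a failing task and nudging the setting it uses."""
    key, _ = tasks[rng.choice(failures)]
    candidate = dict(config)
    candidate[key] += rng.choice([-1.0, 1.0])
    return candidate


def optimize(config: Config, tasks: dict[str, Task],
             budget: int = 200, seed: int = 0) -> tuple[Config, EvalResult]:
    """Greedy test -> analyze -> refine loop that keeps only improving edits."""
    rng = random.Random(seed)
    best = evaluate(config, tasks)
    for _ in range(budget):
        if not best.failures:
            break  # every task passes; nothing left to refine
        candidate = propose(config, best.failures, tasks, rng)
        result = evaluate(candidate, tasks)
        if result.score > best.score:  # accept strict improvements only
            config, best = candidate, result
    return config, best


# Hypothetical evaluation suite for illustration only.
tasks = {
    "long_horizon": ("max_steps", lambda v: v >= 5),
    "bounded_cost": ("max_steps", lambda v: v <= 6),
    "deterministic": ("temperature", lambda v: v <= 0),
}
final_config, final_result = optimize({"max_steps": 4.0, "temperature": 1.0}, tasks)
print(final_config, final_result.score)
```

Accepting only strict improvements keeps the loop from regressing on tasks that already pass; a real optimizer would also need a way to escape plateaus where no single edit raises the score.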