  • 1
    HumanEval

    Code for the paper "Evaluating Large Language Models Trained on Code"

    human-eval is a benchmark dataset and evaluation framework created by OpenAI for measuring the ability of language models to generate correct code. It consists of hand-written programming problems with unit tests, designed to assess functional correctness rather than superficial metrics like text similarity. Each task includes a natural language prompt and a function signature, requiring the model to generate an implementation that passes all provided tests. The benchmark has become a standard for evaluating code generation models, including those in the Codex and GPT families. ...
    Downloads: 2 This Week
  • 2
    Agentless

    An agentless approach to automatically solve software development problems

    ...It then generates multiple candidate patches for the identified locations using language model reasoning and diff-style edits. In the final stage, the framework validates potential patches by running regression tests and additional reproduction tests to confirm whether the fix resolves the original error. Based on these results, the system ranks the candidate patches and selects the most reliable solution to submit.
    Downloads: 0 This Week
  • 3
    GLM-4.5

    GLM-4.5: Open-source LLM for intelligent agents by Z.ai

    GLM-4.5 is a cutting-edge open-source large language model designed by Z.ai for intelligent agent applications. The flagship GLM-4.5 model has 355 billion total parameters with 32 billion active parameters, while the compact GLM-4.5-Air version offers 106 billion total parameters and 12 billion active parameters. Both models unify reasoning, coding, and intelligent agent capabilities, providing two modes: a thinking mode for complex reasoning and tool usage, and a non-thinking mode for...
    Downloads: 40 This Week
  • 4
    LLM Colosseum

    Benchmark LLMs by fighting in Street Fighter 3

    ...The system places language models inside the environment of the classic video game Street Fighter III, where they must interpret the game state and decide which actions to perform during combat. This setup creates a dynamic environment that tests reasoning, situational awareness, and decision-making abilities in real time. Instead of relying purely on reward signals, as reinforcement learning agents do, the models analyze contextual information and generate strategic actions based on the game environment. Performance is evaluated using a competitive ranking system that assigns each model an Elo rating based on its results across matches against other models.
    Downloads: 6 This Week
  • 5
    promptmap2

    A security scanner for custom LLM applications

    promptmap is an automated security scanner for custom LLM applications that focuses on prompt injection and related attack classes. The project supports both white-box and black-box testing, which means it can either run tests directly against a known model and system prompt configuration or attack an external HTTP endpoint without internal access. Its scanning workflow uses a dual-LLM architecture in which one model acts as the target being tested and another acts as a controller that evaluates whether an attack succeeded. The repository emphasizes broad coverage, including test rules for prompt stealing, jailbreaks, harmful content generation, hate-related outputs, social bias, and distraction attacks. ...
    Downloads: 0 This Week
  • 6
    OSS-Fuzz Gen

    LLM powered fuzzing via OSS-Fuzz

    OSS-Fuzz-Gen is a companion project that helps automatically create or improve fuzz targets for open-source codebases, aiming to increase coverage in OSS-Fuzz with minimal maintainer effort. It analyses a library’s APIs, examples, and tests to propose harnesses that exercise parsers, decoders, or protocol handlers—precisely the code where fuzzing pays off. The system integrates with modern LLM-assisted workflows to draft harness code and then iterates based on build errors or low coverage signals. Importantly, it aligns with OSS-Fuzz conventions, generating corpus seeds, build rules, and sanitizer settings so projects can plug in quickly. ...
    Downloads: 0 This Week
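
The LLM Colosseum entry above mentions ranking models by a rating updated after each match. As a rough illustration only, here is a minimal sketch of the standard Elo update in Python. The K-factor of 32, the 400-point scale, the starting rating of 1500, and the function names are conventional or hypothetical choices made for this example, not details taken from the project.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability-like expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    k (assumed to be 32 here) controls how far a single result moves a rating.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    score_b = 1.0 - score_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * (score_b - exp_b)
    return new_a, new_b


if __name__ == "__main__":
    # Two hypothetical models start level; model A wins the match.
    a, b = update_elo(1500.0, 1500.0, score_a=1.0)
    print(a, b)
```

With equal starting ratings the expected score is 0.5, so a win moves each rating by half the K-factor in opposite directions. An upset win over a higher-rated opponent moves the ratings further.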