Datadog
Datadog is the monitoring, security and analytics platform for developers, IT operations teams, security engineers and business users in the cloud age. Our SaaS platform integrates and automates infrastructure monitoring, application performance monitoring and log management to provide unified, real-time observability of our customers' entire technology stack. Datadog is used by organizations of all sizes and across a wide range of industries to enable digital transformation and cloud migration, drive collaboration among development, operations, security and business teams, accelerate time to market for applications, reduce time to problem resolution, secure applications and infrastructure, understand user behavior and track key business metrics.
Learn more
Convo
Kanvo provides a drop‑in JavaScript SDK that adds built‑in memory, observability, and resiliency to LangGraph‑based AI agents with zero infrastructure overhead. Without requiring databases or migrations, it lets you plug in a few lines of code to enable persistent memory (storing facts, preferences, and goals), threaded conversations for multi‑user interactions, and real‑time agent observability that logs every message, tool call, and LLM output. Its time‑travel debugging features let you checkpoint, rewind, and restore any agent run state instantly, making workflows reproducible and errors easy to trace. Designed for speed and simplicity, Convo’s lightweight interface and MIT‑licensed SDK deliver production‑ready, debuggable agents out of the box while keeping full control of your data.
Learn more
AgentScope
AgentScope is an AI-driven agent observability and operations platform that provides visibility, control, and performance analytics for autonomous AI agents across production workloads. It enables engineering and DevOps teams to monitor, diagnose, and optimize complex multi-agent applications in real time by capturing detailed telemetry on agent actions, decisions, resource usage, and outcome quality. With rich dashboards and timelines, AgentScope helps teams trace execution flows, identify bottlenecks, and understand how agents interact with external systems, APIs, and data sources, improving debugging and reliability for autonomous workflows. It supports customizable alerting, log aggregation, and structured event views so teams can quickly surface anomalous behavior or errors across distributed agent fleets. In addition to real-time monitoring, AgentScope provides historical analysis and reporting that help teams measure performance trends, model drift, etc.
Learn more
Kayba
Kayba makes AI agents self-improve from experience. It learns from an agent’s execution traces to detect failures, fix them, and measure whether the fix actually worked. Instead of relying on generic evals that cannot explain why an agent failed, Kayba derives failure modes from the agent’s own traces and builds custom benchmarks for the user’s domain, so teams can measure improvement against real production failure patterns. Kayba wires tracing into an agent with one line of setup, watches it around the clock, and flags the moment a step stops being recorded. Even good tracing rots as teams ship changes, and steps can quietly stop being captured; Kayba checks the tracing users already have, shows exactly what is broken, points to the file that needs attention, and sends the gap to a coding agent through MCP. The coding agent patches the issue, and Kayba verifies that the trace is actually closed.
Learn more