Respan
Respan is a self-driving observability and evaluation platform built specifically for AI agents. It enables teams to trace full execution flows, including messages, tool calls, routing decisions, memory usage, and outcomes. The platform connects observability, evaluations, and optimization into a continuous improvement loop. Metric-first evaluations allow teams to define performance standards such as accuracy, cost, reliability, and safety. Respan also includes capability and regression testing to protect stable behaviors while improving new ones. An AI-powered evaluation agent analyzes failures, identifies root causes, and recommends next steps automatically. With compliance certifications including ISO 27001, SOC 2, GDPR, and HIPAA, Respan supports secure, large-scale AI deployments across industries.
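Concretely, tracing a "full execution flow" means recording each message, routing decision, and tool call as a span in a tree. Respan's own SDK isn't shown here; the sketch below uses the vendor-neutral OpenTelemetry Python API to illustrate the kind of instrumentation such platforms work with (whether Respan ingests OTel traces is an assumption, and the agent logic is a stub).

```python
# Sketch: recording one agent turn as nested OpenTelemetry spans.
# The routing logic and tool output are placeholders; span attributes
# mirror what Respan traces: messages, routing decisions, tool calls, outcomes.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent-demo")

def handle_turn(user_message: str) -> str:
    with tracer.start_as_current_span("agent.turn") as turn:
        turn.set_attribute("input.message", user_message)
        with tracer.start_as_current_span("agent.route") as route:
            tool = "search" if "find" in user_message else "answer"
            route.set_attribute("routing.decision", tool)
        with tracer.start_as_current_span(f"tool.{tool}") as call:
            result = f"[{tool} result for: {user_message}]"  # stubbed tool output
            call.set_attribute("tool.output", result)
        turn.set_attribute("outcome", "success")
        return result

print(handle_turn("find the latest pricing"))
```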
Learn more
Agenta
Agenta is an open-source LLMOps platform designed to help teams build reliable AI applications with integrated prompt management, evaluation workflows, and system observability. It centralizes all prompts, experiments, traces, and evaluations into one structured hub, eliminating scattered workflows across Slack, spreadsheets, and emails. With Agenta, teams can iterate on prompts collaboratively, compare models side-by-side, and maintain full version history for every change. Its evaluation tools replace guesswork with automated testing, LLM-as-a-judge, human annotation, and intermediate-step analysis. Observability features allow developers to trace failures, annotate logs, convert traces into tests, and monitor performance regressions in real time. Agenta helps AI teams transition from siloed experimentation to a unified, efficient LLMOps workflow for shipping more reliable agents and AI products.
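To make the "LLM-as-a-judge" idea concrete, here is a minimal sketch of the pattern: a second model grades an output against a rubric and returns a score. The OpenAI client stands in for whatever model the evaluator is configured with; the rubric, model name, and 1-5 scale are illustrative assumptions, not Agenta's actual evaluator API.

```python
# Sketch of the LLM-as-a-judge pattern: a grader model scores an answer
# against a rubric. Model name, rubric, and score scale are assumptions
# for illustration; Agenta's own evaluator configuration is not shown.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = (
    "Score the ANSWER to the QUESTION from 1 (unusable) to 5 (excellent) "
    "for factual accuracy and completeness. Reply with the digit only."
)

def judge(question: str, answer: str) -> int:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"QUESTION: {question}\nANSWER: {answer}"},
        ],
    )
    return int(response.choices[0].message.content.strip())

print("judge score:", judge("What is the capital of France?", "Paris."))
```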
Learn more
Vivgrid
Vivgrid is a development platform for AI agents that emphasizes observability, debugging, safety, and global deployment infrastructure. It gives developers full visibility into agent behavior, logging prompts, memory fetches, tool usage, and reasoning chains so they can trace where things break or deviate. Teams can test, evaluate, and enforce safety policies (such as refusal rules or content filters), and add human-in-the-loop checks before going live. Vivgrid supports the orchestration of multi-agent systems with stateful memory, routing tasks dynamically across agent workflows. On the deployment side, it operates a globally distributed inference network for low-latency (sub-50 ms) execution and exposes metrics like latency, cost, and usage in real time. By combining debugging, evaluation, safety, and deployment in one stack, it aims to simplify shipping resilient AI systems without stitching together separate tools for observability, infrastructure, and orchestration.
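As a rough illustration of the kind of pre-flight safety gate described above (refusal rules, filters, human-in-the-loop escalation), the sketch below chains simple checks before a request reaches the agent. The rules and the escalation condition are invented for the example; this is not Vivgrid's policy engine.

```python
# Sketch of a pre-flight safety gate: refusal rules plus a human-in-the-loop
# escalation before a request reaches the agent. Rules and the escalation
# condition are illustrative assumptions, not Vivgrid's actual policies.
import re
from dataclasses import dataclass

@dataclass
class Verdict:
    action: str   # "allow", "refuse", or "escalate"
    reason: str = ""

REFUSAL_PATTERNS = [
    re.compile(r"\b(ssn|social security number)\b", re.I),
    re.compile(r"\bdisable (the )?safety\b", re.I),
]

def check(request: str) -> Verdict:
    for pattern in REFUSAL_PATTERNS:
        if pattern.search(request):
            return Verdict("refuse", f"matched refusal rule: {pattern.pattern}")
    # Example escalation rule: irreversible actions need a human reviewer.
    if "delete" in request.lower():
        return Verdict("escalate", "destructive action requires human approval")
    return Verdict("allow")

for req in ["summarize this report", "delete all customer records"]:
    verdict = check(req)
    print(f"{req!r} -> {verdict.action} {verdict.reason}")
```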
Learn more
Maxim
Maxim is an agent simulation, evaluation, and observability platform that empowers modern AI teams to deploy agents with quality, reliability, and speed.
Maxim's end-to-end evaluation and data management stack covers every stage of the AI lifecycle, from prompt engineering to pre- and post-release testing and observability, dataset creation and management, and fine-tuning.
Use Maxim to simulate and test your multi-turn workflows across a wide variety of scenarios and user personas before taking your application to production.
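In spirit, persona-based simulation means replaying a multi-turn scenario once per user persona and checking each transcript against your evals. The loop below is a hand-rolled sketch of that idea with a stubbed agent; the personas, scenario, and check are illustrative assumptions, not Maxim's simulation API.

```python
# Sketch of persona-driven multi-turn simulation: run one scripted scenario
# per persona against a stubbed agent, then apply a toy check to each
# transcript. Personas, scenario, and the agent stub are all illustrative.
PERSONAS = {
    "new_user": "Asks basic questions, needs step-by-step guidance.",
    "power_user": "Terse, expects exact answers and shortcuts.",
}

SCENARIO = ["How do I reset my password?", "It says my link expired."]

def agent_reply(persona_hint: str, message: str) -> str:
    # Stub standing in for the real agent under test.
    return f"({persona_hint[:20]}...) help with: {message}"

def run_simulation() -> dict[str, bool]:
    results = {}
    for name, hint in PERSONAS.items():
        transcript = []
        for user_msg in SCENARIO:  # multi-turn conversation
            transcript.append(("user", user_msg))
            transcript.append(("agent", agent_reply(hint, user_msg)))
        # Toy eval: every agent turn produced a non-empty response.
        results[name] = all(text for role, text in transcript if role == "agent")
    return results

print(run_simulation())
```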
Features:
Agent Simulation
Agent Evaluation
Prompt Playground
Logging/Tracing Workflows
Custom Evaluators: AI, Programmatic, and Statistical (a minimal sketch follows this list)
Dataset Curation
Human-in-the-loop
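For a sense of what a programmatic evaluator might look like (as referenced in the feature list above), here is a minimal deterministic check with a simple statistical aggregate over a test set. The function shape is an assumption for illustration, not Maxim's evaluator interface.

```python
# Sketch of a programmatic evaluator: a deterministic pass/fail check plus
# a statistical summary over a test set. The signature is an illustrative
# assumption, not Maxim's evaluator interface.
def contains_required_terms(output: str, required: list[str]) -> bool:
    return all(term.lower() in output.lower() for term in required)

test_set = [
    ("Refunds are processed within 5 business days.", ["refund", "5 business days"]),
    ("Please contact support.", ["refund"]),
]

passes = [contains_required_terms(out, req) for out, req in test_set]
print(f"pass rate: {sum(passes) / len(passes):.0%}")  # statistical summary
```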
Use Case:
Simulate and test AI agents
Evals for agentic workflows: pre- and post-release
Tracing and debugging multi-agent workflows
Real-time alerts on performance and quality
Creating robust datasets for evals and fine-tuning
Human-in-the-loop workflows
Learn more