ASSERT is a requirement-driven evaluation harness for AI agents and LLM applications. It turns natural-language specifications, policies, product requirements, and launch criteria into structured tests that can be reviewed, executed, scored, and improved. The pipeline derives behavior categories, generates single-turn and multi-turn test cases, runs them against a target system, and uses an LLM judge to score conversations against the stated policies. It can evaluate hosted models, custom agents, multi-agent systems, REST clients, and frameworks such as LangGraph, CrewAI, AutoGen, DSPy, LlamaIndex, and OpenAI Agents SDK. ASSERT is designed to close the gap between what a system is supposed to do and what evaluation actually measures. It is useful for responsible AI teams, product teams, and developers who need traceable, spec-aligned testing.

Features

  • Requirement-driven AI evaluation
  • Single-turn and multi-turn test generation
  • LLM-as-judge scoring
  • Agent and model endpoint support
  • Policy-aligned behavior coverage
  • LiteLLM integration for broad model access

Project Samples

Project Activity

See All Activity >

Categories

Agentic AI

License

MIT License

Follow ASSERT

ASSERT Web Site

Other Useful Business Software
Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

Native application identity and user-based security for your Azure cloud

Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
Get a free trial
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of ASSERT!

Additional Project Details

Programming Language

Python

Related Categories

Python Agentic AI Tool

Registered

5 days ago