Name	Modified	Size	Downloads / Week
assert_ai-0.1.0-py3-none-any.whl	2026-06-02	310.3 kB	0
assert_ai-0.1.0.tar.gz	2026-06-02	421.7 kB	0
ASSERT v0.1.0 Initial release source code.tar.gz	2026-06-02	8.5 MB	0
ASSERT v0.1.0 Initial release source code.zip	2026-06-02	8.9 MB	2
README.md	2026-06-02	3.3 kB	0
Totals: 5 Items		18.1 MB	2

ASSERT v0.1.0 Initial release

ASSERT stands for Adaptive Spec-driven Scoring for Evaluation and Regression Testing.

Local-first. Framework-agnostic. Trace-aware.

ASSERT turns behaviors you specify in natural language into structured, executable evaluations that can be reviewed, run, scored, and improved over time. From the natural-language specification, the ASSERT pipeline derives behavior categories, generates single-turn and multi-turn test cases, runs them against your target, and uses an LLM judge to score each conversation against your policies.
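
The flow described above can be pictured as four stages: derive categories, generate cases, run them against the target, and judge the results. The sketch below is a toy illustration only, not ASSERT's API. Every name in it (`EvalCase`, `derive_categories`, `generate_cases`, `run_target`, `judge`) is hypothetical, and simple string rules stand in for the LLM calls the real pipeline would make.

```python
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    category: str
    turns: list[str]  # one entry = single-turn, several = multi-turn
    responses: list[str] = field(default_factory=list)


def derive_categories(spec: str) -> list[str]:
    # Hypothetical stand-in: one behavior category per non-empty spec line.
    return [line.strip() for line in spec.splitlines() if line.strip()]


def generate_cases(categories: list[str]) -> list[EvalCase]:
    # Hypothetical stand-in: one single-turn and one multi-turn case per category.
    cases = []
    for cat in categories:
        cases.append(EvalCase(cat, [f"Probe: {cat}"]))
        cases.append(EvalCase(cat, [f"Probe: {cat}", "Are you sure?"]))
    return cases


def run_target(case: EvalCase, target) -> EvalCase:
    # Send each turn to the system under test and record its replies.
    case.responses = [target(turn) for turn in case.turns]
    return case


def judge(case: EvalCase) -> float:
    # Hypothetical stand-in for the LLM judge: fraction of turns with a non-empty reply.
    answered = sum(1 for reply in case.responses if reply.strip())
    return answered / len(case.turns)


def echo_target(prompt: str) -> str:
    # Toy target: any Python callable that maps a prompt to a reply.
    return f"echo: {prompt}"


spec = "Refuses to book flights without confirmation\nCites sources for visa rules"
cases = [run_target(c, echo_target) for c in generate_cases(derive_categories(spec))]
scores = [judge(c) for c in cases]
```

In this toy version two spec lines yield four cases (one single-turn and one multi-turn per category). The real pipeline's category derivation, case generation, and judging are model-driven and configured through the eval config, so treat the shapes here as a mental model only.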

Install

:::bash
pip install assert-ai
# optional extras
pip install "assert-ai[otel]"        # OpenTelemetry + Phoenix trace capture
pip install "assert-ai[langgraph]"   # LangGraph target adapter

Supports Python 3.11, 3.12, and 3.13.

Quickstart

:::bash
assert-ai run --config examples/travel_planner_langgraph/eval_config.yaml

Follow the full walkthrough: https://github.com/responsibleai/ASSERT/blob/main/docs/getting-started.md
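
If you want to launch the quickstart command above from a Python script (for example, inside a CI job), a minimal sketch using only the standard library might look like the following. It passes exactly the arguments shown in the quickstart and assumes no other flags. The helper names `build_command` and `run_eval` are hypothetical, and how the CLI's exit code should be interpreted is an assumption not documented here.

```python
import shutil
import subprocess

CONFIG = "examples/travel_planner_langgraph/eval_config.yaml"


def build_command(config_path: str) -> list[str]:
    # Mirrors the quickstart command line exactly; no extra flags are assumed.
    return ["assert-ai", "run", "--config", config_path]


def run_eval(config_path: str = CONFIG) -> int:
    # Fail early with a clear message if the CLI is not installed.
    if shutil.which("assert-ai") is None:
        raise RuntimeError("assert-ai not found on PATH; run `pip install assert-ai` first")
    # Return the process exit code so the caller decides how to react to it.
    return subprocess.run(build_command(config_path)).returncode
```

Calling `run_eval()` from the repository root runs the travel planner example. Passing the command as a list avoids shell quoting issues with config paths that contain spaces.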

What you get with ASSERT

Spec-driven coverage - test cases are generated from your product requirements and context, not a generic benchmark. You specify the behaviors you want to test for.
Test any model endpoint via the LiteLLM integration, which supports 100+ model endpoints from platform providers such as Bedrock, Azure, OpenAI, Vertex AI, Cohere, Anthropic, SageMaker, Hugging Face, vLLM, and NVIDIA NIM.
Test any agent or multi-agent system via integrations with OpenInference. Evaluate a LangGraph agent, a CrewAI / OpenAI Agents SDK / DSPy / LlamaIndex / AutoGen system, custom multi-agent orchestration, a Python callable, or a hosted model — without rewriting the evaluation orchestration pipeline.
Agent trace-grounded judgment - the recommended integration captures OpenTelemetry spans (Phoenix/OpenInference auto-instruments 33+ frameworks in two lines, or you can emit your own with the OTel SDK) so the judge can cite tool calls, routing, model calls, and latency as evidence — not just the final response.
Portable artifacts - every stage writes JSON/JSONL files locally for inspection, CI, and sharing.
Bundled local viewer - browse runs side-by-side, pin a baseline, drill into per-behavior dimension breakdowns, and read judge justifications cited against the captured traces.
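
Because every stage writes JSON/JSONL files locally, a CI step can read them with the standard library and compare a run against a pinned baseline. The sketch below illustrates that idea under stated assumptions: the artifact schema is not described here, so the `behavior` and `score` keys, the `scores.jsonl` file name, and the helper names are all hypothetical. Check the files under artifacts/results/ for the real field names.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def mean_score_by_behavior(jsonl_path: Path) -> dict[str, float]:
    # Assumes one JSON object per line with hypothetical "behavior" and "score" keys.
    totals: dict[str, list[float]] = defaultdict(list)
    with jsonl_path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                record = json.loads(line)
                totals[record["behavior"]].append(float(record["score"]))
    return {name: sum(vals) / len(vals) for name, vals in totals.items()}


def regressions(
    baseline: dict[str, float], current: dict[str, float], tolerance: float = 0.0
) -> list[str]:
    # A behavior regresses when its mean score falls below baseline by more than tolerance.
    return sorted(
        b for b in baseline if b in current and current[b] < baseline[b] - tolerance
    )


# Demo with a throwaway file so the sketch runs without a real ASSERT run.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "scores.jsonl"
    rows = [
        {"behavior": "refusal", "score": 1.0},
        {"behavior": "refusal", "score": 0.5},
        {"behavior": "grounding", "score": 1.0},
    ]
    path.write_text("\n".join(json.dumps(r) for r in rows) + "\n", encoding="utf-8")
    current = mean_score_by_behavior(path)

baseline = {"refusal": 1.0, "grounding": 1.0}
regressed = regressions(baseline, current)
```

A CI job could exit non-zero when `regressed` is non-empty. The `tolerance` parameter is there because LLM-judged scores can vary between runs, so a small allowance may avoid flaky failures.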

Risks and limitations

See the Concept Doc for the full risks and limitations section.

Telemetry

This project does not collect or send telemetry to Microsoft by default. Runs write local artifacts under artifacts/results/, and optional OpenTelemetry trace capture is controlled by your configuration and local collector setup, such as Phoenix.