New Arena GEval Metric, for Pairwise Comparisons (source released 2025-06-25)

A Metric Like LLM Arena Is Here

In DeepEval's latest release, we are introducing ArenaGEval, our first metric that compares test cases against each other and picks the best-performing one based on your custom criteria.

It looks something like this:

:::python
from deepeval.test_case import ArenaTestCase, LLMTestCase, LLMTestCaseParams
from deepeval.metrics import ArenaGEval

a_test_case = ArenaTestCase(
    contestants={
        "GPT-4": LLMTestCase(
            input="What is the capital of France?",
            actual_output="Paris",
        ),
        "Claude-4": LLMTestCase(
            input="What is the capital of France?",
            actual_output="Paris is the capital of France.",
        ),
    },
)
arena_geval = ArenaGEval(
    name="Friendly",
    criteria="Choose the winner of the more friendly contestant based on the input and actual output",
    evaluation_params=[
        LLMTestCaseParams.INPUT,
        LLMTestCaseParams.ACTUAL_OUTPUT,
    ],
)


arena_geval.measure(a_test_case)
print(arena_geval.winner, arena_geval.reason)
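
Conceptually, an arena-style metric scores each contestant against the criteria and declares the highest-scoring one the winner. Below is a minimal, framework-free sketch of that idea; the `pick_winner` helper and the toy `friendly` judge are hypothetical stand-ins for the LLM judge, not DeepEval's actual implementation:

```python
def pick_winner(contestants, judge):
    """Return (winner_name, scores) for a set of contestants.

    contestants: dict mapping contestant name -> {"input": ..., "actual_output": ...}
    judge: a callable that scores a single test case against the criteria.
           In DeepEval this role is played by an LLM; here it is any function.
    """
    scores = {name: judge(case) for name, case in contestants.items()}
    winner = max(scores, key=scores.get)
    return winner, scores


# Toy judge for the "Friendly" criteria: favors longer, full-sentence
# answers. Purely illustrative, not how an LLM judge actually scores.
def friendly(case):
    return len(case["actual_output"])


winner, scores = pick_winner(
    {
        "GPT-4": {
            "input": "What is the capital of France?",
            "actual_output": "Paris",
        },
        "Claude-4": {
            "input": "What is the capital of France?",
            "actual_output": "Paris is the capital of France.",
        },
    },
    friendly,
)
```

With this toy judge, `winner` is `"Claude-4"`, since its fuller answer scores higher under the length-based stand-in criteria.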

Docs here: https://deepeval.com/docs/metrics-arena-g-eval

Source: README.md, updated 2025-06-25