New Arena GEval Metric, for Pairwise Comparisons (source released 2025-06-25)

A Metric Like LLM Arena Is Here

In DeepEval's latest release, we are introducing ArenaGEval, our first metric that compares test cases against each other and picks the best-performing one based on your custom criteria.

It looks something like this:

:::python
from deepeval.test_case import ArenaTestCase, LLMTestCase, LLMTestCaseParams
from deepeval.metrics import ArenaGEval

a_test_case = ArenaTestCase(
    contestants={
        "GPT-4": LLMTestCase(
            input="What is the capital of France?",
            actual_output="Paris",
        ),
        "Claude-4": LLMTestCase(
            input="What is the capital of France?",
            actual_output="Paris is the capital of France.",
        ),
    },
)
arena_geval = ArenaGEval(
    name="Friendly",
    criteria="Choose the winner of the more friendly contestant based on the input and actual output",
    evaluation_params=[
        LLMTestCaseParams.INPUT,
        LLMTestCaseParams.ACTUAL_OUTPUT,
    ],
)


arena_geval.measure(a_test_case)
print(arena_geval.winner, arena_geval.reason)
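
Conceptually, an arena-style metric scores each contestant against the criteria and declares the highest-scoring one the winner. Below is a minimal, framework-free sketch of that idea; the `pick_winner` helper and the toy `friendly` judge are hypothetical stand-ins for the LLM judge, not DeepEval's actual implementation:

```python
def pick_winner(contestants, judge):
    """Return (winner_name, scores) for a set of contestants.

    contestants: dict mapping contestant name -> {"input": ..., "actual_output": ...}
    judge: a callable that scores a single test case against the criteria.
           In DeepEval this role is played by an LLM; here it is any function.
    """
    scores = {name: judge(case) for name, case in contestants.items()}
    winner = max(scores, key=scores.get)
    return winner, scores


# Toy judge for the "Friendly" criteria: favors longer, full-sentence
# answers. Purely illustrative, not how an LLM judge actually scores.
def friendly(case):
    return len(case["actual_output"])


winner, scores = pick_winner(
    {
        "GPT-4": {
            "input": "What is the capital of France?",
            "actual_output": "Paris",
        },
        "Claude-4": {
            "input": "What is the capital of France?",
            "actual_output": "Paris is the capital of France.",
        },
    },
    friendly,
)
```

With this toy judge, `winner` is `"Claude-4"`, since its fuller answer scores higher under the length-based stand-in criteria.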

Docs here: https://deepeval.com/docs/metrics-arena-g-eval

Source: README.md, updated 2025-06-25