An LLM Arena-Style Metric Is Here
In DeepEval's latest release, we are introducing ArenaGEval, the first-ever metric that compares test cases against each other and picks the best-performing one based on your custom criteria.
It looks something like this:
:::python
from deepeval.test_case import ArenaTestCase, LLMTestCase, LLMTestCaseParams
from deepeval.metrics import ArenaGEval

# Each contestant pairs a model name with the test case it produced.
a_test_case = ArenaTestCase(
    contestants={
        "GPT-4": LLMTestCase(
            input="What is the capital of France?",
            actual_output="Paris",
        ),
        "Claude-4": LLMTestCase(
            input="What is the capital of France?",
            actual_output="Paris is the capital of France.",
        ),
    },
)

# Define the custom criteria the judge should use to pick a winner.
metric = ArenaGEval(
    name="Friendly",
    criteria="Choose the winner of the more friendly contestant based on the input and actual output",
    evaluation_params=[
        LLMTestCaseParams.INPUT,
        LLMTestCaseParams.ACTUAL_OUTPUT,
    ],
)

metric.measure(a_test_case)
print(metric.winner, metric.reason)
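Because the winner and its reason are exposed as attributes after `measure()` is called, you can run the same arena test case through several criteria and tally the results. Below is a minimal sketch of that pattern; it reuses only the calls shown above (`ArenaGEval`, `measure`, `winner`, `reason`), and the extra criteria strings are invented for illustration:

:::python
# Hypothetical example: judge the same contestants on multiple custom
# criteria and count how often each one wins. The "Concise" criteria
# string is made up for this sketch.
criteria_by_name = {
    "Friendly": "Choose the winner of the more friendly contestant based on the input and actual output",
    "Concise": "Choose the winner that answers most concisely based on the input and actual output",
}

wins = {}
for name, criteria in criteria_by_name.items():
    m = ArenaGEval(
        name=name,
        criteria=criteria,
        evaluation_params=[
            LLMTestCaseParams.INPUT,
            LLMTestCaseParams.ACTUAL_OUTPUT,
        ],
    )
    m.measure(a_test_case)  # runs one pairwise comparison
    wins[m.winner] = wins.get(m.winner, 0) + 1
    print(f"{name}: winner={m.winner} ({m.reason})")

print("Overall tally:", wins)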