Re: [Smtcomp-discussion] [SMT-COMP] SMT-COMP 2017: Call for Comments, Benchmarks, Solvers

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Andrew,

On Sun, 2017-03-26 at 18:03 -0500, Andrew Reynolds wrote:

> On Fri, Mar 24, 2017 at 6:13 PM, Tjark Weber <tja...@it...>
> wrote:
>         On the other hand, one might argue that an unsound solver
>         should not be trusted in any division. [...]
> 
> Here, I would argue that the implementation of SMT solvers tends to be
> fairly modular, meaning that if an SMT solver gets a wrong answer for
> logic A, then the bug is very likely in the component that
> specifically handles A, and the solver is likely still sound for
> everything besides A. [...]

I agree with your point about modularity. But note that the solver's
(likely) soundness for everything besides A is only established in
hindsight, through the results of the competition. The issue, of
course, is that the solver was originally advertised as sound for
everything including A.

Once the competition results are known, it is easy to turn a partially
unsound solver into a useful tool by restricting it to appropriate
logics. But it is hard to know in advance for which logics a solver is
(likely) sound if the developer's claims cannot be relied on.

> Let's consider another case:

> Logic A, B with weights 1,1.
> Solver X gets 1 wrong answer in both A and B, it's total score is -1*1
> + -1*1 = -2
> Solver Y solves all benchmarks in A, but gets 4 wrong answers in B,
> its total score is 1 + -4*1 = -3

> Here, I would argue that Solver Y is the better solver: it is (likely)
> sound for logic A, whereas Solver X is definitely unsound for both A
> and B.

Note that one could also argue the opposite point of view: suppose you
are asked to solve a benchmark that will be chosen at random from
SMT-LIB (including logics A and B) by applying one of these solvers.
(You'll have to take the solver's answer at face value.) Which solver
would you rather use, X or Y?

> I do think it is very important, and I do believe that an unsound
> solver is worse than a solver that solves no benchmarks. My opposition
> is to multiplying by the #wrong benchmarks, since this skews the
> results unfairly. A punishment of -1*(weight of logic) is still very
> severe, but is more appropriate for overall score, in my opinion.

I tend to agree that simply multiplying by #wrong benchmarks is
awkward: we already have a weight for each logic that takes the number
of benchmarks in that logic into account.

I suspect the original motivation was to punish solvers that get many
benchmarks wrong more severely than solvers that get only a few
benchmarks wrong. This is easy to achieve for the division rankings,
but I am not sure whether this property is worth retaining for the
competition-wide ranking. Arguably, any solver that is unsound in some
division shouldn't be used to solve benchmarks from that division.

Best,
Tjark

Re: [Smtcomp-discussion] [SMT-COMP] SMT-COMP 2017: Call for Comments, Benchmarks, Solvers

Tools and documentation of the annual SMT competition

Re: [Smtcomp-discussion] [SMT-COMP] SMT-COMP 2017: Call for Comments, Benchmarks, Solvers