Re: [Smtcomp-discussion] [SMT-COMP] SMT-COMP 2017: Call for Comments, Benchmarks, Solvers

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Andrew,

Thank you for your feedback!

On Thu, 2017-03-16 at 10:38 -0500, Andrew Reynolds wrote:

> Regarding disagreements between solvers, last year, as I recall, there
> was a separate discussion period where most (or all) disagreements
> between solvers were resolved e.g. a developer would say "my solver
> was wrong for this one", etc.  Perhaps the competition should:

> - Run all benchmarks/solvers, note the set of unknown status
> benchmarks D that two solvers disagree on.
> - Have a rebuttal period where solver developers can justify their
> answers for benchmarks in D.
> - Based on discussion, set the status on benchmarks in D, and then
> compute the final scores.

I am concerned about some of the practical implications of this
suggestion. Preliminary results would have to be available probably 1-2
weeks earlier than in the past (to allow sufficient time for the steps
above); the competition would depend on status updates by the SMT-LIB
maintainers in a time-critical fashion; and solvers would be run
against one SMT-LIB release but scored against another.

It's not impossible, of course. But given the size of the competition
(with more than 200,000 benchmarks and 1.5 M job pairs in 2016), these
issues have the potential to cause some headache. I would much prefer a
simpler scoring solution that doesn't involve manual review.

> (4) Scoring:

> One thing that has bothered me in previous competitions is how overall
> scoring is computed with respect to wrong benchmarks. 

> Say I have logics A, B, C, D with weights 1, and I solve all
> benchmarks in A,B,C, but I get 3 wrong benchmarks in D. Then my
> overall score is:

> 1 + 1 + 1 + -3*1 = 0

> I would prefer if wrong benchmarks were punished slightly less, so
> that if a solver gets *any* number of wrong benchmarks in a logic,
> then its overall score is negative of the weight of that logic, so
> instead I'd have:

> 1 + 1 + 1 + -1*1 = 2

On the other hand, one might argue that an unsound solver should not be
trusted in any division. There is an even harsher alternative to the
current scoring rule that has been considered before: to disqualify
unsound solvers entirely. In other words, the current scoring rule
already is a compromise.

I would be interested to hear further opinions on this issue. How much
emphasis should the competition place on soundness of solvers?

Best,
Tjark

Re: [Smtcomp-discussion] [SMT-COMP] SMT-COMP 2017: Call for Comments, Benchmarks, Solvers

Tools and documentation of the annual SMT competition

Re: [Smtcomp-discussion] [SMT-COMP] SMT-COMP 2017: Call for Comments, Benchmarks, Solvers