Re: [Smtcomp-discussion] [SMT-COMP] SMT-COMP 2017: Call for Comments, Benchmarks, Solvers
Tools and documentation of the annual SMT competition
Brought to you by:
webertj
From: Tjark W. <tja...@it...> - 2017-03-24 23:13:26
|
Andrew, Thank you for your feedback! On Thu, 2017-03-16 at 10:38 -0500, Andrew Reynolds wrote: > Regarding disagreements between solvers, last year, as I recall, there > was a separate discussion period where most (or all) disagreements > between solvers were resolved e.g. a developer would say "my solver > was wrong for this one", etc. Perhaps the competition should: > - Run all benchmarks/solvers, note the set of unknown status > benchmarks D that two solvers disagree on. > - Have a rebuttal period where solver developers can justify their > answers for benchmarks in D. > - Based on discussion, set the status on benchmarks in D, and then > compute the final scores. I am concerned about some of the practical implications of this suggestion. Preliminary results would have to be available probably 1-2 weeks earlier than in the past (to allow sufficient time for the steps above); the competition would depend on status updates by the SMT-LIB maintainers in a time-critical fashion; and solvers would be run against one SMT-LIB release but scored against another. It's not impossible, of course. But given the size of the competition (with more than 200,000 benchmarks and 1.5 M job pairs in 2016), these issues have the potential to cause some headache. I would much prefer a simpler scoring solution that doesn't involve manual review. > (4) Scoring: > One thing that has bothered me in previous competitions is how overall > scoring is computed with respect to wrong benchmarks. > Say I have logics A, B, C, D with weights 1, and I solve all > benchmarks in A,B,C, but I get 3 wrong benchmarks in D. Then my > overall score is: > 1 + 1 + 1 + -3*1 = 0 > I would prefer if wrong benchmarks were punished slightly less, so > that if a solver gets *any* number of wrong benchmarks in a logic, > then its overall score is negative of the weight of that logic, so > instead I'd have: > 1 + 1 + 1 + -1*1 = 2 On the other hand, one might argue that an unsound solver should not be trusted in any division. There is an even harsher alternative to the current scoring rule that has been considered before: to disqualify unsound solvers entirely. In other words, the current scoring rule already is a compromise. I would be interested to hear further opinions on this issue. How much emphasis should the competition place on soundness of solvers? Best, Tjark |