Re: [Smtcomp-discussion] [SMT-COMP] SMT-COMP 2017: Call for Comments, Benchmarks, Solvers
Tools and documentation of the annual SMT competition
Brought to you by:
webertj
From: Tjark W. <tja...@it...> - 2017-03-31 13:08:47
|
Andrew, On Thu, 2017-03-30 at 12:21 -0500, Andrew Reynolds wrote: > One point here is that the competition should encourage innovation. > So > if there is a difficult new logic (e.g. floating points or strings), > and I'd like to enter, but I'm not 100% confident in my solver for > that > logic, it still would be nice for me to be able to enter without > having > to risk the score for the logics where I am sure my solver is correct. Of course, I see your point. If you are only concerned about one or two logics, there is (usually) the option to enter the solver into these logics as non-competing. But in general, I think it makes sense that solver developers need to commit to a set of logics in advance. > The use case of randomly selecting a benchmark in any logic isn't very > interesting, in my opinion. The random choice is merely intended to model some real-world application that generates problems in logics A and B. (This is based on the assumption that SMT-LIB benchmarks in some sense approximate real-world problems, which is certainly debatable.) > Another option would be to multiply by something like -( #wrong / ( 1 > + #wrong ))*(weight of logic), where you'd get a factor -1/2, -2/3, > -3/4, ... as number of wrong benchmarks increase, but this is limited > by -1. In principle, other functions to compute the factor could perhaps also be considered, as long as they are (i) monotonic and (ii) take values between 0 and 1 (or -1, as the case may be). For instance, the function to compute the factor for correct answers is (#correct/#total)^2. Your proposals seem more appropriate though: they give a greater (negative) factor as soon as there is at least one incorrect answer, in line with the view that a solver shouldn't be used for a logic for which it isn't sound. I wonder how the "value" of a correct answer compares to the "cost" of an incorrect answer in a typical application of SMT solvers. It is hard to make this precise, of course, but my hunch is that the latter is typically far greater. Your original motivation to encourage innovation notwithstanding, to me this would suggest another factor (some constant > 1) to place more emphasis on negative scores for incorrect answers. I would also like to caution that the competition-wide ranking is in a sense comparing apples and oranges. As this discussion shows, mapping the performance of solvers that support very different logics to a single number necessarily comes with some challenges. Anyone interested in specific logics should probably look at the competition's results for those logics instead. Best, Tjark |