Re: [Smtcomp-discussion] [SMT-COMP] SMT-COMP 2017: Call for Comments, Benchmarks, Solvers

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Andrew,

On Thu, 2017-03-30 at 12:21 -0500, Andrew Reynolds wrote:

> One point here is that the competition should encourage innovation.
> So
> if there is a difficult new logic (e.g. floating points or strings),
> and I'd like to enter, but I'm not 100% confident in my solver for
> that
> logic, it still would be nice for me to be able to enter without
> having
> to risk the score for the logics where I am sure my solver is correct.

Of course, I see your point. If you are only concerned about one or two
logics, there is (usually) the option to enter the solver into these
logics as non-competing. But in general, I think it makes sense that
solver developers need to commit to a set of logics in advance.

> The use case of randomly selecting a benchmark in any logic isn't very
> interesting, in my opinion.

The random choice is merely intended to model some real-world
application that generates problems in logics A and B. (This is based
on the assumption that SMT-LIB benchmarks in some sense approximate
real-world problems, which is certainly debatable.)

> Another option would be to multiply by something like -( #wrong / ( 1
> + #wrong ))*(weight of logic), where you'd get a factor -1/2, -2/3,
> -3/4, ... as number of wrong benchmarks increase, but this is limited
> by -1.

In principle, other functions to compute the factor could perhaps also
be considered, as long as they are (i) monotonic and (ii) take values
between 0 and 1 (or -1, as the case may be).

For instance, the function to compute the factor for correct answers is
(#correct/#total)^2. Your proposals seem more appropriate though: they
give a greater (negative) factor as soon as there is at least one
incorrect answer, in line with the view that a solver shouldn't be used
for a logic for which it isn't sound.

I wonder how the "value" of a correct answer compares to the "cost" of
an incorrect answer in a typical application of SMT solvers. It is hard
to make this precise, of course, but my hunch is that the latter is
typically far greater.

Your original motivation to encourage innovation notwithstanding, to me
this would suggest another factor (some constant > 1) to place more
emphasis on negative scores for incorrect answers.

I would also like to caution that the competition-wide ranking is in a
sense comparing apples and oranges. As this discussion shows, mapping
the performance of solvers that support very different logics to a
single number necessarily comes with some challenges. Anyone interested
in specific logics should probably look at the competition's results
for those logics instead.

Best,
Tjark

Re: [Smtcomp-discussion] [SMT-COMP] SMT-COMP 2017: Call for Comments, Benchmarks, Solvers

Tools and documentation of the annual SMT competition

Re: [Smtcomp-discussion] [SMT-COMP] SMT-COMP 2017: Call for Comments, Benchmarks, Solvers