The ticket's main issue is resolved, but the issue in that comment is NOT resolved.
On Oct 30, 2013, at 10:39 AM, Demian Katz wrote:
Thanks, Naomi! It looks like SOLR-2058 is resolved as of 4.0, so hopefully that particular one won’t bite us (unless I’m misreading something). In any case, it seems like on the whole this is going to be a good thing… it’s just a shame there are still problems around the edges!
From: Naomi Dushay [mailto:email@example.com]
Sent: Wednesday, October 30, 2013 1:37 PM
To: Demian Katz
Subject: Re: [VuFind-Tech] eDismax, continued...
FYI: when we switched to edismax from dismax for SearchWorks, it turned out the relevancy rankings weren't the same either. And there are some other bugs with edismax:
for us, it made our relevancy tests for "Nature" and "Science" fail.
On Oct 29, 2013, at 12:45 PM, Demian Katz wrote:
The main problem with Extended Dismax is that it doesn’t properly apply the default operator when a NOT or - clause is used. For example, with a default operator of AND, if you search for:
    apples oranges NOT bananas
you would expect your search results to be the same as those for:
    apples AND oranges NOT bananas
Instead, you get the results for:
    apples OR oranges NOT bananas
On today’s dev call, we discussed the possibility of detecting the - or NOT operators, then falling back to the old Lucene code to get around this limitation. Alas, the plot thickens.
First of all, falling back to that code turns out to be a mistake: it currently does not handle NOT properly. That is not surprising, because it’s not real DisMax. It creates a whole bunch of queries and ORs them together, so you will frequently get results back that include the term you are attempting to exclude. There is no easy fix for this, aside from writing our own DisMax query generator in PHP, which would be an exercise in madness.
Another interesting discovery is that the basic DisMax handler does process the - operator appropriately, so while a current instance of VuFind will break with “apples oranges NOT bananas”, it will yield correct results for “apples oranges -bananas”. So this is definitely a regression if we move to eDisMax. Maybe not a significant one, since library users are much more likely to use the broken NOT syntax than the working - syntax.
This all leaves me even more uncertain about the best road forward: switching to eDismax breaks something that is already broken, just in a different way. If the Solr team fixes the underlying problem that is causing this behavior, then we’ll be in great shape. In the meantime, it seems we have these options:
1.) Stick with the status quo, but add the option to turn on eDismax if desired.
2.) Switch to eDismax, on the assumption that the benefits outweigh the drawbacks.
3.) Write some sort of crude query parser to insert AND operators into queries containing NOT or -. We can probably make the most common cases work fairly easily, but doing it correctly would require a lot of effort, and that may be a waste of time given that this is a workaround for a bug and not something that we need in an ideal world.
4.) Write code to use the regular Dismax handler instead of eDismax for queries containing the - operator and no other operators. This will lead to optimal functionality for a small number of edge cases. Not worthwhile in my opinion, but maybe worth mentioning.
I’d really like to get this wrapped up, but the best option is not obvious.
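To give a feel for how crude option 3 could be, here is a rough sketch of the idea in Python (chosen only for brevity; VuFind itself would need this in PHP). The function names has_negation and insert_explicit_and are made up for illustration and are not part of VuFind or Solr. It assumes AND is the default operator and only handles flat queries: parentheses, field prefixes, and boosts are not dealt with.

```python
import re

# Hypothetical helpers for illustration only; not VuFind or Solr API.
# A token is an optionally negated quoted phrase, or a run of non-space characters.
_TOKEN = re.compile(r'-?"[^"]*"|\S+')
_OPERATORS = {"AND", "OR", "NOT"}


def _is_negated(token: str) -> bool:
    """True for a term or phrase prefixed with the - operator."""
    return token.startswith("-") and len(token) > 1


def has_negation(query: str) -> bool:
    """True if the query contains a NOT keyword or a - prefixed term."""
    return any(tok == "NOT" or _is_negated(tok) for tok in _TOKEN.findall(query))


def insert_explicit_and(query: str) -> str:
    """Insert AND between adjacent plain terms, but only when negation is present.

    Assumes the intended default operator is AND. Queries without NOT or -
    are returned unchanged so the query parser handles them as before.
    """
    if not has_negation(query):
        return query
    out = []
    for tok in _TOKEN.findall(query):
        joins_previous = (
            out
            and tok not in _OPERATORS
            and not _is_negated(tok)
            and out[-1] not in _OPERATORS
        )
        if joins_previous:
            out.append("AND")
        out.append(tok)
    return " ".join(out)


print(insert_explicit_and("apples oranges NOT bananas"))
print(insert_explicit_and("apples oranges -bananas"))
```

With the example from above, "apples oranges NOT bananas" becomes "apples AND oranges NOT bananas", and queries that already spell out their operators pass through untouched. Anything involving grouping would need a real parser, which is exactly the effort concern raised in option 3.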