From: Uwe R. <re...@he...> - 2017-06-20 23:09:29
Solr offers several options for spell checking, but their complexity is hard to grasp:
> https://cwiki.apache.org/confluence/display/solr/Spell+Checking

Most of the variants are based on the Levenshtein distance (https://en.wikipedia.org/wiki/Levenshtein_distance). Since the distance between "monKey" and "money" is just 1, you will hardly get better suggestions without investing a lot of work in this area.

In our installations, we use a customized index with the DirectSolrSpellChecker. The main differences from the original VuFind setup are:
* checking sequences of terms instead of single words
* no need to rebuild helper files on updates
* a special index field to compare against (filled only with names and titles)

Example:
> https://hds.hebis.de/ubffm/Search/Results?lookfor=monkey's+island

Is our solution better? Well, I hope so, but I'm not sure. We noticed two suboptimal effects:
1. Spell checking on sentences is expensive. It requires evaluating the Cartesian product of all variants of the given search terms, so we have to deactivate spell checking when a patron searches for more than three terms.
2. We have configured our search rules to be quite fuzzy. Therefore, there is often no real difference in the results between the original search and the suggested variants.
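To make the two points above concrete, here is a minimal Python sketch (not VuFind or Solr code). It computes a textbook Levenshtein distance, showing that "monkey" is a single edit away from both "money" and "donkey", and it counts the collation candidates produced by a Cartesian product over per-term variants. The function names and variant lists are hypothetical, and the sketch does not reproduce how Solr actually ranks or prunes collations.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Textbook dynamic-programming edit distance.

    Insertions, deletions and substitutions each cost 1.
    """
    # prev[j] = distance between the previous prefix of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute ca by cb (free if equal)
            ))
        prev = curr
    return prev[-1]


def collation_candidates(variants_per_term: list[list[str]]) -> list[str]:
    """Join every combination of per-term variants (the Cartesian product)."""
    return [" ".join(combo) for combo in product(*variants_per_term)]


if __name__ == "__main__":
    print(levenshtein("monkey", "money"))   # 1 (delete the "k")
    print(levenshtein("monkey", "donkey"))  # 1 (substitute "m" by "d")

    # Hypothetical variant lists for a two-term query: original term plus alternatives
    variants = [["monkey", "money", "donkey"], ["island", "islands"]]
    print(len(collation_candidates(variants)))  # 6

    # Assumed: 6 options per term (original + 5 alternatives), so growth is 6 ** n
    for n_terms in range(1, 5):
        print(n_terms, 6 ** n_terms)  # prints 1 6, 2 36, 3 216, 4 1296
```

The number of combinations multiplies with every additional term, which is consistent with the observation that spell checking longer queries quickly becomes expensive.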
Uwe

##########################
# Excerpts from our solrconfig.xml
#

<requestHandler name="edismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="tie">0.1</str>
    <str name="qf">allfields_unstemmed</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxResultsForSuggest">1000</str>
    <str name="spellcheck.maxCollations">2</str>
    <str name="spellcheck.maxCollationTries">1000</str>
    <str name="spellcheck.alternativeTermCount">5</str>
  </lst>
  <arr name="components">
    <str>query</str>
    <str>facet</str>
    <str>mlt</str>
    <str>stats</str>
    <str>debug</str>
    <str>elevator</str>
    <str>spellcheck</str>
  </arr>
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">xtext</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="field">spelling</str>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">1</int>
  </lst>
  <float name="maxQueryFrequency">0.01</float>
</searchComponent>

##
# EOF
##

> *From:* Shepard, Thomas - 0050 - MITLL [mailto:tsh...@ll...]
> *Sent:* Tuesday, June 20, 2017 2:02 PM
> *To:* vuf...@li...
> *Subject:* [Vufind-admins] Configuring search term alternatives in vufind
>
> How does one modify what values are chosen as vufind’s Search alternatives?
>
> For example, what/where is the formula that results in suggesting
> “money” when we search for “monkey”?
>
> Does this fall under solr’s domain or is there a file in vufind that can
> be modified?
>
> Thanks,
>
> Thom