From: Eoghan Ó C. <eog...@gm...> - 2013-06-26 21:25:34
I dug into this a few months back & I think we could definitely simplify things. When Greg wrote the original spell check code he had to work around several Solr limitations, e.g.:

-- Solr suggesting "behavior" for "behaviour" because the US spelling occurs more often in the index
-- Solr suggesting "barry potter" for "harry poter": both terms corrected where only one correction is desirable, again because of frequency
-- Solr's spelling collation feature not working yet, so there was no way to ensure that clicking on a suggestion actually led to results

Spelling collation now works, so Solr can be configured to run a set number of internal checks to ensure suggested phrases/collations actually return at least one document before returning a suggestion or giving up. Also, the "context sensitive spelling suggestions and collations" features allow configuration of "alternativeTermCount" instead of "onlyMorePopular", which stops correctly spelled but infrequently occurring words from being replaced (e.g. behaviour or harry in the examples above; see SOLR-2585 and LUCENE-3436 for discussion).

Along with new Solr 4.x features like WordBreakSolrSpellChecker, I think we could improve on the current suggestions and probably eliminate the need for the shingles spell index.

I did quite a bit of testing at the Solr level with these new configurations & the results were very positive. I didn't get as far as digging into refactoring VuFind though. I'd like to do so at some point if time allows, or I'm happy to help anyone else who wants to give it a go.

Cheers,
Eoghan

On 26 June 2013 18:41, Demian Katz <dem...@vi...> wrote:
> The search alternatives come from Solr’s spellcheck index, which makes suggestions based on term frequencies in your index.
> The basic configuration for this can be found in solr/biblio/conf/solrconfig.xml, with some additional settings in config.ini controlling exactly how the native Solr functionality is used.
>
> It’s amusing that your search term was “Living with Shingles,” because you’ll find that Solr uses analysis of a different sort of shingles (two-word phrases) to come up with its phrase suggestions.
>
> All of the spelling configuration was devised several years ago, and Solr has more flexible capabilities now, so it’s entirely possible that we can do better at this point in time – just a matter of somebody finding time to dig into it a bit deeper!
>
> - Demian
>
> *From:* Karla Smith [mailto:sm...@wi...]
> *Sent:* Wednesday, June 26, 2013 12:52 PM
> *To:* 'vuf...@li...'
> *Subject:* [VuFind-General] Curious about alternate search terms
>
> Hi,
>
> Not really a problem, just curious... Where do the "Search Alternatives:" terms come from?
>
> Some of the alternatives are rather humorous... a search for "Living with Shingles" came up with:
>
> “With shingles >> with single, with triangles, with sprinkles”
>
> So, why “sprinkles” and “triangles” but not “shingle” or “spindles”?
>
> Thanks,
>
> --Karla
>
> Karla Smith, ILS Manager
> Winnefox Library System
> Oshkosh, WI 54904
>
> ~If Truth is Beauty, how come no one has their hair done in the library? – Lily Tomlin
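
For anyone who wants to experiment with the collation and alternativeTermCount behaviour discussed in this thread, here is a minimal sketch of a Solr request that switches those options on per query. The `spellcheck.*` names are standard Solr request parameters. The base URL, the function name and all of the numeric values are assumptions chosen for illustration, not VuFind's actual settings. The request handler also needs a spellcheck component configured in solrconfig.xml for any of this to have an effect.

```python
from urllib.parse import urlencode


def build_spellcheck_query(user_query,
                           base_url="http://localhost:8080/solr/biblio/select"):
    """Build a Solr select URL with collation-based spellchecking enabled.

    base_url is a hypothetical local Solr core; adjust it to your install.
    All numeric values below are illustrative, not tuned recommendations.
    """
    params = {
        "q": user_query,
        "wt": "json",
        "spellcheck": "true",
        # Suggestions to return for terms that are missing from the index.
        "spellcheck.count": 5,
        # Suggestions to return for terms that DO exist in the index. Used
        # here instead of spellcheck.onlyMorePopular, which is left unset.
        "spellcheck.alternativeTermCount": 3,
        # Only offer suggestions when the original query finds few hits.
        "spellcheck.maxResultsForSuggest": 5,
        # Ask Solr to combine per-term suggestions into whole-query collations.
        "spellcheck.collate": "true",
        "spellcheck.maxCollations": 3,
        # Number of candidate collations Solr tests against the index
        # before giving up.
        "spellcheck.maxCollationTries": 10,
        # Include hit counts for each collation in the response.
        "spellcheck.collateExtendedResults": "true",
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_spellcheck_query("harry poter"))
```

Setting spellcheck.maxCollationTries above zero is what makes Solr check candidate collations against the index, so a suggested phrase should lead to at least one result when clicked.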