This all sounds great. I think you’ll find that the spelling logic is fairly well isolated in David’s new search system, so making changes should be relatively
painless once you understand how the pieces fit together. I’ll be happy to help point you in the right direction whenever you’re ready to start experimenting (and I’m sure David is willing to help too).
From: Eoghan Ó Carragáin [mailto:firstname.lastname@example.org]
Sent: Wednesday, June 26, 2013 5:25 PM
To: Demian Katz
Cc: Karla Smith; email@example.com
Subject: Re: [VuFind-General] Curious about alternate search terms
I dug into this a few months back & I think we could definitely simplify things. When Greg wrote the original spell check code he had to work around several Solr limitations, e.g.:
-- Solr suggesting "behavior" for "behaviour" because the US spelling occurs more in the index
-- Solr suggesting "barry potter" for "harry poter" - both terms corrected where only one is desirable, again because of frequency
-- Solr's spelling collation feature not working yet, so no way to ensure that clicking on a suggestion actually leads to results
Spelling collation now works, so Solr can be configured to do X number of internal checks to ensure suggested phrases/collations actually return at least one document before returning a suggestion or giving up. Also the "context sensitive
spelling suggestions and collations" features allow configuration of "alternativeTermCount" instead of "onlyMorePopular" which stops correctly spelled but infrequently occurring words being replaced (e.g. behavior or harry in the examples above, see SOLR-2585
and LUCENE-3436 for discussion). Along with new Solr 4.x features like WordBreakSolrSpellChecker, I think we could improve on the current suggestions and probably eliminate the need for the shingles spell index.
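To make the configuration described above concrete, here is a minimal sketch of the request parameters involved. It is not VuFind's actual code: the helper name build_spellcheck_params and all of the numeric values are hypothetical, and in practice these settings could just as well live as defaults in solrconfig.xml. The parameter names themselves are the standard Solr spellcheck request parameters.

```python
from urllib.parse import urlencode


def build_spellcheck_params(query, max_collation_tries=5, alternative_term_count=3):
    """Build Solr request parameters that ask for verified collations.

    Hypothetical helper for illustration; the values are placeholders.
    """
    return {
        "q": query,
        "spellcheck": "true",
        "spellcheck.q": query,
        # Number of suggestions to return per misspelled term.
        "spellcheck.count": 5,
        # Ask Solr to build whole-phrase suggestions (collations).
        "spellcheck.collate": "true",
        "spellcheck.maxCollations": 3,
        # Number of candidate collations Solr tests against the index
        # before giving up; only collations that return hits are kept.
        "spellcheck.maxCollationTries": max_collation_tries,
        # Suggestions to consider for terms that DO exist in the index,
        # used here in place of onlyMorePopular.
        "spellcheck.alternativeTermCount": alternative_term_count,
        "spellcheck.onlyMorePopular": "false",
    }


params = build_spellcheck_params("harry poter")
query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended to a Solr select URL; the handler path and core name depend on the local installation and are deliberately left out here.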
I did quite a bit of testing at the Solr level with these new configurations & the results were very positive. I didn't get as far as digging into refactoring VuFind, though. I'd like to do so at some point if time allows, and I'm happy to help
anyone else who wants to give it a go.
On 26 June 2013 18:41, Demian Katz <firstname.lastname@example.org> wrote:
The search alternatives come from Solr’s spellcheck index, which makes suggestions based on term frequencies in your index. The basic configuration
for this can be found in solr/biblio/conf/solrconfig.xml, with some additional settings in config.ini controlling exactly how the native Solr functionality is used.
It’s amusing that your search term was “Living with Shingles,” because you’ll find that Solr uses analysis of a different sort of shingles (two-word
phrases) to come up with its phrase suggestions.
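As a rough illustration of what "shingles" means here, the sketch below splits a phrase into overlapping two-word units. It is a simplification, not Solr's implementation: the function name word_shingles is made up for this example, and Solr's own shingle filter works on an analyzed token stream and has further options (for instance, it can emit the single words as well).

```python
def word_shingles(text, size=2):
    """Return overlapping word n-grams ("shingles") of the given size."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


shingles = word_shingles("Living with Shingles")
print(shingles)
# ['living with', 'with shingles']
```

This is why the suggestions in the original question are offered for the pair "with shingles" rather than for the single word "shingles".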
All of the spelling configuration was devised several years ago, and Solr has more flexible capabilities now, so it’s entirely possible that we can do
better at this point in time – just a matter of somebody finding time to dig into it a bit deeper!
Not really a problem, just curious...Where do the "Search Alternatives:" terms come from?
Some of the alternatives are rather humorous...a search for "Living with Shingles" came up with:
“With shingles >> with single, with triangles, with sprinkles”
So, why “sprinkles” and “triangles” but not “shingle” or “spindles”?
Karla Smith, ILS Manager
Winnefox Library System
Oshkosh, WI 54904
~If Truth is Beauty, how come no one has their hair done in the library? – Lily Tomlin