This all sounds great.  I think you’ll find that the spelling logic is fairly well isolated in David’s new search system, so making changes should be relatively painless once you understand how the pieces fit together.  I’ll be happy to help point you in the right direction whenever you’re ready to start experimenting (and I’m sure David is willing to help too).

 

- Demian

 

From: Eoghan Ó Carragáin [mailto:eoghan.ocarragain@gmail.com]
Sent: Wednesday, June 26, 2013 5:25 PM
To: Demian Katz
Cc: Karla Smith; vufind-general@lists.sourceforge.net
Subject: Re: [VuFind-General] Curious about alternate search terms

 

I dug into this a few months back & I think we could definitely simplify things. When Greg wrote the original spell check code he had to work around several Solr limitations, e.g.:

-- Solr suggesting "behavior" for "behaviour" because the US spelling occurs more in the index

-- Solr suggesting "barry potter" for "harry poter" - both terms corrected where only one is desirable, again because of frequency

-- Solr's spelling collation feature not working yet, so no way to ensure that clicking on a suggestion actually leads to results
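The frequency problem in the first two items can be reproduced with a toy model. This is only a sketch and not Solr's actual algorithm: the term counts are invented, `suggest` is a hypothetical helper, and Python's difflib stands in for Solr's distance measure.

```python
from difflib import get_close_matches

# Invented document frequencies for a tiny, imaginary index.
term_counts = {"behavior": 120, "behaviour": 8, "harry": 40, "barry": 95, "potter": 60}


def suggest(word, only_more_popular):
    """Return the most frequent similar term, or None if there is no suggestion."""
    own = term_counts.get(word, 0)
    # Without the "more popular" behaviour, a term already in the index is left alone.
    if own > 0 and not only_more_popular:
        return None
    similar = get_close_matches(word, list(term_counts), n=5, cutoff=0.7)
    # Keep only different terms that occur more often than the input word.
    candidates = [c for c in similar if c != word and term_counts[c] > own]
    return max(candidates, key=term_counts.get, default=None)


for word in ("behaviour", "harry", "poter"):
    print(word, suggest(word, only_more_popular=True), suggest(word, only_more_popular=False))
```

With the "more popular" behaviour switched on, the correctly spelled "harry" is replaced by the more frequent "barry"; with it switched off, only the misspelled "poter" is corrected.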

 

Spelling collation now works, so Solr can be configured to run a set number of internal checks to ensure that suggested phrases/collations actually return at least one document before it returns a suggestion or gives up. Also, the "context sensitive spelling suggestions and collations" features allow configuration of "alternativeTermCount" instead of "onlyMorePopular", which stops correctly spelled but infrequently occurring words from being replaced (e.g. behaviour or harry in the examples above; see SOLR-2585 and LUCENE-3436 for discussion). Along with new Solr 4.x features like WordBreakSolrSpellChecker, I think we could improve on the current suggestions and probably eliminate the need for the shingles spell index.
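As a rough sketch of what those request-side settings might look like (not VuFind's actual configuration), the Python below builds a Solr query string. The spellcheck.* names are standard Solr spellcheck parameters; the helper name `build_spellcheck_params` and the default values are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_spellcheck_params(query, collation_tries=5, alternative_terms=3):
    """Build a Solr query string that asks for verified collations.

    The helper name and the default values are illustrative assumptions.
    """
    params = {
        "q": query,
        "spellcheck": "true",
        # Ask Solr to assemble whole-phrase suggestions (collations).
        "spellcheck.collate": "true",
        # Test up to this many candidate collations against the index, so
        # only collations that return at least one document are reported.
        "spellcheck.maxCollationTries": collation_tries,
        # Number of suggestions to consider for terms that already exist in
        # the index; set here in place of spellcheck.onlyMorePopular.
        "spellcheck.alternativeTermCount": alternative_terms,
    }
    return urlencode(params)


print(build_spellcheck_params("harry poter"))
```

The same settings could presumably be set as request handler defaults in solrconfig.xml rather than sent with each query.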

 

I did quite a bit of testing at the Solr level with these new configurations & the results were very positive. I didn't get as far as digging into refactoring VuFind though. I'd like to do so at some point if time allows, or I'm happy to help anyone else who wants to give it a go.

 

Cheers,

Eoghan

 

 

On 26 June 2013 18:41, Demian Katz <demian.katz@villanova.edu> wrote:

The search alternatives come from Solr’s spellcheck index, which makes suggestions based on term frequencies in your index.  The basic configuration for this can be found in solr/biblio/conf/solrconfig.xml, with some additional settings in config.ini controlling exactly how the native Solr functionality is used.

 

It’s amusing that your search term was “Living with Shingles,” because you’ll find that Solr uses analysis of a different sort of shingles (two-word phrases) to come up with its phrase suggestions.
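To make the "two-word phrases" idea concrete, here is a minimal sketch of word shingling in Python. It only illustrates the concept and is not the Solr analysis chain itself; `word_shingles` is a hypothetical helper.

```python
def word_shingles(text, size=2):
    """Return overlapping word n-grams (shingles) of the given size."""
    words = text.lower().split()
    # Slide a window of `size` words across the text, one word at a time.
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


print(word_shingles("Living with Shingles"))
# prints ['living with', 'with shingles']
```

Each shingle is treated as a single term, which is presumably why the suggestions in the original question were offered for the pair "with shingles" rather than for "shingles" alone.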

 

All of the spelling configuration was devised several years ago, and Solr has more flexible capabilities now, so it’s entirely possible that we can do better at this point in time – just a matter of somebody finding time to dig into it a bit deeper!

 

- Demian

 

From: Karla Smith [mailto:smith@winnefox.org]
Sent: Wednesday, June 26, 2013 12:52 PM
To: 'vufind-general@lists.sourceforge.net'
Subject: [VuFind-General] Curious about alternate search terms

 

Hi,

Not really a problem, just curious...Where do the "Search Alternatives:" terms come from? 

 

Some of the alternatives are rather humorous...a search for "Living with Shingles" came up with:

“With shingles >> with single, with triangles, with sprinkles”

 

So, why “sprinkles” and “triangles” but not “shingle” or “spindles”?

 

Thanks,

--Karla

Karla Smith, ILS Manager

Winnefox Library System

Oshkosh, WI 54904

~If Truth is Beauty, how come no one has their hair done in the library? – Lily Tomlin

 

 


_______________________________________________
VuFind-General mailing list
VuFind-General@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/vufind-general