From: Tuan N. <tu...@yo...> - 2010-09-29 14:23:14
You're right, we may have taken it too far. allfields_unstemmed would have sufficed for dealing with this type of wildcard issue.

On Sep 29, 2010, at 9:47 AM, Demian Katz wrote:

> I think it’s a question of degrees – I think creating an unstemmed
> equivalent to EVERY searchable field may be taking it too far; in
> some cases, one unstemmed field actually covers several others (i.e.
> title_full_unstemmed has the same text as title, title_short,
> title_full). We could create unstemmed versions of all the
> variations in order to get extremely granular relevance ranking, but
> I think that’s probably overkill.
>
> One simple change that would offer somewhat more comprehensive
> coverage without greatly expanding the schema would be to add an
> allfields_unstemmed field. That would probably have a fairly
> significant effect on index size… but maybe not.
>
> Then it comes to a question of which cases in between matter? Do we
> care about tables of contents? How about geographic/genre/era? How
> about series titles? These are relatively little-used areas where
> adding unstemmed versions would probably have little impact on index
> size… but is it worth increasing the size and complexity of the
> schema and search configuration? I’m not sure.
>
> I’m definitely not opposed to expanding the use of unstemmed fields
> in the trunk – the unstemmed title and topic fields just went into
> the trunk recently, and it may well be appropriate to add a few
> more. I’m just not sure how far to take it before it becomes a
> burden rather than a help. Comments are welcome! If you would like
> to share a patch for discussion, that might be helpful as well.
>
> - Demian
>
> From: Tuan Nguyen [mailto:tu...@yo...]
> Sent: Wednesday, September 29, 2010 9:31 AM
> To: Demian Katz
> Cc: Osullivan L.; vuf...@li...
> Subject: Re: [VuFind-Tech] ? wildcard
>
> We took this approach from day one, every searchable field has an
> equivalent unstemmed version. We also use the unstemmed version to
> give higher boost to exact/unstemmed matches. Could we expand this
> and make it part of the standard schema that every searchable field
> has unstemmed equivalent? I understand the concern about growing the
> size of the index, but from our experience the increase in index
> size is not significant.
>
>
> On Sep 29, 2010, at 9:00 AM, Demian Katz wrote:
>
>
> Take a look at r3023 – I made a few adjustments to the
> searchspecs.yaml file so that unstemmed fields are used more
> effectively when advanced queries are generated. The situation
> still isn’t perfect, as there are still stemmed fields without
> unstemmed equivalents… but this offers proper coverage of title and
> subject, so it’s a vast improvement!
>
> - Demian
>
> From: Tuan Nguyen [mailto:tu...@yo...]
> Sent: Wednesday, September 29, 2010 8:52 AM
> To: Osullivan L.
> Cc: vuf...@li...
> Subject: Re: [VuFind-Tech] ? wildcard
>
> Hi Luke,
>
> The ? wildcard works as advertised. The problem is with the
> stemming. You can see how this works in the analysis tab of the solr
> admin interface.
>
>
> globalization gets stemmed to global
> globalisation gets stemmed to globalis
>
> <image001.png><image002.png>
>
> On Sep 29, 2010, at 8:15 AM, Osullivan L. wrote:
>
>
>
> globalisation
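
For anyone reading this thread later, here is a rough sketch of why the ? wildcard misses on a stemmed field but hits on an unstemmed one. It is not VuFind or Solr code. The two stems are hard-coded from the analysis output quoted above rather than produced by a real stemmer, the pattern globali?ation is only a guess at the kind of query involved, and index_terms and wildcard_hits are made-up helper names. It also assumes the wildcard pattern is compared against the indexed terms without being stemmed itself.

```python
from fnmatch import fnmatchcase

# Stems as reported in the thread (Solr analysis tab); hard-coded for illustration,
# not computed by a real stemmer.
STEMS = {"globalization": "global", "globalisation": "globalis"}


def index_terms(words, stemmed):
    """Return the set of terms a (hypothetical) field would hold for these words."""
    return {STEMS.get(w, w) if stemmed else w for w in words}


def wildcard_hits(pattern, terms):
    """Match a ?/* pattern against indexed terms; the pattern itself is not stemmed."""
    return sorted(t for t in terms if fnmatchcase(t, pattern))


words = ["globalization", "globalisation"]
stemmed_field = index_terms(words, stemmed=True)      # {'global', 'globalis'}
unstemmed_field = index_terms(words, stemmed=False)   # the original words

pattern = "globali?ation"  # assumed example query, not taken from the thread
print(wildcard_hits(pattern, stemmed_field))    # []
print(wildcard_hits(pattern, unstemmed_field))  # ['globalisation', 'globalization']
```

The stemmed field only holds "global" and "globalis", so there is nothing for the pattern to match. A single unstemmed copy such as allfields_unstemmed keeps the full words, which is why it would be enough for this kind of wildcard search.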