Hello all!

To complement what I've written below in response to a similar question, and hoping that I am not sending something that is already covered in the comments (by Demian et al.) on http://vufind.org/jira/browse/VUFIND-417, a solution to this may be found here: http://stackoverflow.com/questions/2681393/solr-is-there-a-way-to-include-stopwords-when-searching-exact-phrases
(<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>)
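For context, here is a minimal sketch of where such a filter would sit in schema.xml. The field type name "text_stop" and the choice of tokenizer are assumptions made for illustration, not something taken from the linked answer or from VuFind's shipped schema. With enablePositionIncrements="true", a removed stopword leaves a gap in the token positions instead of letting the neighbouring terms close up, so phrase queries still respect the original word spacing.

```xml
<!-- Sketch only: "text_stop" is a hypothetical field type name -->
<fieldType name="text_stop" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Stopwords are dropped, but their positions are kept as gaps -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Same filter at query time so phrase positions line up with the index -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>
```

Changing the index-time analyzer only takes effect for documents indexed afterwards, so a change like this normally implies a reindex.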
Have a great weekend,
---------- Forwarded message ----------
From: Filipe MS Bento (UA) <firstname.lastname@example.org>
Date: Thu, Feb 21, 2013 at 3:05 PM
Subject: Re: [VuFind-Tech] Searching for terms with apostrophes
To: Demian Katz <email@example.com>, Karla Smith <firstname.lastname@example.org>, "email@example.com" <firstname.lastname@example.org>
I guess stopwords and Language Analysis (http://wiki.apache.org/solr/LanguageAnalysis until SOLR v3.6 [we are using 3.5], now becoming obsolete in favor of Analyzers, Tokenizers, and Token Filters, http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters) are quite a sensitive matter of trade-offs, a luxury one might say; if you have the means to afford it, or if it is vital to have those terms searchable, empty the stopwords list.
Let me explain, from the little I know: if you don’t have stopwords and have a huge index, performance will suffer when running queries with terms that 99% of the time are not relevant to the search itself. If you use CommonGrams, for instance, fed with 1000common.txt, you may find yourself in lots of these situations.
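For readers who have not used CommonGrams, the following is a rough sketch of how it is typically wired into schema.xml. The field type name "text_commongrams" and the tokenizer choice are assumptions for illustration; only the 1000common.txt word list name comes from the message above.

```xml
<!-- Sketch only: "text_commongrams" is a hypothetical field type name -->
<fieldType name="text_commongrams" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index time: keeps single words and also emits word pairs
         that contain a common word from the list -->
    <filter class="solr.CommonGramsFilterFactory"
            words="1000common.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query time: prefers the word pairs so phrase queries avoid
         scanning the very long posting lists of common words -->
    <filter class="solr.CommonGramsQueryFilterFactory"
            words="1000common.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

The trade-off is a larger index in exchange for cheaper phrase queries that involve common words.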
I built up a solr/biblio/conf/stopwords.txt list by taking the common-word lists for several languages and removing many of the terms that might be more problematic and would result in excluding relevant resources. Please bear in mind that SOLR is, as they put it, an “Enterprise search platform”, where most of the exact searches we need in “our” world do not apply most of the time.
Below are my personal notes about it (I have to place all of this in one of my inactive blogs):
A sample of solr/biblio/conf/stopwords.txt
(…) > several languages
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
[note: please do not forget to set autoGeneratePhraseQueries="false" in the <fieldType> attributes]
<filter class="solr.SnowballPorterFilterFactory" language="Portuguese" />
<filter class="solr.SnowballPorterFilterFactory" language="German" />
<filter class="solr.ElisionFilterFactory"/>
<!-- do word delimiter, etc here -->
<filter class="solr.SnowballPorterFilterFactory" language="French" />
<filter class="solr.SnowballPorterFilterFactory" language="Spanish" />
Version 3.6+: see entries in
<!-- Portuguese -->
<fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_pt.txt" format="snowball" enablePositionIncrements="true"/>
<!-- less aggressive: <filter class="solr.PortugueseMinimalStemFilterFactory"/> -->
<!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Portuguese"/> -->
<!-- most aggressive: <filter class="solr.PortugueseStemFilterFactory"/> -->
Posts from HathiTrust (VuFind based), example:
Slow Queries and Common Words (Part 2)
All the best, and apologies if this message seems a little off topic; then again, it may be of some interest to those dealing with bigger indexes who find themselves with performance issues,
Filipe Manuel S. Bento | http://about.filipebento.pt/
Computer Science Specialist * PhD Researcher (UAveiro/UPorto/CETAC.Media), grant by FCT - Portuguese Foundation for Science and Technology
President/Chair of USE.pt Steering Committee (Portuguese Ex Libris Users’ National Association, hosted by Portuguese Parliament's, Palácio de S. Bento, Lisbon, http://www.USEpt.org, Oct 2010 - )

On Sat, Feb 23, 2013 at 12:59 PM, Demian Katz <email@example.com> wrote:
------------------------------------------------------------------------------
There is a JIRA ticket which discusses some of these issues:
Disabling stopwords is the easy answer, but there's also a link to this page that suggests some more sophisticated approaches:
If you come up with something that works well for you, I'd love to hear about it -- I haven't had time to tackle this issue in detail, and it would be nice to recommend a best practice on the ticket.
From: Weston, Paige [firstname.lastname@example.org]
Sent: Friday, February 22, 2013 5:04 PM
Subject: [VuFind-General] when all words are stopwords
Hello, all. Our VuFind queries and indexes run through the StopFilterFactory, which strips out non-significant words. The result for a title like No There There (OCLC#ocm81453667) is that it's not retrievable. Can anyone suggest a workaround, short of disabling the filter and reindexing? Is there a way to say, for a particular would-be index entry, "If they're all stopwords then none of them is a stopword"? Thanks.
--
E. Paige Weston
email: email@example.com
Library Systems Coordinator
Consortium of Academic & Research Libraries in Illinois (CARLI)
100 Trade Centre Drive, Suite 303
Champaign, IL 61820-7233
voice: 217-244-7593 toll-free: 866-904-5843 fax: 217-244-7596
VuFind-General mailing list