From: Joe W. <jo...@gm...> - 2022-03-18 12:57:45
Hi Alem,

The words "to" and "the" are included in Lucene's default list of stopwords. They're dropped when indexing documents and ignored when querying them. To ensure these words are indexed, you have to either customize the list of stopwords or explicitly eliminate all stopwords, by adding the following <param> element to the <analyzer> element in your .xconf file:

    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer">
        <!-- Specify stop words - or remove them entirely -->
        <param name="stopwords" type="org.apache.lucene.analysis.util.CharArraySet">
            <!--<value>the</value>-->
        </param>
    </analyzer>

For the list of words treated as stopwords by default in Lucene, see https://markmail.org/message/wxmtjzbskgrj2cug.

Place each word you still want treated as a stopword in its own <value> element. By omitting all <value> elements (as shown above, where the only one is commented out), you've removed all stopwords, so every word is indexed.

For the current eXist docs on configuring Lucene's stopword facility, see https://exist-db.org/exist/apps/doc/lucene#conf.

Joe

On Thu, Mar 17, 2022 at 5:59 PM Areki, Alem <aa...@ri...> wrote:

> Hi all,
>
> I am having a problem with Lucene searching when stop words appear
> between the phrase keywords.
>
> - The title of the article to be searched: “*Where to watch: Catch the
>   Spiders in March Madness*”
> - A search phrase like “*March Madness*” works, but phrases like “*Where
>   to Watch*” or “*Catch the Spiders*” don’t work.
> - Searching for phrases that have stopwords in between them doesn’t work.
>
> Does anyone have any clue why I am having this issue?
>
> Thanks
> Alem
>
> --
> Alem T. Areki
> Senior Web Developer – Web Services
> University of Richmond
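
For context, here is a rough sketch of where the <analyzer> fragment from the reply might sit inside a complete collection.xconf. The indexed element name (title) is only an assumption for illustration, since the thread does not show the actual index definition; substitute whatever elements your existing configuration indexes.

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <lucene>
            <!-- StandardAnalyzer with an empty stopword set: no words are dropped -->
            <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer">
                <param name="stopwords" type="org.apache.lucene.analysis.util.CharArraySet"/>
            </analyzer>
            <!-- Hypothetical: index the text of <title> elements; replace with your own qname -->
            <text qname="title"/>
        </lucene>
    </index>
</collection>
```

Here the empty <param> element should behave the same as a <param> with only commented-out <value> children. Documents indexed before the change keep the old index entries, so the collection needs to be reindexed before phrases like "Where to Watch" can match.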