From: Dave H. <dh...@rc...> - 2002-08-15 19:37:37
On Mon, 12 Aug 2002, Gilles Detillieux wrote:

Hello Gilles,

I downloaded and installed htdig-3.2.0b4-20020811 and still had a problem
with bad_words when they are inside the phrase.

For example, I have a document that contains the following phrase:

"ABOUT THIS SEARCH ENGINE - for a basic search do not use"

Of the words in that phrase, my bad_words file contains "this", "for"
and "not".

If the whole phrase is searched as listed above, I get no matches.

If "THIS SEARCH ENGINE - for" is searched, I get hits for documents that
contain "search engine". So it appears to strip the leading and trailing
bad words.

If "THIS SEARCH ENGINE - for a" is searched, I get no hits. Since I index
single characters and I do not have "a" in the bad words list, this should
find something. The presence of "a" makes "for" an internal bad word.

Likewise, if "for a basic search do not" is searched, it strips the initial
and trailing bad words and finds the document. But if
"for a basic search do not use" is searched, the fact that "not" is an
internal stopword causes no hits to be found.

Let me know if there is anything that can be done.

Dave

> Date: Mon, 12 Aug 2002 21:46:28 -0500 (CDT)
> From: Gilles Detillieux <gr...@sc...>
> To: Dave Hoover <dh...@rc...>
> Cc: htd...@li...
> Subject: Re: [htdig-dev] htdigb4 - phrase search bad_words
>
> According to Dave Hoover:
> > I am using htdig-3.2.0b4-20020210 so I can get phrase searching to
> > work.
> >
> > I noticed that if I type a word that is in the bad words file in a
> > search string surrounded by quotes, for a phrase I know should match,
> > I get no result.
> >
> > So a document contains "After crossing the Atlantic", but if this
> > search is entered in htdig with quotes, there are no hits found.
> > I assume it is because the bad_word "the" is not being stripped,
> > or the index wasn't built properly.
> >
> > Is there a way around this?
>
> I know there were problems with "bad words" in phrases in the past,
> but I thought those had been fixed by Feb. 10. Perhaps not, though.
> Do you get the same results with the latest 3.2.0b4 snapshot? I know
> there's a new query parser under development, which should solve some of
> the problems with phrase searches, but I don't think it's been integrated
> into the current CVS development tree yet. There may be some incremental
> fixes right now, though.
>
> --
> Gilles R. Detillieux              E-mail: <gr...@sc...>
> Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
> Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
>
>

Dave Hoover
Systems Programmer
Rutgers University Libraries
dh...@rc...

Crippled but free, I was blind all the time I was learning to see.
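
The five test searches in the report can be condensed into a small model. What follows is only a sketch of the behaviour as observed from the outside, not ht://Dig's actual code: the function names, the tokenizer, and the assumption that bad words are simply absent from the index are all hypothetical, chosen for illustration.

```python
import re

# Bad words and sample document taken from the report above.
BAD_WORDS = {"this", "for", "not"}
DOC = "ABOUT THIS SEARCH ENGINE - for a basic search do not use"


def tokenize(text):
    """Lowercase and split on non-alphanumerics (hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_run(haystack, needle):
    """True if the word list `needle` occurs contiguously in `haystack`."""
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )


def indexed_words(document, bad_words=BAD_WORDS):
    """Assumption: bad words are dropped when the document is indexed."""
    return [w for w in tokenize(document) if w not in bad_words]


def observed_match(phrase, document, bad_words=BAD_WORDS):
    """Model of the reported behaviour: only leading and trailing bad
    words are stripped; a bad word left inside the phrase yields no hits."""
    words = tokenize(phrase)
    while words and words[0] in bad_words:
        words.pop(0)
    while words and words[-1] in bad_words:
        words.pop()
    if any(w in bad_words for w in words):
        return False  # internal bad word: no hits, as reported
    return contains_run(indexed_words(document, bad_words), words)


def expected_match(phrase, document, bad_words=BAD_WORDS):
    """Behaviour the report asks for: drop every bad word from the phrase,
    wherever it occurs, before matching."""
    words = [w for w in tokenize(phrase) if w not in bad_words]
    return contains_run(indexed_words(document, bad_words), words)
```

Under these assumptions, `observed_match` reproduces the pattern in the report (hits for "THIS SEARCH ENGINE - for" and "for a basic search do not", none for the other three searches), while `expected_match` finds the document in all five cases.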