From: Gilles D. <gr...@sc...> - 2003-02-14 22:34:11
According to Adam Brown:
> On Friday 14 February 2003 11:14, Adam Brown wrote:
> ...
> > I am indexing this page using Htdig 3.1.6:
> > http://wire.org.au/information/violence/domestic/womens_stories/one_rural_womans_story.html
> > The page contains the words "woman's" and "womans" but not "woman".
> >
> > The search page is located at: http://wire.org.au/public_search.html
> >
> > When I search for "rural woman's" or "rural womans" I get no hits. However
> > when I search for "woman" the page is returned.
> >
> > My understanding is that using the default Htdig settings that "woman's"
> > gets indexed as "womans". So surely a search for 'womans' should be
> > successful.

That's correct, assuming you're using the defaults. The word "woman's" should get indexed as both "womans" and "woman". If you had changed valid_punctuation at the time you indexed, taking out the apostrophe, then this would not work, and "woman's" would be indexed only as "woman". You should check your db.wordlist file to make sure that both woman and womans appear in there.

> Researching further:
>
> Results from htdig -vvvv indicate that the word "woman" is indexed, not
> "womans"

If you are indeed running htdig version 3.1.6, then the -vvvv output should show the following lines when the word "woman's" is parsed:

word: woman's@(location)
word part: woman@(location)

Both of these should go into db.wordlist, with the apostrophe being stripped from the first one.

> A search for "women's" (note the e) returns a hit. I looked in the ispell
> dictionary file english.0 and the listings for the two words are:
> woman/MY
> women/MS

When htfuzzy builds the endings database, it strips out rules that contain apostrophes, so the M suffix is ignored. So, a search for "woman" will match "(woman or womanly)", and a search for "women" will match "(women or womens)". I'm not actually sure why there's an S suffix on women, but that's another matter.
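To make the two mechanisms above concrete, here is a rough Python model. It is a sketch under stated assumptions, not htdig's own code: the function names, the minimum word length of 3 and the three-entry suffix table are all hypothetical, and real ispell affix rules carry conditions it ignores.

```python
# Rough model of the indexing and endings behaviour described above.
# NOT htdig's source code: names, the minimum length of 3 and the suffix
# table below are illustrative assumptions.

def add_forms(token, punct, min_length, words):
    """Record a token with valid punctuation stripped, plus its parts."""
    whole = "".join(ch for ch in token if ch not in punct)
    candidates = [whole]                      # woman's -> womans
    if whole != token:                        # token contained valid punctuation
        spaced = "".join(" " if ch in punct else ch for ch in token)
        candidates += spaced.split()          # woman's -> woman, s
    for w in candidates:
        if len(w) >= min_length and w not in words:  # assumed length cutoff
            words.append(w)


def index_words(text, valid_punctuation=".-_/!#$%^&'()", min_length=3):
    """Split text into words; chars outside alnum + valid_punctuation end a word."""
    punct = set(valid_punctuation)
    words = []
    token = ""
    for ch in text.lower() + " ":             # trailing space flushes last token
        if ch.isalnum() or ch in punct:
            token += ch
        elif token:
            add_forms(token, punct, min_length, words)
            token = ""
    return words


# Hypothetical, simplified affix table (flag -> suffix). Real english.aff
# rules have conditions; only the apostrophe handling matters here.
SUFFIXES = {"M": "'s", "Y": "ly", "S": "s"}


def expand_endings(entry, suffixes=SUFFIXES):
    """Expand an ispell entry like 'woman/MY', dropping apostrophe rules."""
    word, _, flags = entry.partition("/")
    forms = [word]
    for flag in flags:
        suffix = suffixes.get(flag)
        if suffix is None or "'" in suffix:
            continue                          # e.g. M ('s) is ignored
        forms.append(word + suffix)
    return forms


print(index_words("one rural woman's story"))
# ['one', 'rural', 'womans', 'woman', 'story']
print(index_words("woman's", valid_punctuation=".-_/!#$%^&"))
# ['woman']
print(expand_endings("woman/MY"), expand_endings("women/MS"))
# ['woman', 'womanly'] ['women', 'womens']
```

With the apostrophe in valid_punctuation, "woman's" yields both "womans" and "woman"; take it out and only "woman" survives, which is consistent with the -vvvv symptom quoted above.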
If you want a search for woman or women to match any of woman, women, woman's or women's, you could add a line like the following in your synonyms file and do an "htfuzzy synonyms":

woman womans women womens

However, searching for "woman's" in htsearch should match "womans" in the database, so if it did get indexed as such in the database, I'm at a loss to explain why htsearch isn't picking it up.

> Is it the case that Htdig reduces the search word "woman's" to "womans" which
> doesn't register a hit because "woman" is recorded in the database and
> "womans" is not a valid extension of "woman"?

No, the endings database is only used for fuzzy matching, to augment the possible matches. It's not used to invalidate any exact matches. However, the endings and synonyms fuzzy matches will only occur if these are specified in your search_algorithm attribute. You can always run "htsearch -vv words=womans" from the command line to see what it does.

> I use the setting:
> valid_punctuation: .-_/!#$%^&'()

There should be a backslash '\' in front of the dollar sign '$', otherwise it gets swallowed up in the variable substitution phase. That shouldn't cause the apostrophe to be swallowed up, though, at least as far as I can tell.

Without knowing what's in your db.wordlist, I'm at a bit of a loss as to what to suggest next.

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)