From: Patrick R. <pg...@vt...> - 2002-01-23 04:20:11
On Tue, Jan 22, 2002 at 02:17:58PM -0600, Gilles Detillieux wrote:
> According to Patrick Robinson:
> > [htdig-3.1.5 on Solaris 2.6]
> > Is there a way to include one or more specific "short words" -- that is,
> > words that are shorter than minimum_word_length -- in htdig's index?
> >
> > Specifically, my problem is that I want to index and be able to find
> > documents containing the term "4-H", even though:
> > (a) my minimum_word_length is set to the default of 3,
> > (b) "4-H" contains a dash, which is punctuation (I'm using the default
> > valid_punctuation string), so it would be indexed as "4h", if it weren't
> > too short to be indexed.
>
> You could take the hyphen out of valid_punctuation and put it in
> extra_word_characters instead, but that may not be what you want in
> the general case. Unfortunately, there's no way of singling out
> specific words for special treatment.

That's what I was afraid of. The removal of the hyphen when indexing
and searching isn't so much of a concern. I mostly wanted to be able to
include "4-h" (or "4h") in the db.

I suppose the other "solution" would be to reduce minimum_word_length to 2,
and then add all the other 2 letter words which occur to bad_word_list.
A bit of a maintenance headache, but maybe unavoidable in this case.

Thanks,
-- 
Patrick Robinson
AHNR Info Technology, Virginia Tech
pg...@vt...
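
For reference, the workaround described above might look roughly like this in htdig.conf. This is only a sketch, not a tested configuration: the bad_word_list path is a hypothetical example, and the file it names would need to list (one word per line) every two-letter word that should stay out of the index, leaving out "4h". The database would presumably need to be rebuilt afterwards for the change to take effect.

```
# Sketch only; values are assumptions, not taken from a working setup.

# Allow two-character words such as "4h" into the index
# (the original post says the default is 3).
minimum_word_length:  2

# File of words to exclude from the index.
# Hypothetical path; add the unwanted two-letter words to this file.
bad_word_list:        /opt/htdig/common/bad_words
```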