From: Gilles D. <gr...@sc...> - 2002-10-02 16:04:01
According to Jessica Biola:
> Actually, to append to this problem I'm running into,
> it doesn't appear to be the text WITHIN the anchor tag
> -- rather, there is text between the beginning <A> and
> the ending </A> of the anchor tag that contains the
> word "large".

Ah!  Well that certainly explains why I was unable to reproduce the
problem earlier, based on your first message.  Yes, indexing text
between the <A ...> and </A> tags as link description text for the
referenced document is normal behaviour for htdig.

> The question still is, though, how do I completely
> wipe out any reference to text between these tags as
> relevant to the linked document?

As Geoff pointed out, description_factor is the one that controls the
score that will be applied to this text.  Unfortunately, the best you
can do is drop the score of these documents so they appear at the end
of the results instead of at the start.  In the future, there will be
some sort of option for removing matches with a 0 or small score.

> > There's a word, "large" on fruit.html.  "large" does
> > NOT appear anywhere within pineapple.html, however,
> > when I htsearch on the index, both documents show as
> > a match.  In fact, the base_score is very high for the
> > query "large" on the document pineapple.html.
> >
> > If I index pineapple.html alone, the query "large"
> > yields no results.  So there is definitely some type
> > of relationship between the two documents and the
> > word "large".  htdump revealed the relationship by
> > outputting this:
...
> > What configuration attribute must set to zero in
> > order for that extra anchor text being indexed and
> > factored into pineapple.html's word list?  I've basically
> > tried setting all "documented" factors to zero (including
> > backlink).
> >
> > I used the latest htdig-3.2.0b4-092902 as well as a
> > January 2002 release -- both behave the same way.

By the January 2002 release, do you mean the 3.1.6 release of Jan 31/02,
or one of the January snapshots of 3.2.0b4?  Either way, the behaviour
of description_factor will be similar, the main difference being that
with 3.1.x, you need to reindex after changing this factor, whereas
with 3.2.x you don't.

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
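
For reference, the description_factor setting discussed in this message
might look like the following in an htdig configuration file.  This is
only a minimal sketch: the value 0 is an assumption, chosen to weight
link description text as low as possible, and it is not taken from the
original poster's configuration.

```
# Sketch of an htdig configuration fragment (the value is an assumption).
# Lowers the weight given to link description text, i.e. the text found
# between <A ...> and </A> that is credited to the linked document.
description_factor: 0
```

As described in the message, this only pushes such matches toward the
end of the results rather than removing them.  With 3.1.x the documents
must be reindexed after changing the value; with 3.2.x no reindex is
needed.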