|
From: stack <st...@ar...> - 2005-11-07 20:47:26
|
kau...@cs... wrote:

>Hi and thanks for the addition to faq.
>
>Many news sites have a structure which may distort ranking of search
>results. On these sites each page focuses on one current news item but
>it also has loads of tiny links to other news, like 'other top
>stories' , 'news of previous day' or 'stories of this week'.

Just to be clear, in the above, you mean that the outlink anchor text says 'other top stories' and 'stories of this week'?

>In an indexed archive, you could search for 'earth quake' and find
>close to top search results a page with headline 'cricket results'.
>Reason being that the cricket page has one link to earth quake news of
>previous day.
>
>This feature is noticeable when there are few pages with real earth quake
>reports but lots of other pages having links to them.

Yes. Makes sense.

>Link texts should have a low priority in indexing .. probably I can make
>this happen when I find the correct parameters.

You can set the boost on inlink anchor text -- see the just-added FAQ -- but it looks like you want to be able to set the boost on a document's outlink anchor text separately. Looking at the Nutch HTML parser code, the outlink anchor text currently just gets appended to the StringBuffer that accumulates all of the document's parsed 'text'; the outlink anchor text is not distinguished in any way from the general text of the document. There is currently no means of making its boost differ from that of the general document 'text'.

Should we add such a feature, Kaisa?

Yours,
St.Ack

>kaisa
>
>On 11/2/2005, "stack" <st...@ar...> wrote:
>
>>It should be doing this for you Kaisa. In general, are you not seeing
>>the most significant links showing first in results?

|
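To make the idea discussed in this message concrete, here is a rough sketch of what "a separate boost for outlink anchor text" could look like. It is not Nutch code and does not use any Nutch or Lucene API: it is a toy Python illustration built on the standard-library `html.parser`. The class `AnchorSplittingParser`, the function `score`, the parameters `body_boost` and `outlink_anchor_boost`, and the two sample pages are all made up for this example. The only point is that once text inside `<a>` elements is collected apart from the rest of the page, the two can be weighted differently.

```python
from html.parser import HTMLParser


class AnchorSplittingParser(HTMLParser):
    """Hypothetical helper: keeps text inside <a> elements apart from other text."""

    def __init__(self):
        super().__init__()
        self.body_text = []      # words found outside any <a> element
        self.anchor_text = []    # words found inside <a> elements (outlink anchors)
        self._anchor_depth = 0   # > 0 while we are inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_depth > 0:
            self._anchor_depth -= 1

    def handle_data(self, data):
        words = data.lower().split()
        if self._anchor_depth > 0:
            self.anchor_text.extend(words)
        else:
            self.body_text.extend(words)


def score(html, query, body_boost=1.0, outlink_anchor_boost=0.1):
    """Toy term-count score; the boost values are illustrative assumptions."""
    parser = AnchorSplittingParser()
    parser.feed(html)
    parser.close()
    terms = query.lower().split()
    body_hits = sum(parser.body_text.count(t) for t in terms)
    anchor_hits = sum(parser.anchor_text.count(t) for t in terms)
    return body_boost * body_hits + outlink_anchor_boost * anchor_hits


# Made-up pages mirroring the example in the thread.
cricket_page = (
    "<h1>Cricket results</h1><p>The match ended in a draw.</p>"
    "<a href='/quake'>earth quake news of previous day</a>"
)
quake_page = "<h1>Earth quake report</h1><p>An earth quake struck the coast</p>"

if __name__ == "__main__":
    for name, page in (("cricket", cricket_page), ("quake", quake_page)):
        print(name, score(page, "earth quake"))
```

Setting `outlink_anchor_boost=1.0` reproduces the situation described above, where anchor text counts exactly like body text; lowering it pushes the cricket page down for the query 'earth quake' without removing it from the results altogether.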