From: <kau...@cs...> - 2005-11-09 12:43:22
On 11/7/2005, "stack" <st...@ar...> wrote:

> >Many news sites have a structure which may distort ranking of search
> >results. On these sites each page focuses on one current news item but
> >it also has loads of tiny links to other news, like 'other top
> >stories', 'news of previous day' or 'stories of this week'.
>
> Just to be clear, in the above, you mean that the outlink anchor text
> says 'other top stories' and 'stories of this week'?

Hi,

I tried to describe pages and links like

------- Page start -------
<h1>English cricket results 9.11.2005</h1>
........ lots of text .....
<h2>Other top stories now</h2>
<a href="http://www.tekstia.com/news/top/earthquake+timbuktu/107994">
Earthquake in Timbuktu second time this year</a>
<a href="http://www.tekstia.com/news/top/tokyo+stocks+explode/107996">
Tokyo stocks explode</a>
10 further links to different subjects ..
---- Page end ------------

The links above have the form <a href=url>text</a>, and both the text
and the url contain words which don't actually belong to the body text
of the cricket news article.

> >In an indexed archive, you could search for 'earth quake' and find
> >close to top search results a page with headline 'cricket results'.
> >Reason being that the cricket page has one link to earth quake news of
> >previous day.
> >
> >This feature is noticeable when there are few pages with real earth quake
> >reports but lots of other pages having links to them.
>
> Yes. Makes sense.
>
> You can set the boost on inlink anchor text -- see the just-added FAQ --
> but looks like you want to be able to set separately the boost on a
> document's outlink anchor text. Looking at the Nutch html parser code,
> currently the outlink anchor text just gets added to the StringBuffer
> accumulating all document parsed 'text'; the outlink anchor text is not
> distinguished in any way from the general text of the document. There is
> currently no means of making its boost be different from that of the
> general document 'text'.
>
> Should we add such a feature Kaisa?
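
To make the idea concrete, here is a minimal sketch in Python (it is not Nutch code, and it does not use any Nutch API) of how a parser could keep outlink anchor text apart from the rest of the page text, so that the two can be weighted differently. The names SplitTextParser, split_page_text and score, and the boost values, are hypothetical and chosen only for illustration; the scoring is a toy term count, not a real ranking formula.

```python
from html.parser import HTMLParser

# Sample page modelled on the cricket example above.
PAGE = """
<h1>English cricket results 9.11.2005</h1>
........ lots of text .....
<h2>Other top stories now</h2>
<a href="http://www.tekstia.com/news/top/earthquake+timbuktu/107994">
Earthquake in Timbuktu second time this year</a>
<a href="http://www.tekstia.com/news/top/tokyo+stocks+explode/107996">
Tokyo stocks explode</a>
"""


class SplitTextParser(HTMLParser):
    """Hypothetical parser: collects text inside <a> separately from other text."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0   # > 0 while we are inside an <a> element
        self.body_parts = []    # text outside any link
        self.anchor_parts = []  # outlink anchor text

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if not text:
            return
        if self.anchor_depth > 0:
            self.anchor_parts.append(text)
        else:
            self.body_parts.append(text)


def split_page_text(html):
    """Return (body_text, outlink_anchor_text) for an HTML string."""
    parser = SplitTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.body_parts), " ".join(parser.anchor_parts)


def score(term, body_text, anchor_text, body_boost=1.0, outlink_boost=0.1):
    """Toy score: boosted count of exact term matches (assumed boost values)."""
    term = term.lower()
    body_hits = body_text.lower().split().count(term)
    anchor_hits = anchor_text.lower().split().count(term)
    return body_boost * body_hits + outlink_boost * anchor_hits


if __name__ == "__main__":
    body, anchors = split_page_text(PAGE)
    print("body text:   ", body)
    print("anchor text: ", anchors)
    print("score for 'earthquake':", score("earthquake", body, anchors))
    print("score for 'cricket':   ", score("cricket", body, anchors))
```

With the two kinds of text kept apart, a low outlink boost lets the cricket page still match 'earthquake' but rank well below pages whose body text is actually about the earthquake; setting the outlink boost to 0 would ignore outlink anchor text entirely.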