From: <kau...@cs...> - 2005-11-11 12:39:10
That would be a useful option to have.

On 11/10/2005, "stack" <st...@ar...> wrote:
> It's almost as though we shouldn't be counting outlink anchor text as
> part of the current document, especially for the
> (common-on-front-of-sites-like-newspapers) case you describe below.
>
> Should we make an option for the parse-html that excludes outlink anchor
> text?
>
> St.Ack
>
>
> kau...@cs... wrote:
>
> >Here's the example again, in pseudo-HTML. Hopefully browsers don't mangle
> >this one.
> >
> ><pre>
> >----- Page start -----
> >[h1]English cricket results 9.11.2005[/h1]
> >
> >........ lots of text .....
> >
> >[h2]Other top stories now[/h2]
> >
> >[a href="http://www.tekstia.com/news/top/earthquake+timbuktu/107994"]
> >Earthquake in Timbuktu second time this year[/a]
> >
> >[a href="http://www.tekstia.com/news/top/tokyo+stocks+explode/107996"]
> >Tokyo stocks explode[/a]
> >
> >10 further links to different subjects ..
> >
> >----- Page end -----
> ></pre>
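Here is a rough sketch of what such an option might do. It assumes plain Java and the JDK's built-in Swing HTML parser (`ParserDelegator` with an `HTMLEditorKit.ParserCallback`). It is not Nutch's actual parse-html code, whose internals aren't shown in this thread. When the flag is set, text inside [a]...[/a] elements is skipped. The class name `AnchorTextFilter` and the `excludeAnchorText` flag are hypothetical and only illustrate the idea. They are not an existing Nutch option.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical illustration only: not part of Nutch's parse-html plugin.
public class AnchorTextFilter {

    // Returns the page's text. If excludeAnchorText is true, any text that
    // appears inside <a>...</a> (outlink anchor text) is left out.
    public static String extractText(String html, boolean excludeAnchorText) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            // Counts currently open <a> elements; a counter tolerates sloppy nesting.
            private int anchorDepth = 0;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) anchorDepth++;
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.A && anchorDepth > 0) anchorDepth--;
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Drop the text if we are inside a link and the option is on.
                if (excludeAnchorText && anchorDepth > 0) return;
                if (out.length() > 0) out.append(' ');
                out.append(data);
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "<html><body><h1>English cricket results 9.11.2005</h1>"
            + "<p>lots of text</p><h2>Other top stories now</h2>"
            + "<a href=\"http://www.tekstia.com/news/top/tokyo+stocks+explode/107996\">"
            + "Tokyo stocks explode</a></body></html>";
        System.out.println(extractText(page, true));
    }
}
```

On the example page, the headline, the body text and the "Other top stories now" heading would be kept, and the linked headlines would be dropped. In a real plugin the anchor text probably shouldn't be thrown away entirely. It could still be useful as anchor text for the *target* pages.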