#4 HTMLFilter cannot properly handle content with hyperlinks

Status: open
Owner: nobody
Labels: News Filter (4)
Priority: 5
Updated: 2007-05-14
Created: 2007-05-14
Private: No

Because NewsRack is focused on news articles, it tries to extract the content of an article by ignoring all text inside hyperlinks. This keeps links to related articles and advertisements out of news filtering, which is essential to the accuracy of the filtering. However, certain websites (Slate and Salon, for example) place hyperlinks within the text of the article itself. As a result, NewsRack ignores some of the article text during filtering.
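To make the failure concrete, here is a minimal sketch of link-text stripping. It is not NewsRack's actual HTMLFilter code; the class name `LinkStrippingExtractor`, the function `extract_text`, and the sample HTML are all hypothetical, and it only assumes the behaviour described above (drop everything inside `<a>` tags).

```python
from html.parser import HTMLParser


class LinkStrippingExtractor(HTMLParser):
    """Hypothetical stand-in for the filter: collects text outside <a> tags only."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0   # > 0 while inside a hyperlink
        self.chunks = []      # text fragments kept so far

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        # Text inside a hyperlink is discarded, whatever its role on the page.
        if self.link_depth == 0:
            self.chunks.append(data)


def extract_text(html):
    parser = LinkStrippingExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by removed fragments.
    return " ".join("".join(parser.chunks).split())


# A trailing "related" link is dropped, as intended.
sidebar = '<p>Budget passed today.</p><a href="/related">Related: sports scores</a>'
# A link inside the sentence is dropped too, which loses article text.
inline = '<p>The <a href="/x">finance minister</a> presented the budget.</p>'

print(extract_text(sidebar))
print(extract_text(inline))
```

In the second sample the words "finance minister" never reach the filter, so a rule matching on that phrase would miss the article.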

This is a bug when NewsRack (NR) is used to process news articles that have a lot of hyperlinking within the article content. For its primary focus (online versions of print news publications), however, such hyperlinking is non-existent, and it is rare even in many online news publications.

A different technique altogether is needed for blogs and for online news publications that hyperlink heavily within the main body of the text.
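One possible direction, offered only as a sketch and not as the technique this ticket settles on, is a per-block link-density heuristic: keep a block, link text included, when links make up a small share of its text, and drop the block when it is mostly links. The names `LinkDensityExtractor` and `extract_article_text`, the `BLOCK_TAGS` set, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Assumed set of block-level tags that delimit text blocks.
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3"}


class LinkDensityExtractor(HTMLParser):
    """Hypothetical extractor: keeps blocks whose text is not dominated by links."""

    def __init__(self, max_link_density=0.5):
        super().__init__()
        self.max_link_density = max_link_density  # assumed threshold
        self.link_depth = 0
        self.text = []        # all text in the current block
        self.link_chars = 0   # characters of that text that sit inside links
        self.kept = []        # blocks that passed the density check

    def _flush(self):
        block = " ".join("".join(self.text).split())
        if block:
            density = self.link_chars / len(block)
            if density <= self.max_link_density:
                self.kept.append(block)
        self.text = []
        self.link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        # Unlike plain stripping, link text is kept; it is only counted.
        self.text.append(data)
        if self.link_depth > 0:
            self.link_chars += len(data.strip())

    def close(self):
        super().close()
        self._flush()  # emit any trailing block


def extract_article_text(html, max_link_density=0.5):
    parser = LinkDensityExtractor(max_link_density)
    parser.feed(html)
    parser.close()
    return " ".join(parser.kept)


article = '<p>The <a href="/x">finance minister</a> presented the budget.</p>'
nav = '<div><a href="/a">Related: sports scores</a></div>'
print(extract_article_text(article + nav))
```

With these samples the paragraph survives with its linked words intact, while the link-only block is dropped. The threshold would need tuning against real pages, and heavily linked blog posts may still need a different cutoff.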
