HTML Filter cannot handle multi-page articles
Status: Beta
Brought to you by:
subbu_ss
Several news publications (Times of India, New York Times, etc.) split long articles across multiple pages. HTMLFilter currently does not recognize the second and subsequent pages, so only the first page of a multi-page article gets processed. This affects how the article is filtered and classified.
We need to work out techniques to tackle this problem.
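One possible starting point, offered only as a sketch and not as how HTMLFilter works today: look for a "next page" link on the first page, either via a rel="next" attribute or via anchor text such as "Next" or "Next Page". The example below uses Python's standard html.parser; the names NextPageFinder, find_next_page and the NEXT_TEXT pattern are hypothetical, and real sites will likely need per-publication rules.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed heuristic: anchor text that commonly labels a "next page" link.
NEXT_TEXT = re.compile(r"^\s*next(\s+page)?\s*(»|›|>)?\s*$", re.IGNORECASE)


class NextPageFinder(HTMLParser):
    """Records the first link that looks like it points to the next page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.next_url = None
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text collected inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        rel = (attrs.get("rel") or "").lower().split()
        if "next" in rel:
            # Explicit hint from the page: rel="next".
            if self.next_url is None:
                self.next_url = urljoin(self.base_url, href)
        elif tag == "a":
            # No rel hint; remember the link and inspect its text later.
            self._href = href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self.next_url is None and NEXT_TEXT.match("".join(self._text)):
                self.next_url = urljoin(self.base_url, self._href)
            self._href = None


def find_next_page(html, base_url):
    """Return the absolute URL of the likely next page, or None."""
    finder = NextPageFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.next_url
```

A caller could fetch the returned URL, run the same check on that page, and repeat until find_next_page returns None (with a cap on the number of pages to guard against loops), then pass the concatenated pages to the filter as one article.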