
#119 Tag combination causes internal loop

Group: v 2.9
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-08-15
Created: 2014-06-02
Creator: Simon Tyler
Private: No

The following HTML causes a tight loop inside the cleaner:

<html xmlns="http://www.w3.org/TR/REC-html40">
<body>
<BR>
<hr width="100%"
Test
</body>
</html>

It appears to be caused by the combination of the xmlns attribute and <BR>; both are needed to trigger it.

No special HtmlCleaner options are needed to trigger the issue.

Discussion

  • Simon Tyler

    Simon Tyler - 2014-06-03

    I seem to have lost some of my text above.

    To trigger this you need the xmlns attribute and an uppercase tag (BR in this case). Not all uppercase tags trigger it.

     
  • Scott Wilson

    Scott Wilson - 2014-06-03

    Thanks Simon, I'll take a look and see if I can figure out what's happening.

     
  • Scott Wilson

    Scott Wilson - 2014-06-03

    I've just tried the test case and reproduced the problem with the 2.8 jar. However, when I tried it with the current trunk I don't seem to hit the issue, so it looks like it's been fixed as a side effect of another change.

     
  • Emmanuel Keller

    Emmanuel Keller - 2014-08-02

    Hi,

    We have a similar issue with the attached HTML file, tested using the v2.8 command line:

    java -jar htmlcleaner-2.8.jar src=mail_word.htm
    

    May I ask when you plan to publish the next release?

    We (proudly) integrate HtmlCleaner in OpenSearchServer. Unfortunately, some of our users are facing this issue.

     
  • Scott Wilson

    Scott Wilson - 2014-08-13

    Hi Emmanuel,

    I plan to release 2.9 this month.

     
  • Scott Wilson

    Scott Wilson - 2014-08-13
    • status: open --> closed-fixed
    • Group: v 2.8 --> v 2.9
     
