The following HTML causes a tight loop inside the cleaner:
<html xmlns="http://www.w3.org/TR/REC-html40"> <body> <BR> <hr width="100%" Test </body> </html>
It appears to be a combination of the xmlns attribute and <BR>; both are needed to trigger it.
No special HtmlCleaner options are needed to trigger the issue.
I seem to have lost some of my text above.
To trigger this you need the xmlns attribute and an upper-case tag (BR in this case). Not all upper-case tags trigger it.
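Since the failure is a hang rather than an exception, a reproduction is easier to run safely behind a timeout. The following is only a sketch under stated assumptions: the class HangGuard and the method finishesWithin are hypothetical names made up for illustration, the markup is copied from the report as posted (which may be missing text), and the HtmlCleaner call appears only in a comment so the sketch compiles without the jar.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of HtmlCleaner.
final class HangGuard {
    // Markup copied from the report as posted; the reporter notes some text may be missing.
    static final String REPORTED_HTML =
        "<html xmlns=\"http://www.w3.org/TR/REC-html40\"> <body> <BR> <hr width=\"100%\" Test </body> </html>";

    /**
     * Runs the task on a daemon thread and reports whether it ended
     * (normally or with an exception) before the timeout elapsed.
     */
    static <T> boolean finishesWithin(Callable<T> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "hang-guard");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (ExecutionException e) {
            return true; // the task threw, so it did not hang
        } catch (TimeoutException e) {
            future.cancel(true); // best effort; a tight loop may ignore interrupts
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // With the HtmlCleaner jar on the classpath, swap the placeholder for:
        //   () -> new org.htmlcleaner.HtmlCleaner().clean(REPORTED_HTML)
        Callable<Integer> placeholder = () -> REPORTED_HTML.length();
        System.out.println("finished in time: " + finishesWithin(placeholder, 5000));
    }
}
```

The worker is a daemon thread because a genuinely tight loop will not respond to interruption; the guard can only report the timeout, not stop the loop.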
Thanks Simon, I'll take a look and see if I can figure out what's happening.
I've just tried the test case and reproduced the problem with the 2.8 jar. However, when I tried it with current trunk the issue did not occur, so it looks like it's been fixed as a side effect of another change.
Hi,
We have a similar issue with the attached HTML file, tested using command line v2.8:
May I know when you plan to publish the next release?
We (proudly) integrate HtmlCleaner in OpenSearchServer. Unfortunately, some of our users are facing this issue.
Hi Emmanuel,
I plan to release 2.9 this month.