Hi guys - I got the following exception when trying to clean some real-world HTML with version 2.16:
java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.htmlcleaner.TagNode
at org.htmlcleaner.HtmlCleaner.makeTree(HtmlCleaner.java:853)
at org.htmlcleaner.HtmlTokenizer.addToken(HtmlTokenizer.java:103)
at org.htmlcleaner.HtmlTokenizer.tagEnd(HtmlTokenizer.java:608)
at org.htmlcleaner.HtmlTokenizer.start(HtmlTokenizer.java:470)
at org.htmlcleaner.HtmlCleaner.clean(HtmlCleaner.java:461)
at org.htmlcleaner.HtmlCleaner.clean(HtmlCleaner.java:371)
I managed to boil the HTML down into the following test case that exposes the issue:
public void testIssue() throws Exception
{
    final String HTML =
        "<html>"
        + "<body>"
        + "<table>"
        + "<ul>"
        + "<p>d</p>"
        + "</ul>"
        + "<table>"
        + "</table>"
        + "</body>"
        + "</html>";
    final HtmlCleaner cleaner = new HtmlCleaner();
    final TagNode tagNode = cleaner.clean(HTML);
    LOGGER.debug("tagNode: " + tagNode);
}
Thanks CB - I can confirm this exception occurs in current trunk as well.
Fixed - it was caused by nested lists; I've now flattened them before passing them on to this section.
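For anyone reading later, here is a rough sketch of the kind of flattening described above. It is not the actual HtmlCleaner change: the class name FlattenSketch and the method flatten are made up for illustration, and plain strings stand in for TagNode objects. The idea is only that a list which may contain further lists is expanded into a single flat list, so a later cast of each element to one type cannot hit an ArrayList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlattenSketch {

    // Hypothetical helper (not part of HtmlCleaner): recursively expands any
    // element that is itself a List, preserving the original order.
    static List<Object> flatten(List<?> items) {
        List<Object> flat = new ArrayList<>();
        for (Object item : items) {
            if (item instanceof List) {
                // Nested list: splice its flattened contents in place.
                flat.addAll(flatten((List<?>) item));
            } else {
                flat.add(item);
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        // Strings stand in for nodes; the inner lists mimic the nesting
        // that led to the ClassCastException.
        List<Object> nested = Arrays.<Object>asList(
            "ul",
            Arrays.<Object>asList("p", Arrays.<Object>asList("d")),
            "table");
        System.out.println(flatten(nested)); // prints [ul, p, d, table]
    }
}
```

After flattening, every element is a leaf item, so code that casts each entry to a single node type no longer encounters a nested list.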
Great stuff, thanks for the quick fix!