#178 java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.htmlcleaner.TagNode

v2.17
closed-fixed
None
5
2016-08-18
2016-08-17
Code Buddy
No

Hi guys - I got the following exception when trying to clean some real-world HTML with version 2.16:

java.lang.ClassCastException: java.util.ArrayList cannot be cast to org.htmlcleaner.TagNode
at org.htmlcleaner.HtmlCleaner.makeTree(HtmlCleaner.java:853)
at org.htmlcleaner.HtmlTokenizer.addToken(HtmlTokenizer.java:103)
at org.htmlcleaner.HtmlTokenizer.tagEnd(HtmlTokenizer.java:608)
at org.htmlcleaner.HtmlTokenizer.start(HtmlTokenizer.java:470)
at org.htmlcleaner.HtmlCleaner.clean(HtmlCleaner.java:461)
at org.htmlcleaner.HtmlCleaner.clean(HtmlCleaner.java:371)

I managed to boil the HTML down into the following test case that exposes the issue:

public void testIssue() throws Exception
{
    final String HTML =
            "<html>"
            + "<body>"
            + "<table>"
            + "<ul>"
            + "<p>d</p>"
            + "</ul>"
            + "<table>"
            + "</table>"
            + "</body>"
            + "</html>";

    final HtmlCleaner cleaner = new HtmlCleaner();
    final TagNode tagNode = cleaner.clean(HTML);
    LOGGER.debug("tagNode: " + tagNode);
}

Discussion

  • Scott Wilson

    Scott Wilson - 2016-08-17
    • status: open --> open-accepted
    • assigned_to: Scott Wilson
    • Group: v 2.7 --> v2.17
     
  • Scott Wilson

    Scott Wilson - 2016-08-17

    Thanks CB - I can confirm this exception occurs in current trunk as well.

     
  • Scott Wilson

    Scott Wilson - 2016-08-17
    • status: open-accepted --> closed-fixed
     
  • Scott Wilson

    Scott Wilson - 2016-08-17

    Fixed - it was caused by nested lists; I've now flattened them before passing them on to this section.
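
The actual change lives in the HtmlCleaner source and is not shown in this ticket. As a rough, hypothetical sketch of what "flattening nested lists" could look like in general: the class and method names below (NestedListFlattener, flatten) are made up for illustration, and plain strings stand in for TagNode objects. The idea is that once no element is itself a list, a later cast of each element can no longer hit an ArrayList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is NOT HtmlCleaner's actual code.
final class NestedListFlattener {

    // Recursively copies every non-list element of 'items' into one flat
    // list, preserving order. Nested lists are expanded in place.
    static List<Object> flatten(List<?> items) {
        List<Object> flat = new ArrayList<>();
        for (Object item : items) {
            if (item instanceof List) {
                flat.addAll(flatten((List<?>) item));
            } else {
                flat.add(item);
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        // Strings stand in for nodes; "p" sits inside a list nested in a list.
        List<Object> inner = new ArrayList<>();
        inner.add("ul");
        inner.add(Arrays.asList("p"));

        List<Object> children = new ArrayList<>();
        children.add("table");
        children.add(inner);
        children.add("table");

        // Without flattening, (String) children.get(1) would throw a
        // ClassCastException, because that element is an ArrayList.
        for (Object child : flatten(children)) {
            String name = (String) child; // safe: no lists remain
            System.out.println(name);
        }
    }
}
```

The recursion handles any nesting depth, which is an assumption on my part; the real fix may only need to handle one level.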

     
  • Code Buddy

    Code Buddy - 2016-08-18

    Great stuff, thanks for the quick fix!

     