
#176 Crash: IllegalArgumentException in convertToUnicode

Milestone: v2.17
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2016-08-17
Created: 2016-07-28
Creator: Code Buddy
Private: No

Hi guys,

A bit of real-world HTML is causing a crash; here's the stack trace:

java.lang.IllegalArgumentException
    at java.lang.Character.toChars(Character.java:5172)
    at org.htmlcleaner.Utils.convertToUnicode(Utils.java:364)
    at org.htmlcleaner.Utils.escapeXml(Utils.java:156)
    at org.htmlcleaner.Utils.isEmptyString(Utils.java:480)
    at org.htmlcleaner.ContentNode.<init>(ContentNode.java:53)
    at org.htmlcleaner.HtmlTokenizer.addSavedAsContent(HtmlTokenizer.java:328)
    at org.htmlcleaner.HtmlTokenizer.content(HtmlTokenizer.java:815)
    at org.htmlcleaner.HtmlTokenizer.start(HtmlTokenizer.java:486)
    at org.htmlcleaner.HtmlCleaner.clean(HtmlCleaner.java:461)
    at org.htmlcleaner.HtmlCleaner.clean(HtmlCleaner.java:371)

And a test case that reproduces the issue:

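    import javax.xml.parsers.ParserConfigurationException;

    import org.htmlcleaner.CleanerProperties;
    import org.htmlcleaner.DomSerializer;
    import org.htmlcleaner.HtmlCleaner;
    import org.htmlcleaner.TagNode;

    // Assumes an enclosing JUnit TestCase subclass, which provides fail()
    public void testUnicodeIssue()
    {
        // 2013266066 is far above the highest Unicode code point (0x10FFFF)
        final String HTML = "<html>"
                + "<body>Brine&#2013266066;s."
                + "</body>"
                + "</html>";
        try
        {
            // On 2.16 the exception is thrown here, while the tokenizer builds the text node
            final TagNode tagNode = new HtmlCleaner().clean(HTML);
            final CleanerProperties cleanerProperties = new CleanerProperties();
            new DomSerializer(cleanerProperties).createDOM(tagNode);
        }
        catch (IllegalArgumentException e)
        {
            e.printStackTrace();
            fail();
        }
        catch (ParserConfigurationException e)
        {
            // Fail here too, rather than letting the test pass silently
            e.printStackTrace();
            fail();
        }
    }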
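The exception at the top of the trace comes from `Character.toChars`, which rejects any value that is not a valid Unicode code point, i.e. anything outside 0 to 0x10FFFF (`Character.MAX_CODE_POINT`). The reference `&#2013266066;` is far beyond that limit. The following standalone check uses only the JDK; the class name `CodePointDemo` and its helper are made up for illustration and are not part of HtmlCleaner.

```java
// Illustrative only: shows why the numeric reference in the report crashes
// Character.toChars(). Uses nothing but the JDK.
class CodePointDemo {

    // Returns true if Character.toChars() rejects the given value.
    static boolean toCharsThrows(int codePoint) {
        try {
            Character.toChars(codePoint);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int fromEntity = 2013266066; // value from &#2013266066; in the report

        System.out.println(Character.MAX_CODE_POINT);               // 1114111 (0x10FFFF)
        System.out.println(Character.isValidCodePoint(fromEntity)); // false
        System.out.println(toCharsThrows(fromEntity));              // true
        System.out.println(toCharsThrows(0x2019));                  // false (RIGHT SINGLE QUOTATION MARK)
    }
}
```

Any input that reaches `Character.toChars` without a range check therefore fails the same way, regardless of the surrounding markup.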

Discussion

  • Code Buddy

    Code Buddy - 2016-07-28

    Using 2.16 btw!

     
  • Scott Wilson

    Scott Wilson - 2016-08-17
    • status: open --> closed-fixed
    • Group: v 2.7 --> v2.17
     
  • Scott Wilson

    Scott Wilson - 2016-08-17

    Fixed: the Unicode parser now handles invalid code points more gracefully.
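The fix note doesn't say exactly what "more gracefully" means, so what follows is only a sketch of one common approach, not HtmlCleaner's actual code. It range-checks the value before calling `Character.toChars` and substitutes U+FFFD (REPLACEMENT CHARACTER) for anything unusable. This mirrors the HTML standard's handling of out-of-range and surrogate numeric references, but the sketch skips the standard's other special cases, such as remapping 0x80 to 0x9F. The class `SafeCharRef` and its `decode` method are hypothetical. The real fix may instead drop the reference or keep it as literal text.

```java
// Hypothetical helper, NOT HtmlCleaner's implementation: decodes the part of a
// numeric character reference between "&#" and ";" (e.g. "2013266066" or
// "x2019") and never throws.
final class SafeCharRef {

    static final String REPLACEMENT = "\uFFFD"; // U+FFFD REPLACEMENT CHARACTER

    static String decode(String ref) {
        boolean hex = ref.startsWith("x") || ref.startsWith("X");
        String digits = hex ? ref.substring(1) : ref;
        long value;
        try {
            // Parse as long so values above Integer.MAX_VALUE are range-checked
            // below; digit strings too long even for a long end up here.
            value = Long.parseLong(digits, hex ? 16 : 10);
        } catch (NumberFormatException e) {
            return REPLACEMENT;
        }
        boolean surrogate = value >= 0xD800 && value <= 0xDFFF;
        if (value <= 0 || value > Character.MAX_CODE_POINT || surrogate) {
            return REPLACEMENT; // would otherwise make Character.toChars throw or yield a lone surrogate
        }
        return new String(Character.toChars((int) value));
    }

    public static void main(String[] args) {
        System.out.println(decode("2013266066").equals(REPLACEMENT)); // true
        System.out.println(decode("65"));                             // A
        System.out.println(decode("x41"));                            // A
    }
}
```

Replacing the character rather than dropping it keeps a visible marker in the cleaned text where the bad reference was.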

     

