
#158 NullPointerException in HtmlCleaner.saveToLastOpenTag

Milestone: v2.19
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2017-02-06
Created: 2015-11-23
Creator: Code Buddy
Private: No

Here's a test case that exposes the issue:

    @Test
    public void testCrash()
    {
        final String HTML = "<html xmlns=\"foo\">"
                + "<head>"
                + "</head>"
                + "<body>"
                + "<table>"
                + "<tr>"
                + "<td>"
                + "<BR />"
                + "</td>"
                + "</tr>"
                + "</table>"
                + "<div>"
                + "</div>"
                + "</body>"
                + "</html>";

        new HtmlCleaner().clean(HTML);
    }

Found in 2.14 and also present in 2.15. On the face of it, this looks a lot like bug 133, but it is subtly different, as that test case still passes.

Playing around with the HTML, it seems you need a few things to trigger this: the namespace, the uppercase br tag and the div.

Shout if you need any more info!

Discussion

  • Scott Wilson

    Scott Wilson - 2015-11-23

    Good catch CB! I'll see if I can track down the cause of it.

     
  • Code Buddy

    Code Buddy - 2015-11-23

    That's great, thanks Scott!

     
  • Scott Wilson

    Scott Wilson - 2015-12-02
    • Group: v 2.7 --> v2.16
     
  • Scott Wilson

    Scott Wilson - 2015-12-02

    Right, well first off, as we're declaring quite clearly "This isn't HTML, it's FOO", a lot of the usual rules on tags get suspended, so everything in the doc is a plain old XML tag. Something weird then happens with the self-closing BR tag not actually closing: without the self-closing tag we're all good still.

    So what I think happens is that we get to the self-closing BR tag. We're using HTML5 rules, so there is no such thing as a self-closing tag, and we treat it as an open tag. Next we process the DIV tag. We check the last open tag, BR, and find it doesn't accept anything, so we look for the last valid open HTML tag. That doesn't exist: there are no HTML tags inside the BODY, so there are none that allow child elements.

    I can easily add a null check to the saveToLastOpenTag() method, which handles the NPE, but then the DIV vanishes as it can't be placed elsewhere.

    What we would expect is for the self-closing tag to be handled appropriately.

     
  • Code Buddy

    Code Buddy - 2016-03-31

    Just updated to 2.16 and this seems to be fixed, certainly for my test cases. I got a mention in the release notes, so I assume this one just needs to be marked as closed :-)

     
  • Scott Wilson

    Scott Wilson - 2016-08-17
    • Group: v2.16 --> v2.17
     
  • Scott Wilson

    Scott Wilson - 2016-10-19
    • Group: v2.17 --> v2.18
     
  • Scott Wilson

    Scott Wilson - 2017-02-06
    • Group: v2.18 --> v2.19
     
  • Scott Wilson

    Scott Wilson - 2017-02-06
    • status: open --> closed-fixed
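
The failure mode described in the 2015-12-02 comment can be sketched with a toy model. This is not HtmlCleaner's actual code: the class `OpenTagSketch`, the `OpenTag` type, its `eligibleParent` flag, and the contents of the stack are all hypothetical, chosen only to illustrate how a backwards search over open tags can come up empty, why an unguarded dereference then throws, and why a bare null check leaves the DIV with nowhere to go.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: these types are hypothetical and are NOT HtmlCleaner's real classes.
class OpenTagSketch {

    /** A tag that is currently open, plus whether it may serve as a parent. */
    static final class OpenTag {
        final String name;
        final boolean eligibleParent; // assumption: stands in for "valid open HTML tag"

        OpenTag(String name, boolean eligibleParent) {
            this.name = name;
            this.eligibleParent = eligibleParent;
        }
    }

    /** Returns the innermost open tag that can take children, or null if there is none. */
    static OpenTag findLastEligibleParent(Deque<OpenTag> openTags) {
        // push() adds at the head, so iteration visits the most recently opened tag first.
        for (OpenTag tag : openTags) {
            if (tag.eligibleParent) {
                return tag;
            }
        }
        return null;
    }

    /** Unguarded lookup: throws NullPointerException when no eligible parent exists. */
    static String unguardedParentName(Deque<OpenTag> openTags) {
        return findLastEligibleParent(openTags).name;
    }

    /** Guarded lookup: returns null instead of throwing, so the caller must drop the child. */
    static String guardedParentName(Deque<OpenTag> openTags) {
        OpenTag parent = findLastEligibleParent(openTags);
        return parent == null ? null : parent.name;
    }

    /** Builds the open-tag stack assumed for the moment the DIV is processed. */
    static Deque<OpenTag> stackFromTicket() {
        Deque<OpenTag> openTags = new ArrayDeque<>();
        openTags.push(new OpenTag("html", false)); // plain XML tag under xmlns="foo"
        openTags.push(new OpenTag("body", false));
        openTags.push(new OpenTag("BR", false));   // left open, accepts nothing
        return openTags;
    }

    public static void main(String[] args) {
        Deque<OpenTag> openTags = stackFromTicket();
        System.out.println("guarded parent: " + guardedParentName(openTags));
        try {
            unguardedParentName(openTags);
        } catch (NullPointerException e) {
            System.out.println("unguarded lookup threw NullPointerException");
        }
    }
}
```

In this model the guarded lookup avoids the exception but still finds no parent, which mirrors the observation that a null check alone makes the DIV vanish. A fuller fix would have to treat the self-closing tag as closed so that it never becomes the last open tag.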
     
