
#153 NullPointerException when <!DOCTYPE> doesn't contain a qualifiedName

Group: v2.16
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2015-10-23
Created: 2015-10-01
Creator: Code Buddy
Private: No

Found in 2.14 and checked against 2.15. The following test case produces a NullPointerException:

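import javax.xml.parsers.ParserConfigurationException;

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.DomSerializer;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

// Reproduces the NPE: the DOCTYPE declaration carries no name.
public void testWithInvalidDocType()
{
    final String HTML = "<!DOCTYPE>";
    final TagNode tagNode = new HtmlCleaner().clean(HTML);
    final CleanerProperties cleanerProperties = new CleanerProperties();
    try
    {
        // Throws NullPointerException before any DOM is returned.
        new DomSerializer(cleanerProperties).createDOM(tagNode);
    }
    catch (ParserConfigurationException e)
    {
        e.printStackTrace();
    }
}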

The code in DomSerializer::createDOM() checks that the docType of the root node is not null:

if (rootNode.getDocType() != null){

But it does not check the docType's contents, so in the above case qualifiedName is null at this point:

String qualifiedName = rootNode.getDocType().getPart1();

The null value is then passed to CoreDOMImplementationImpl::createDocumentType() and on to checkQName(), which does:

    int index = qname.indexOf(':');

on the null qname.

Here's the stack trace:

java.lang.NullPointerException
at com.sun.org.apache.xerces.internal.dom.CoreDOMImplementationImpl.checkQName(CoreDOMImplementationImpl.java:176)
at com.sun.org.apache.xerces.internal.dom.CoreDOMImplementationImpl.createDocumentType(CoreDOMImplementationImpl.java:171)
at org.htmlcleaner.DomSerializer.createDOM(DomSerializer.java:100)

If you have any queries, I'm happy to provide more info. Thanks!

Discussion

  • Scott Wilson - 2015-10-01

    Thanks for spotting that one CB!

     
  • Scott Wilson - 2015-10-01

    Hmm, there are two ways of handling this.

    1. Just default the QName to "html" where it is null:

    if (qualifiedName == null) qualifiedName = "html";

    2. Only create the DocumentType where the DOCTYPE is valid:

    rootNode.getDocType() != null && rootNode.getDocType().isValid()

    Option 1 is the smallest change to fix the issue, but Option 2 feels better - we shouldn't be trying to use invalid DOCTYPEs in creating a DOM.
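
The two options can be sketched with plain JDK DOM calls. This is only a minimal illustration under stated assumptions, not the patch that was applied to HtmlCleaner: the class `DoctypeGuardSketch` and its helper methods are hypothetical, and the DOCTYPE name is passed in as a plain string standing in for `rootNode.getDocType().getPart1()`.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

// Hypothetical helper class, not part of HtmlCleaner.
public class DoctypeGuardSketch {

    // Option 1: fall back to "html" when the DOCTYPE has no name.
    public static String qualifiedNameOrDefault(String part1) {
        return (part1 == null || part1.trim().isEmpty()) ? "html" : part1;
    }

    // Option 2 (assumed validity rule): a DOCTYPE without a name is unusable,
    // so the caller would skip creating a DocumentType altogether.
    public static boolean hasUsableName(String part1) {
        return part1 != null && !part1.trim().isEmpty();
    }

    // Builds a DOM document, applying option 1 to the DOCTYPE name.
    public static Document createDocument(String part1, String publicId, String systemId) {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
            String qualifiedName = qualifiedNameOrDefault(part1);
            DocumentType docType =
                    impl.createDocumentType(qualifiedName, publicId, systemId);
            return impl.createDocument(null, qualifiedName, docType);
        } catch (ParserConfigurationException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A null name mirrors the "<!DOCTYPE>" input from the report.
        Document doc = createDocument(null, null, null);
        System.out.println(doc.getDoctype().getName());
    }
}
```

With option 1 the serializer always receives a non-null name, at the cost of inventing a DOCTYPE the input never declared. Option 2 would use `hasUsableName` as the guard and produce a document with no DocumentType.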

     
  • Scott Wilson - 2015-10-23
    • status: open --> closed-fixed
    • Group: v 2.7 --> v2.16
     
  • Scott Wilson - 2015-10-23

    I've applied the simple fix for now; in future I think it would be good to correct invalid doctypes wherever possible.

     
