Found in 2.14 and checked against 2.15. The following test case produces a NullPointerException:
public void testWithInvalidDocType()
{
    final String HTML = "<!DOCTYPE>";
    final TagNode tagNode = new HtmlCleaner().clean(HTML);
    final CleanerProperties cleanerProperties = new CleanerProperties();
    try
    {
        new DomSerializer(cleanerProperties).createDOM(tagNode);
    }
    catch (ParserConfigurationException e)
    {
        e.printStackTrace();
    }
}
The code in DomSerializer::createDOM() checks that the docType of the root node is not null:
if (rootNode.getDocType() != null){
But not its contents, so in the above case qualifiedName is now null at this point:
String qualifiedName = rootNode.getDocType().getPart1();
It is then passed to CoreDOMImplementationImpl::createDocumentType() and then to checkQName(), which does:
int index = qname.indexOf(':');
on the null qname.
Here's the stack trace:
java.lang.NullPointerException
at com.sun.org.apache.xerces.internal.dom.CoreDOMImplementationImpl.checkQName(CoreDOMImplementationImpl.java:176)
at com.sun.org.apache.xerces.internal.dom.CoreDOMImplementationImpl.createDocumentType(CoreDOMImplementationImpl.java:171)
at org.htmlcleaner.DomSerializer.createDOM(DomSerializer.java:100)
If you have any queries, I'm happy to provide more info - thanks!
Thanks for spotting that one CB!
Hmm, there are two ways of handling this.
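Option 1: fall back to a default name when the DOCTYPE has none:
if (qualifiedName == null) qualifiedName = "html";
Option 2: also check that the DOCTYPE is valid before using it:
rootNode.getDocType() != null && rootNode.getDocType().isValid()
Option 1 is the smallest change to fix the issue, but Option 2 feels better - we shouldn't be trying to use invalid DOCTYPEs in creating a DOM.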
I've applied the simple fix for now; in future I think it would be good to correct invalid doctypes wherever possible.
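For anyone wanting to see the Option 1 fallback in isolation, here is a rough sketch using only the JDK's DOM API. It is not the actual DomSerializer change: the class DoctypeFallbackSketch and its two helper methods are hypothetical names made up for illustration, and the part1 argument simply stands in for the value returned by getDocType().getPart1().

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.DocumentType;

// Hypothetical illustration only; not part of HtmlCleaner.
class DoctypeFallbackSketch {

    // Mirrors the Option 1 check: substitute "html" when the name is null.
    static String qualifiedNameOrDefault(String part1) {
        return part1 == null ? "html" : part1;
    }

    // Builds a DocumentType through the JDK's DOMImplementation, applying the
    // fallback first so a null name never reaches createDocumentType().
    static DocumentType createDocumentType(String part1, String publicId, String systemId) {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
            return impl.createDocumentType(qualifiedNameOrDefault(part1), publicId, systemId);
        } catch (ParserConfigurationException e) {
            // Wrapped so callers in this sketch need no checked-exception handling.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Simulates the "<!DOCTYPE>" case, where the doctype has no name at all.
        DocumentType docType = createDocumentType(null, null, null);
        System.out.println(docType.getName());
    }
}
```

Note that this only guards against a null name, as in the simple fix; other malformed names would still be rejected by the DOM implementation, which is part of why Option 2 is attractive.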