#212 Returned DOM Document instance should not contain escaped characters for attribute values

Group: v2.23
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2019-09-05
Created: 2019-01-21
Private: No

Right now HtmlCleaner (HC) escapes attribute values inside the returned Document instance, which is not correct: a DOM should hold the unescaped value, and escaping (for example & to &amp;) should be done by the serializer instead.

Details can also be found in https://jira.xwiki.org/browse/XCOMMONS-1551 which shows a test case proving that the HC behavior is not correct.

Pasting the test here for clarity:

@Test
public void parse() throws Exception
{
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();

    // In the source markup the ampersand is escaped as &amp;.
    StringBuilder xmlStringBuilder = new StringBuilder();
    xmlStringBuilder.append("<?xml version = \"1.0\"?><img src=\"http://xwiki.org?a=&amp;b\"/>");
    ByteArrayInputStream input =  new ByteArrayInputStream(xmlStringBuilder.toString().getBytes("UTF-8"));
    Document doc = builder.parse(input);
    Element root = doc.getDocumentElement();
    // Inside the DOM the attribute value must be unescaped.
    assertEquals("http://xwiki.org?a=&b", root.getAttribute("src"));

    // The serializer is responsible for escaping the value again on output.
    OutputFormat format = new OutputFormat(doc);
    StringWriter writer = new StringWriter();
    XMLSerializer serializer = new XMLSerializer(writer, format);
    serializer.serialize(doc);
    assertEquals("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<img src=\"http://xwiki.org?a=&amp;b\"/>", writer.toString());
}
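
For illustration only, here is a minimal sketch of the same expectation using nothing but JDK classes (no HtmlCleaner, and the JDK identity Transformer in place of the Xerces XMLSerializer used above). The class name AttributeEscapingSketch and the helpers parse and serialize are made up for this example; it only shows how a standard DOM parser and serializer split the work, not how HC implements it.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper class, JDK only.
public class AttributeEscapingSketch {

    // Parse an XML string into a DOM Document.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Serialize a DOM Document back to text with the JDK identity transformer.
    public static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<img src=\"http://xwiki.org?a=&amp;b\"/>");

        // The DOM holds the unescaped value: http://xwiki.org?a=&b
        System.out.println(doc.getDocumentElement().getAttribute("src"));

        // The serializer escapes the ampersand again when writing the attribute.
        System.out.println(serialize(doc));
    }
}
```

If HC's Document behaved like the one produced by the JDK parser, a caller could read attribute values directly and would never need to unescape them by hand.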

@Scott: Do you agree that this is an issue? Thanks!

Discussion

  • Scott Wilson - 2019-09-04

    I'm testing this in the current code with the following test case. Does this capture it correctly?

    If so, we have it passing in the 2.23 snapshot, and I'll close the issue.

    @Test
    public void parse() throws Exception
    {
        String html = "<?xml version = \"1.0\"?><img src=\"http://xwiki.org?a=&amp;b\"/>";
        final CleanerProperties cleanerProperties = new CleanerProperties();
        final TagNode tagNode = new HtmlCleaner().clean(html);
        final Document doc = new DomSerializer(cleanerProperties, true).createDOM(tagNode);
        assertEquals("http://xwiki.org?a=&amp;b", 
                doc.getElementsByTagName("img").item(0).getAttributes().getNamedItem("src").getTextContent());
        cleanerProperties.setOmitHtmlEnvelope(true);
        String out = new SimpleXmlSerializer(cleanerProperties).getAsString(html);
        assertEquals("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<img src=\"http://xwiki.org?a=&amp;b\" />", 
                out);
    }
    
     
  • Scott Wilson - 2019-09-04
    • Group: v2.22 --> v2.23
     
  • Scott Wilson - 2019-09-04
    • status: open --> open-fixed
     
  • Vincent Massol - 2019-09-05

    Thanks Scott for fixing these bugs! :)

     
    • Scott Wilson - 2019-09-05

      Thanks for finding them for me to fix! :D

       
  • Scott Wilson - 2019-09-05
    • status: open-fixed --> closed-fixed
     
