Right now HC encodes attribute values inside the Document instance, which is not correct. The DOM should hold the raw, unencoded value; the encoding should be done by the serializer instead.
Details can also be found in https://jira.xwiki.org/browse/XCOMMONS-1551, which includes a test case showing that the HC behavior is not correct.
Pasting the test here for clarity:
@Test
public void parse() throws Exception
{
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    StringBuilder xmlStringBuilder = new StringBuilder();
    // The source XML must escape the ampersand, otherwise it is not well-formed.
    xmlStringBuilder.append("<?xml version = \"1.0\"?><img src=\"http://xwiki.org?a=&amp;b\"/>");
    ByteArrayInputStream input = new ByteArrayInputStream(xmlStringBuilder.toString().getBytes("UTF-8"));
    Document doc = builder.parse(input);
    Element root = doc.getDocumentElement();
    // In the DOM, the attribute value is stored decoded.
    assertEquals("http://xwiki.org?a=&b", root.getAttribute("src"));
    OutputFormat format = new OutputFormat(doc);
    StringWriter writer = new StringWriter();
    XMLSerializer serializer = new XMLSerializer(writer, format);
    serializer.serialize(doc);
    // The serializer is responsible for encoding the value again.
    assertEquals("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<img src=\"http://xwiki.org?a=&amp;b\"/>", writer.toString());
}
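For anyone who wants to check the expected behavior without the OutputFormat/XMLSerializer classes on the classpath, here is a rough JDK-only sketch of the same idea. It is not the test from the JIRA issue: the class name AttributeEncodingSketch and its two helper methods are made up for illustration, and it swaps in the identity Transformer from javax.xml.transform as the serializer.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper class, for illustration only.
public class AttributeEncodingSketch {

    // Parse an XML string into a DOM Document using the JDK parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Serialize a DOM Document with the identity Transformer (no XML declaration).
    static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<img src=\"http://xwiki.org?a=&amp;b\"/>");
        // The DOM holds the decoded value: http://xwiki.org?a=&b
        System.out.println(doc.getDocumentElement().getAttribute("src"));
        // The serializer re-encodes the ampersand when writing the attribute.
        System.out.println(serialize(doc));
    }
}
```

The point is the same as in the test above: the value returned by getAttribute is unencoded, and the escaping only shows up once the document is written out.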
@Scott: Do you agree about the issue? Thanks!
I'm testing this in the current code with the following test case. Does this capture it correctly?
If so, we have it passing in the 2.23 snapshot, and I'll close the issue.
Thanks Scott for fixing these bugs! :)
Thanks for finding them for me to fix! :D