[Xmlunit-general] HTMLDocumentBuilder and &nbsp; bug?

Hi,
I'm using XMLUnit primarily for HTMLDocumentBuilder and 
TolerantSaxDocumentBuilder (nice tools btw!). If I build a Document 
from HTML that contains &nbsp;, the String contents of the Node in 
question have unexpected bytes where the space should be. I ran into 
this while trying to split the resulting string on whitespace.

For example, with <body>test&nbsp;after</body>, the body text string I 
get has the following UTF-8 bytes:

bytes: 116 101 115 116 -62 -96 97 102 116 101 114

I was expecting to find 32 where the -62 and -96 are. Is this a bug?

I'm using the latest version with Java 1.6.0.16.

Thanks,
Tony Rozga

Here is a test (not JUnit though :):

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.custommonkey.xmlunit.HTMLDocumentBuilder;
import org.custommonkey.xmlunit.TolerantSaxDocumentBuilder;
import org.custommonkey.xmlunit.XMLUnit;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XmlUnitBug {

    public static void main(String[] args) {

        try {

            String html = "test&nbsp;after";
            TolerantSaxDocumentBuilder tolerantSaxDocumentBuilder =
                    new TolerantSaxDocumentBuilder(XMLUnit.newTestParser());
            HTMLDocumentBuilder builder =
                    new HTMLDocumentBuilder(tolerantSaxDocumentBuilder);
            Document doc = builder.parse(html);

            XPathFactory factory = XPathFactory.newInstance();
            XPath xpath = factory.newXPath();
            XPathExpression expr = xpath.compile("/html/body");
            String body = ((NodeList) expr.evaluate(doc, XPathConstants.NODESET))
                    .item(0).getTextContent();
            System.out.println("body: " + body);
            System.out.print("bytes: ");
            byte[] bytes = body.getBytes("UTF-8");
            for (byte b : bytes) {
                System.out.print(b);
                System.out.print(" ");
            }
            System.out.println("");
        } catch (Exception ex) {
            System.out.println("whoops: " + ex);
        }
    }
}
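
A note on the byte values reported above: -62 and -96 are the signed forms of 0xC2 0xA0, which is the UTF-8 encoding of U+00A0 (NO-BREAK SPACE), the character that the &nbsp; entity stands for. So the output looks consistent with the entity's definition rather than corrupted; a plain space (32) would be a different character. The sketch below does not use XMLUnit at all. It assumes the body text is "test", U+00A0, "after", and the names NbspDemo, utf8 and splitOnWhitespace are made up for illustration. It shows that Java's default \s does not match U+00A0, and one possible way to split on it anyway.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class NbspDemo {

    // Hypothetical helper: encode a string as UTF-8 bytes.
    static byte[] utf8(String s) {
        return s.getBytes(Charset.forName("UTF-8"));
    }

    // Hypothetical helper: split on ASCII whitespace and on U+00A0.
    // Java's \s covers only space, tab, newline, vertical tab, form feed
    // and carriage return by default, so U+00A0 is listed explicitly.
    static String[] splitOnWhitespace(String text) {
        return text.split("[\\s\\u00A0]+");
    }

    public static void main(String[] args) {
        // Assumed body text: "test", a no-break space, "after".
        String body = "test\u00A0after";

        // prints [116, 101, 115, 116, -62, -96, 97, 102, 116, 101, 114]
        System.out.println(Arrays.toString(utf8(body)));

        // Default \s does not split here; prints 1
        System.out.println(body.split("\\s+").length);

        // prints [test, after]
        System.out.println(Arrays.toString(splitOnWhitespace(body)));
    }
}
```

An alternative, if the no-break spaces carry no meaning for the test, would be to call body.replace('\u00A0', ' ') before splitting.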
