[xmljs-users] Whitespace stripped before/after entities.

The W3C DOM parser incorrectly strips whitespace before and after entities.  For
example, the element
    <TAG>Say Hello &lt;b&gt;World&lt;/b&gt; and Sky</TAG>
should be parsed as
    "Say Hello <b>World</b> and Sky"
but is incorrectly parsed as
    "Say Hello<b>World</b>and Sky"

A workaround (for my application) is to use
   parser.preserveWhiteSpace = true;
and then trim the text nodes myself.
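
In case it helps anyone else, here is a rough sketch of the "trim it myself"
step. It assumes the nodes expose the standard W3C DOM properties nodeType,
firstChild, nextSibling and data (I have not checked every one of these against
the xmljs sources), and the helper name trimmedText is made up for this example.

```javascript
// Hypothetical helper: join the text children of an element first,
// then trim once, so the spaces next to entities survive.
// Assumes W3C-style nodeType / firstChild / nextSibling / data properties.
function trimmedText(element) {
    var TEXT_NODE = 3; // W3C DOM nodeType value for text nodes
    var text = "";
    for (var n = element.firstChild; n; n = n.nextSibling) {
        if (n.nodeType === TEXT_NODE) {
            text += n.data;
        }
    }
    // trim only the outer edges of the joined string
    return text.replace(/^\s+|\s+$/g, "");
}
```

Only the leading and trailing whitespace of the whole element content is
removed; whitespace between the individual text pieces is left alone.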

The problem originates in DOMImplementation__parseLoop(), where you trim
pContent in XMLP._TEXT nodes.  The trimming should really only happen after the
text nodes have been normalized (and thus contain the entity text as well).
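
The difference in ordering can be shown with plain strings, independent of the
parser. The way the content is split into pieces below is my assumption about
what the parser hands over (text and expanded entities as separate chunks); the
function names are only for illustration.

```javascript
// Assumed split of the TAG content into pieces, entities already expanded.
var pieces = ["Say Hello ", "<", "b", ">", "World", "<", "/b", ">", " and Sky"];

function trim(s) {
    return s.replace(/^\s+|\s+$/g, "");
}

// Trimming each piece before joining loses the inner spaces.
function trimThenJoin(parts) {
    var out = "";
    for (var i = 0; i < parts.length; i++) {
        out += trim(parts[i]);
    }
    return out;
}

// Joining first and trimming once keeps them.
function joinThenTrim(parts) {
    return trim(parts.join(""));
}
```

With these pieces, trimThenJoin(pieces) gives the incorrect string from the
report above and joinThenTrim(pieces) gives the expected one.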

The following demonstrates the problem:

function xmljsDOMExample() {
    var xml;
    xml = ""
        + "<?xml version=\"1.0\"?>"
        + "<ROOT>"
        + "<TAG1>"
        + "Say Hello &lt;b&gt;World&lt;/b&gt; and Sky"
        + "</TAG1>"
        + "</ROOT>";

    //instantiate the W3C DOM Parser
    var parser = new DOMImplementation();

    //this is the workaround mentioned above;
    //remove this line to see the whitespace being stripped
    parser.preserveWhiteSpace = true;

    //load the XML into the parser and get the DOMDocument
    var domDoc = parser.loadXML(xml);

    //get the root node
    var docRoot = domDoc.getDocumentElement();

    //get the "TAG1" element
    var tag1 = docRoot.getElementsByTagName("TAG1").item(0);

    //the following should be
    //"Say Hello <b>World</b> and Sky"
    alert(tag1.firstChild.data);
}// end function xmljsDOMExample

    - Stepan

-- 
Stepan Riha