Re: [xml-cppdom-devel] [PATCH]: Preserve newlines in cdata


Carsten Neumann wrote:
>     Hello,
> 
> attached patch modifies the tokenizer to preserve newlines in cdata. The
> way I understand the XML spec that is what is intended there. The tests
> still pass, but cdata may now contain some trailing whitespace - not
> sure if that is a problem though [1].
> FWIW my use case was that we had some shader code stored in an xml file
> that I wanted to read with cppdom, but since the newlines got removed
> from things like:
> 
> <fragment_program>
> #ifdef HAS_NORMAL_MAP
> // read normal from normal map texture
> #endif
> </fragment_program>
> 
> the preprocessor conditionals stopped working correctly. An alternative
> would be to support the "<![CDATA["  "]]>" construct, but that seemed to
> require a better understanding of the interaction between tokenizer and
> parser.
> 
>     Cheers,
>         Carsten
> 
> [1] The whitespace comes from the indentation of </some_tag>:
> 
> <root>
>     <some_tag>
>     cdata here
>     </some_tag>
> </root>
> 
> i.e. the cdata string is: "cdata here\n    ".

I committed your change, but this brings up a point in CppDOM that confuses
me. I think that the CppDOM "cdata" concept is really the DOM text node. I
am not sure if XML CDATA is even supported by CppDOM, but I suppose that I
could write a test to determine that.

The difference is between the following:

<fragment_program>
#ifdef HAS_NORMAL_MAP
// read normal from normal map texture
#endif
</fragment_program>

versus:

<fragment_program>
<![CDATA[#ifdef HAS_NORMAL_MAP
// read normal from normal map texture
#endif
]]>
</fragment_program>

Perhaps there does not need to be a distinction as far as user-level code is
concerned. What makes me wonder is that actual DOM implementations do
distinguish between the two in their APIs. JDOM, however, has
Element.getText(), which returns what they describe as "the concatenation of
all Text and CDATA nodes returned by getContent()." Note that JDOM has the
class type org.jdom.CDATA which is a subclass of org.jdom.Text. CppDOM
identifies nodes as being of the "cdata" type, but I think that, in CppDOM
terms, this only means that the node holds character data, not that it came
from a <![CDATA[ ]]> section.
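
To make the JDOM comparison concrete, here is a rough C++ sketch of a
getText()-style accessor that concatenates text and CDATA children while the
node kind stays available to callers who care. CharKind, CharNode, getText,
and countKind are hypothetical names for illustration only; none of them are
CppDOM or JDOM API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node model for illustration; not CppDOM or JDOM types.
enum class CharKind { Text, CData };

struct CharNode
{
    CharKind kind;      // how the characters were written in the source XML
    std::string value;  // the characters themselves, newlines included
};

// Concatenates all Text and CDATA children in document order, in the spirit
// of what the JDOM documentation describes for Element.getText().
std::string getText(const std::vector<CharNode>& children)
{
    std::string out;
    for (const CharNode& child : children)
    {
        out += child.value;
    }
    return out;
}

// Counts children of one kind, for code that does need the distinction.
std::size_t countKind(const std::vector<CharNode>& children, CharKind kind)
{
    std::size_t n = 0;
    for (const CharNode& child : children)
    {
        if (child.kind == kind)
        {
            ++n;
        }
    }
    return n;
}
```

With something like this, user-level code that only wants the shader source
calls getText() and never sees whether the input used a CDATA section.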

I guess as long as CppDOM can properly parse both a CDATA node and a text
node in the input XML, my concerns don't matter. It may really just boil
down to a terminology issue.
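
For the test I mentioned, the check could look roughly like the sketch below.
It is a standalone helper, not the CppDOM tokenizer: extractCharacterData is a
hypothetical function that takes the raw inner content of an element and
returns its character data, copying CDATA sections verbatim with the markers
removed. It assumes no nested elements and does not expand entity references.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of CppDOM: collects the character data of an
// element's inner content. Plain text is copied as-is (newlines preserved);
// CDATA sections are copied verbatim without their delimiters.
std::string extractCharacterData(const std::string& content)
{
    const std::string open = "<![CDATA[";
    const std::string close = "]]>";
    std::string result;
    std::string::size_type pos = 0;

    while (pos < content.size())
    {
        const std::string::size_type start = content.find(open, pos);
        if (start == std::string::npos)
        {
            // No further CDATA section: the rest is plain text.
            result.append(content, pos, std::string::npos);
            break;
        }

        // Plain text in front of the CDATA section.
        result.append(content, pos, start - pos);

        const std::string::size_type body = start + open.size();
        const std::string::size_type end = content.find(close, body);
        if (end == std::string::npos)
        {
            // Unterminated section: keep the remainder as literal text.
            result.append(content, body, std::string::npos);
            break;
        }

        result.append(content, body, end - body);
        pos = end + close.size();
    }
    return result;
}
```

Feeding it the two <fragment_program> bodies above should give the same
preprocessor lines either way. The CDATA form only picks up one extra newline
from the line break after "]]>".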

 -Patrick

-- 
Patrick L. Hartling
Senior Software Engineer, Priority 5
http://www.priority5.com/