From: Patrick H. <pat...@gm...> - 2010-04-14 13:37:59
Carsten Neumann wrote:
> Hello,
>
> attached patch modifies the tokenizer to preserve newlines in cdata. The
> way I understand the xml spec that is what is intended there. The tests
> still pass, but cdata may now contain some trailing whitespace - not
> sure if that is a problem though [1].
> FWIW my use case was that we had some shader code stored in an xml file
> that I wanted to read with cppdom, but since the newlines got removed
> from things like:
>
> <fragment_program>
> #ifdef HAS_NORMAL_MAP
> // read normal from normal map texture
> #endif
> </fragment_program>
>
> the preprocessor conditionals stopped working correctly. An alternative
> would be to support the "<![CDATA[" "]]>" construct, but that seemed to
> require a better understanding of the interaction between tokenizer and
> parser.
>
> Cheers,
> Carsten
>
> [1] The whitespace comes from the indentation of </some_tag>:
>
> <root>
>  <some_tag>
>   cdata here
>  </some_tag>
> </root>
>
> i.e. the cdata string is: "cdata_here\n ".

I committed your change, but this brings up a point in CppDOM that confuses
me. I think that the CppDOM "cdata" concept is really the DOM text node. I
am not sure if XML CDATA is even supported by CppDOM, but I suppose that I
could write a test to determine that. The difference is between the
following:

<fragment_program>
#ifdef HAS_NORMAL_MAP
// read normal from normal map texture
#endif
</fragment_program>

versus:

<fragment_program>
<![CDATA[#ifdef HAS_NORMAL_MAP
// read normal from normal map texture
#endif
]]>
</fragment_program>

Perhaps there does not need to be a distinction as far as user-level code is
concerned. What makes me wonder is that actual DOM implementations do
distinguish between the two in their APIs. JDOM, however, has
Element.getText(), which returns what they describe as "the concatenation of
all Text and CDATA nodes returned by getContent()." Note that JDOM has the
class type org.jdom.CDATA, which is a subclass of org.jdom.Text.
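
To make the "no distinction at the user level" idea concrete, here is a
minimal C++ sketch of a getText()-style concatenation. It is not CppDOM code
and does not use the CppDOM API; concatTextAndCdata is a hypothetical helper
name, and the sketch assumes the element content has no child elements,
comments, or entity references. It only shows that plain text and CDATA
sections can be flattened into one string while keeping newlines.

```cpp
#include <string>

// Hypothetical helper (not part of CppDOM): given the raw character content
// found between an element's start and end tags, return the plain text
// concatenated with the bodies of any <![CDATA[ ... ]]> sections.
// Newlines are preserved exactly as they appear in the input.
// Assumption: no child elements, comments, or entity references occur.
std::string concatTextAndCdata(const std::string& content)
{
    const std::string open = "<![CDATA[";
    const std::string close = "]]>";

    std::string result;
    std::string::size_type pos = 0;

    while (pos < content.size()) {
        const std::string::size_type start = content.find(open, pos);
        if (start == std::string::npos) {
            // No further CDATA section: the rest is ordinary text.
            result.append(content, pos, std::string::npos);
            break;
        }

        // Ordinary text that precedes the CDATA section.
        result.append(content, pos, start - pos);

        const std::string::size_type bodyStart = start + open.size();
        const std::string::size_type end = content.find(close, bodyStart);
        if (end == std::string::npos) {
            // Unterminated section: keep the remainder verbatim.
            result.append(content, start, std::string::npos);
            break;
        }

        // Body of the CDATA section, copied without any interpretation.
        result.append(content, bodyStart, end - bodyStart);
        pos = end + close.size();
    }

    return result;
}
```

Under those assumptions, both forms of the fragment_program example above
yield the same three shader lines, each still on its own line; the two
results differ only in the whitespace around them.
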

CppDOM identifies nodes as being of the "cdata" type, but I think that, in
CppDOM terms, this means that the node contains a sequence of characters. I
guess as long as CppDOM can properly parse both a CDATA node and a text node
in the input XML, my concerns don't matter. It may really just boil down to
a terminology issue.

 -Patrick

--
Patrick L. Hartling
Senior Software Engineer, Priority 5
http://www.priority5.com/