Re: [Tclxml-users] CDATA in text nodes

Derek Fountain wrote:
> I've been having a better look at ActiveTcl-8.4.0.1 and seem to have it 
> running the pure tcl parser by putting "package require tclparser" just after 
> my "package require xml". (How do I tell which parser it's using?)

::xml::parserclass info default

> The problem that appears now is that an area of text inside a CDATA section 
> might get serialised such that it doesn't have the CDATA wrapper on output - 
> the meta characters get escaped individually. Looking at the code, there 
> seems to be a "node:cdatasection" flag on the node which is used to preserve 
> these CDATA wrappers. However, I'm not sure this flag is being set properly at 
> parse time. It certainly doesn't seem to be working for me!
> 
> Unfortunately I need those CDATA wrappers to stay in place, but only where 
> they appear in the input (so setting ::dom::maxSpecials to 0 doesn't help). 
> They are being used to protect HTML areas in the text which a guy downstream 
> is having to jump through some serious XSL hoops to get rendering right. 
> Strictly speaking I think he's doing kludgey things, using "CDATA" as markers 
> for certain behaviours, but I can't change what he's doing.
> 
> Is there any way I can get that node:cdatasection flag to work correctly?

You and the downstream guy need to get a better understanding of
exactly what your XML documents contain.  Consider these two documents:

	<Example>&lt;&gt;</Example>

and

	<Example><![CDATA[<>]]></Example>

They are EXACTLY the same document.  CDATA Sections are simply a
syntactic convenience to save having to escape individual characters.
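
To make the equivalence concrete, here is a minimal sketch. It uses
Python's standard xml.etree.ElementTree rather than TclXML, purely as
an assumption for illustration; the variable names are made up. It
parses both spellings and compares the character content an
application receives.

```python
import xml.etree.ElementTree as ET

# The two spellings from the message above.
escaped = "<Example>&lt;&gt;</Example>"
cdata = "<Example><![CDATA[<>]]></Example>"

# Parse each document and take the character content of <Example>.
text_from_escaped = ET.fromstring(escaped).text
text_from_cdata = ET.fromstring(cdata).text

# Both spellings carry the same two characters, "<" and ">".
same_content = text_from_escaped == text_from_cdata
print(repr(text_from_escaped), repr(text_from_cdata), same_content)
```

Any application that looks only at the parsed content cannot tell
which spelling was used in the source file.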

TclXML and TclDOM operate at a slightly higher level than the raw
XML syntax: the XML Infoset.  TclDOM manages the raw syntax on your
behalf.  In the case of text nodes, if the number of special
characters exceeds some threshold then it will automatically use
a CDATA Section.  You may like to experiment with the threshold
value (the ::dom::maxSpecials variable you mention above).
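
The threshold idea can be sketched roughly as follows. This is a
hypothetical Python illustration of the general technique, not
TclDOM's actual code: the function name serialize_text, the parameter
max_specials and the choice of which characters count as special are
all assumptions.

```python
from xml.sax.saxutils import escape

# Characters that must be escaped in ordinary character data (assumed set).
SPECIALS = "<>&"

def serialize_text(text, max_specials):
    """Hypothetical helper: return the XML source form of a text node."""
    count = sum(text.count(ch) for ch in SPECIALS)
    # "]]>" cannot appear inside a CDATA Section, so escape in that case.
    if count > max_specials and "]]>" not in text:
        return "<![CDATA[" + text + "]]>"
    # Otherwise escape each special character individually.
    return escape(text)

print(serialize_text("a < b", 2))
print(serialize_text("<b>bold</b> & more", 2))
```

Note that a serializer of this kind decides from the content alone.
It has no memory of how the text was spelled in the input.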

The other thing to note is that if the downstream guy is using
XSLT then I don't understand why the use of CDATA sections (or not)
is an issue at all.  XSLT never even sees the CDATA sections - they
get transparently turned into text nodes before template processing
begins.  This is one of the major differences between the XPath and
DOM data models.
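
The difference between the two data models can be seen in a small
sketch. It again uses Python's standard library as a stand-in:
xml.dom.minidom for a DOM-style tree, and xml.etree.ElementTree as a
rough analogue of a model that exposes only text, as XPath does.
Neither is TclDOM or an XSLT processor, so treat this as an
illustration only.

```python
import xml.etree.ElementTree as ET
from xml.dom import Node, minidom

source = "<Example><![CDATA[<>]]></Example>"

# DOM-style tree: the child is reported as a distinct CDATA Section node.
dom_child = minidom.parseString(source).documentElement.firstChild
dom_sees_cdata = dom_child.nodeType == Node.CDATA_SECTION_NODE

# Text-only model: just the characters, with no record of the wrapper.
tree_text = ET.fromstring(source).text

print(dom_sees_cdata, repr(dom_child.data), repr(tree_text))
```

A stylesheet working on the second kind of tree has nothing to test
for: the CDATA wrapper is already gone by the time templates run.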

Summary: relying on the use of CDATA Sections in an XML document
is bad application design.

HTH,
Steve Ball

-- 
Steve Ball            |   XSLT Standard Library   | Training & Seminars
Zveno Pty Ltd         |     Web Tcl Complete      |   XML XSL Schemas
http://www.zveno.com/ |      TclXML TclDOM        | Tcl, Web Development
Ste...@zv...  +---------------------------+---------------------
Ph. +61 2 6242 4099   |   Mobile (0413) 594 462   | Fax +61 2 6242 4099