CDATA Content

Help
2013-03-01
2013-03-06
  • Maxi Sbrocca

    Maxi Sbrocca - 2013-03-01

    Hello all,

      We have a model created with PyXB. Now we need to add a CDATA section as part of the content of an element. Something like this:

    spa = span()
    spa.reset()
    testxml = '<![CDATA]>'
    spa.append(testxml)

    When we do spa.toxml() we get something like this:

    <?xml version="1.0" ?><wf:span>&lt;![CDATA]&gt;</wf:span>

    If we do this:

    from xml.sax.saxutils import unescape
    unescape(str(spa.toxml()))

    We get the correct result. But we can't unescape the entire result (a collection of objects) because that produces invalid XML.
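
    A minimal sketch of why unescaping a whole serialized document breaks it, using only the standard library. The document and the is_well_formed helper are hypothetical illustrations, not part of the model above: once the entity references are turned back into raw < and & characters, the parser reads them as markup.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import escape, unescape

# Hypothetical document: element content that contains markup characters,
# escaped the way a serializer would emit it.
escaped_doc = "<root><span>" + escape("a < b & c") + "</span></root>"


def is_well_formed(text):
    """Return True if `text` parses as XML, False otherwise."""
    try:
        parseString(text)
        return True
    except ExpatError:
        return False


# Unescaping the whole document puts raw '<' and '&' back into the content.
unescaped_doc = unescape(escaped_doc)

print(is_well_formed(escaped_doc))
print(is_well_formed(unescaped_doc))
```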

    So, any idea how to add a CDATA section as part of the content of a model object?

    Thanks in advance,
    Regards,
    Maxi

     
  • Peter A. Bigot

    Peter A. Bigot - 2013-03-01

    CDATA is at the XML level and has nothing to do with XML Schema: the content of an element is its content. The purpose of a CDATA section is to allow < and & to appear in element content without being misinterpreted as XML markup. The correct way to assign that content in PyXB is to just set testxml to 'This '. There is no reason to use a CDATA section here, and (to my knowledge) there is no way in XML Schema to specify that content must be enclosed in a CDATA section. If the value were '<this> is not <xml>' then it would need a CDATA section (or entity references) to represent it.
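
    To illustrate the point that a CDATA section only changes how content is written, not what the content is, here is a small sketch using the standard library's minidom rather than PyXB. The text_of helper is a hypothetical name introduced for this example.

```python
from xml.dom.minidom import parseString


def text_of(xml_text):
    """Parse `xml_text` and return the character data of the root element."""
    root = parseString(xml_text).documentElement
    # Text and CDATASection nodes both expose their content as `.data`.
    return "".join(node.data for node in root.childNodes)


# The same content, written once with entity references and once with CDATA.
with_entities = "<span>&lt;this&gt; is not &lt;xml&gt;</span>"
with_cdata = "<span><![CDATA[<this> is not <xml>]]></span>"

print(text_of(with_entities) == text_of(with_cdata))
```

    Both forms decode to the same string, which is why a schema-level binding sees no difference between them.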

    When parsing, PyXB will recognize CDATA sections in input and set the content to the corresponding decoded value (this comes automatically from the XML parser), but I don't believe it makes any attempt to detect that a value must be wrapped in a CDATA section when generating XML, unless this is done by the underlying DOM support infrastructure when creating a text node.

    If it is not done automatically, there are a couple of possible approaches, but the best would be to modify the BindingDOMSupport class to provide an appendCDATANode method, and use that instead of appendTextNode when the content appears to require escaping.  I'll look into that if you'll file a trac ticket for it.
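
    A rough sketch of that idea, written against the standard library's minidom and not against PyXB itself. The function name append_character_data is hypothetical and is not an existing BindingDOMSupport method; the rule for when content "appears to require escaping" (contains < or &) is likewise an assumption.

```python
from xml.dom.minidom import Document


def append_character_data(document, element, text):
    """Hypothetical helper: append `text` as CDATA if it contains '<' or '&',
    otherwise as an ordinary text node."""
    if "<" not in text and "&" not in text:
        element.appendChild(document.createTextNode(text))
        return
    # A CDATA section cannot contain "]]>", so split the text at each
    # occurrence and emit adjacent sections ("...]]" followed by ">...").
    parts = text.split("]]>")
    for index, part in enumerate(parts):
        chunk = part
        if index > 0:
            chunk = ">" + chunk
        if index < len(parts) - 1:
            chunk = chunk + "]]"
        element.appendChild(document.createCDATASection(chunk))


doc = Document()
span = doc.createElement("span")
append_character_data(doc, span, " > Texto < ")
print(span.toxml())
```

    Splitting on "]]>" keeps the output well-formed even when the content itself contains the CDATA terminator.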

     
  • Maxi Sbrocca

    Maxi Sbrocca - 2013-03-01

    Hi Peter, good morning.

    I know that "This" doesn't need CDATA. But, for example, I'm trying:

    from xml.dom.minidom import Document
    spa = span()
    spa.reset()
    doc = Document()
    lala = doc.createCDATASection(" > Texto < ")
    spa.append(lala)

    and:

    from xml.dom.minidom import Document
    spa = span()
    spa.reset()
    doc = Document()
    #lala = doc.createCDATASection(" > Texto < ")
    testxml = '> Texto <'
    spa.append(testxml)

    In both cases, str(spa.toxml()) gives:

    <?xml version="1.0" ?><wf:span>&gt; Texto &lt;</wf:span>

     
  • Maxi Sbrocca

    Maxi Sbrocca - 2013-03-01

    Guys…

    We did the following: in basis.py we replaced the logic at line 2299 with this:

                if self.value() is not None:
                    if self.value().startswith('<![CDATA'):
                        value = self.value().xsdLiteral().replace('<![CDATA]>','')
                        element.appendChild(dom_support.document().createCDATASection(value))
                    else:
                        element.appendChild(dom_support.document().createTextNode(self.value().xsdLiteral()))
                else:
                    element.appendChild(dom_support.document().createTextNode(""))

    and it seems to be working. Now the result is:

    <?xml version="1.0" ?><wf:span><![CDATA]></wf:span>

    I hope this helps to improve PyXB.

    Regards!

     
  • Peter A. Bigot

    Peter A. Bigot - 2013-03-01

    That'd work for your code, but I won't merge it since it's not correct in general.  What you should do in PyXB is:

    spa.append('> Texto <')
    

    This gives you:

    <wf:span>&gt; Texto &lt;</wf:span>
    

    which is correct.   That you would prefer it be expressed as:

    <wf:span><![CDATA[> Texto <]]></wf:span>
    

    which is also correct is not something PyXB can help with.  In terms of content the two are equivalent.

    I would consider an enhancement request to use CDATA sections instead of substituting entity references for character content, but not by examining the value of the content.

    Here's the unit test I used to verify this is working as intended:

    # -*- coding: utf-8 -*-
    import logging
    if __name__ == '__main__':
        logging.basicConfig()
    _log = logging.getLogger(__name__)
    import types
    import pyxb.binding.generate
    import pyxb.binding.datatypes as xs
    import pyxb.binding.basis
    import pyxb.utils.domutils
    import os.path
    xsd='''<?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="var" type="xs:string"/>
    </xs:schema>'''
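    # Uncomment to save the schema for inspection:
    #open('schema.xsd', 'w').write(xsd)
    # Generate the Python binding source from the schema text.
    code = pyxb.binding.generate.GeneratePython(schema_text=xsd)
    # Uncomment to save the generated bindings:
    #open('code.py', 'w').write(code)
    # Compile and run the bindings here, defining var and CreateFromDocument.
    rv = compile(code, 'test', 'exec')
    eval(rv)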
    from pyxb.exceptions_ import *
    import unittest
    class TestTrac_HACK (unittest.TestCase):
        def testParse(self):
            instance = CreateFromDocument("<var>text</var>")
            self.assertEqual(instance, 'text')
            instance = CreateFromDocument("<var><![CDATA[text]]></var>")
            self.assertEqual(instance, 'text')
            instance = CreateFromDocument("<var>&gt; text &lt;</var>")
            self.assertEqual(instance, '> text <')
            instance = CreateFromDocument("<var><![CDATA[> text <]]></var>")
            self.assertEqual(instance, '> text <')
        def testGenerate(self):
            instance = var('text')
            self.assertEqual(instance, 'text')
            instance = var('>text<')
            self.assertEqual(instance, '>text<')
            self.assertEqual(instance.toxml('utf-8', root_only=True), u'<var>&gt;text&lt;</var>')
            
    if __name__ == '__main__':
        unittest.main()
    
     
