On 12/17/11 4:36 AM, Guenter Milde wrote:
Dear Paul,

On 2011-12-16, Paul Tremblay wrote:

I like your suggestion because it kills two birds with one
stone--allowing for raw XML (my next suggested patch) and mathML,
without the waste of conversion to a node and then to a string.
...

I assumed from subsequent emails that docutils did not want this feature. I
would have to go back and see show far I got, but if I remember correctly,
this idea was very easy to implement and I had more or less completed it.
There are two parts that need separate consideration:

I would like to see "raw XML" properly implemented/supported in Docutils and
I am ready to help to achieve this.

MathML input for math is more tricky: to get it into the core, all output
formats should be supported, that means we need a MathML->LaTeX converter.
Therefore I suggested to start this feature as a sandbox project.

Sorry for the confusion.

Hi Guenter,

Here is some code that converts a docutils string to raw xml. See the code at the very bottom for what the string looks like. I generated the string with rst2xml.py.

I'm a bit rusty on how to implement things is docutils, but if I am remembering correctly, there should be much additional code needed. We just have to convert the docutils dom to a string (which is done already) and feed the string to convertRaw.

Issues:

1. I haven't checked this code with different encodings. I believe the docutils string is always utf8 just before it prints out, so that should not be a problem.

2. The code outputs to std.out. It probably would be better to have the conversion saved as a string and then returned.

Paul

===========
#!/usr/bin/python

import os, sys, argparse, io
import xml.sax.handler
from xml.sax.handler import feature_namespaces
from StringIO import StringIO


from xml.sax import InputSource

class InvaidXml(Exception):
    pass


class convertRaw(xml.sax.ContentHandler):
 
  def __init__(self, mathml=False, raw_xml=True ):
        self.__characters = ''
        self.__mathml = mathml
        self.__raw_xml = raw_xml
        self.__write_raw = False
        self.__ns_dict = {'http://www.w3.org/XML/1998/namespace': "xml"}


  def startDocument(self):
      pass


  def characters (self, characters):
    self.__characters += characters


  def startElementNS(self, name, qname, attrs):
        self.__write_text()
        ns = name[0]
        el_name = name[1]
        sys.stdout.write('<')
        if el_name == 'raw':
            if attrs.get((None, 'format')) == 'xml' and self.__raw_xml:
                self.__write_raw = True
        if ns:
            sys.stdout.write('ns1:%s' % el_name)
        else:
            sys.stdout.write(el_name)
        if ns:
            sys.stdout.write(' xmlns:ns1="%s"' % ns)

        the_keys = list(attrs.keys())
        counter = 1
        for the_key in the_keys:
            counter +=1
            ns_att = the_key[0]
            att_name = the_key[1]
            value = attrs[the_key]
            ns_prefix = self.__ns_dict.get(ns_att)
            if ns_att and not ns_prefix:
                raise InvaidXml('No namespace for "%s"\n' % (ns_att))
            if ns_att and ns_prefix == 'xml':
                sys.stdout.write(' xml:%s="%s"' % (att_name, value))
            elif ns_att:
                raise InvaidXml('Sorry, but don\'t know what to do with ns "%s"\n' % (ns_prefix))
            else:
                sys.stdout.write(' %s="%s"' % (att_name, value))
        sys.stdout.write('>')

   

  def __write_text(self, raw = False):
        if raw:
            text = self.__characters
        else:
            text =  xml.sax.saxutils.escape(self.__characters)
        sys.stdout.write(text)
        self.__characters = ''

  def endElementNS(self, name, qname):
        ns = name[0]
        el_name = name[1]
        if (el_name == 'math_block' and  self.__mathml) or (el_name == 'math' and self.__mathml) :
            pass
            # not implemented yet (thought I have code to implement this)
        elif el_name == 'raw' and self.__write_raw:
            self.__write_text(raw = True)
            self.__write_raw = False
        else:
            self.__write_text()
        if ns:
            raise InvaidXml('Should not be namespace "%s" here\n' % (ns))
        else:
            sys.stdout.write('</%s>' % el_name)




if __name__ == '__main__':
    the_string = """
    <document source="test.rst"><paragraph>text</paragraph><raw format="xml" xml:space="preserve">&lt;ns:root xmlns:ns=&quot;http://www.ns.org&quot;/&gt;</raw></document>
    """
    read_obj = StringIO(the_string)
    the_handle=convertRaw()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, 1)
    parser.setContentHandler(the_handle)
    parser.setFeature("http://xml.org/sax/features/external-general-entities", True)
    try:
        parser.parse(read_obj)            
    except xml.sax._exceptions.SAXParseException as error:
        msg = error.args[0]
        raise InvaidXml(msg)
    except InvaidXml as error:
        msg = error.args[0]
        raise InvaidXml(msg)
    read_obj.close()