On 12/17/11 4:36 AM, Guenter Milde wrote:
> Dear Paul,
>
> On 2011-12-16, Paul Tremblay wrote:
>
>>>> I like your suggestion because it kills two birds with one
>>>> stone--allowing for raw XML (my next suggested patch) and mathML,
>>>> without the waste of conversion to a node and then to a string.
> ...
>
>> I assumed from subsequent emails that docutils did not want this feature. I
>> would have to go back and see how far I got, but if I remember correctly,
>> this idea was very easy to implement and I had more or less completed it.
> There are two parts that need separate consideration:
>
> I would like to see "raw XML" properly implemented/supported in Docutils and
> I am ready to help to achieve this.
>
> MathML input for math is more tricky: to get it into the core, all output
> formats should be supported, that means we need a MathML->LaTeX converter.
> Therefore I suggested to start this feature as a sandbox project.
>
> Sorry for the confusion.
Hi Guenter,
Here is some code that converts a docutils string to raw xml. See the
code at the very bottom for what the string looks like. I generated the
string with rst2xml.py.
I'm a bit rusty on how to implement things in docutils, but if I am
remembering correctly, there should not be much additional code needed. We
just have to convert the docutils DOM to a string (which is done
already) and feed the string to convertRaw.
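To illustrate the "feed the string to a handler" step, here is a small sketch that is independent of docutils. It only records what a namespace-aware SAX parser reports for a docutils-style XML string. EventRecorder and record_events are hypothetical names made up for this example, and the sample string is a shortened stand-in for real rst2xml.py output.

```python
import xml.sax
from io import BytesIO
from xml.sax.handler import ContentHandler, feature_namespaces


class EventRecorder(ContentHandler):
    """Hypothetical helper: records element names and attribute keys."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.events = []

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace_uri, local_name) tuple; attribute keys
        # use the same form, with None as the URI when there is no namespace.
        self.events.append((name, list(attrs.keys())))


def record_events(xml_text):
    """Parse xml_text with namespace processing on; return the events."""
    handler = EventRecorder()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    # Assumption: the string is handed over as UTF-8 encoded bytes.
    parser.parse(BytesIO(xml_text.encode('utf-8')))
    return handler.events


sample = ('<document><raw format="xml" xml:space="preserve">'
          '&lt;a/&gt;</raw></document>')
for name, attr_keys in record_events(sample):
    print(name, attr_keys)
```

This is why a lookup such as attrs.get((None, 'format')) is needed for the plain "format" attribute, while xml:space arrives under the XML namespace URI.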
Issues:
1. I haven't checked this code with different encodings. I believe the
docutils string is always utf8 just before it prints out, so that should
not be a problem.
2. The code outputs to stdout. It probably would be better to have the
conversion saved as a string and then returned.
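For issue 2, a minimal sketch of a handler that collects its output and returns it as a string might look like the following. RawToString and convert_to_string are hypothetical names, not existing docutils code, and the sketch is deliberately simplified: it drops namespaced attributes such as xml:space and ignores the MathML case. For issue 1, it assumes the input is passed to the parser as UTF-8 bytes.

```python
import xml.sax
from io import BytesIO
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.saxutils import escape, quoteattr


class RawToString(ContentHandler):
    """Hypothetical variant that collects output instead of printing it."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.parts = []          # output fragments, joined at the end
        self.text = ''           # character data seen since the last tag
        self.in_raw_xml = False  # True while inside <raw format="xml">

    def _flush(self, raw=False):
        # Raw XML text is copied as-is; all other text is re-escaped.
        self.parts.append(self.text if raw else escape(self.text))
        self.text = ''

    def characters(self, content):
        self.text += content

    def startElementNS(self, name, qname, attrs):
        self._flush()
        if name[1] == 'raw' and attrs.get((None, 'format')) == 'xml':
            self.in_raw_xml = True
        # Simplification: only attributes without a namespace are kept.
        attr_text = ''.join(' %s=%s' % (key[1], quoteattr(value))
                            for key, value in attrs.items()
                            if key[0] is None)
        self.parts.append('<%s%s>' % (name[1], attr_text))

    def endElementNS(self, name, qname):
        self._flush(raw=self.in_raw_xml and name[1] == 'raw')
        if name[1] == 'raw':
            self.in_raw_xml = False
        self.parts.append('</%s>' % name[1])


def convert_to_string(xml_text):
    """Return the converted document as a string."""
    handler = RawToString()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(BytesIO(xml_text.encode('utf-8')))
    return ''.join(handler.parts)


print(convert_to_string(
    '<document><paragraph>a &amp; b</paragraph>'
    '<raw format="xml">&lt;x/&gt;</raw></document>'))
```

Collecting fragments in a list and joining them once at the end keeps the handler free of any dependency on sys.stdout, so the caller decides where the result goes.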
Paul
===========
#!/usr/bin/python
import sys
import xml.sax
import xml.sax.handler
import xml.sax.saxutils  # escape() and quoteattr() are used below
from xml.sax.handler import feature_namespaces
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3


class InvalidXml(Exception):
    pass


class convertRaw(xml.sax.ContentHandler):

    def __init__(self, mathml=False, raw_xml=True):
        xml.sax.ContentHandler.__init__(self)
        self.__characters = ''
        self.__mathml = mathml
        self.__raw_xml = raw_xml
        self.__write_raw = False
        self.__ns_dict = {'http://www.w3.org/XML/1998/namespace': "xml"}

    def startDocument(self):
        pass

    def characters(self, characters):
        # Buffer text until the next start or end tag.
        self.__characters += characters

    def startElementNS(self, name, qname, attrs):
        self.__write_text()
        ns = name[0]
        el_name = name[1]
        sys.stdout.write('<')
        if el_name == 'raw':
            if attrs.get((None, 'format')) == 'xml' and self.__raw_xml:
                self.__write_raw = True
        if ns:
            sys.stdout.write('ns1:%s' % el_name)
        else:
            sys.stdout.write(el_name)
        if ns:
            sys.stdout.write(' xmlns:ns1="%s"' % ns)
        for the_key in list(attrs.keys()):
            ns_att = the_key[0]
            att_name = the_key[1]
            # quoteattr() adds the quotes and escapes the value.
            value = xml.sax.saxutils.quoteattr(attrs[the_key])
            ns_prefix = self.__ns_dict.get(ns_att)
            if ns_att and not ns_prefix:
                raise InvalidXml('No namespace for "%s"\n' % (ns_att))
            if ns_att and ns_prefix == 'xml':
                sys.stdout.write(' xml:%s=%s' % (att_name, value))
            elif ns_att:
                raise InvalidXml(
                    'Sorry, but don\'t know what to do with ns "%s"\n'
                    % (ns_prefix))
            else:
                sys.stdout.write(' %s=%s' % (att_name, value))
        sys.stdout.write('>')

    def __write_text(self, raw=False):
        # Raw XML is written unescaped; all other text is escaped again.
        if raw:
            text = self.__characters
        else:
            text = xml.sax.saxutils.escape(self.__characters)
        sys.stdout.write(text)
        self.__characters = ''

    def endElementNS(self, name, qname):
        ns = name[0]
        el_name = name[1]
        if el_name in ('math_block', 'math') and self.__mathml:
            pass
            # not implemented yet (though I have code to implement this)
        elif el_name == 'raw' and self.__write_raw:
            self.__write_text(raw=True)
            self.__write_raw = False
        else:
            self.__write_text()
        if ns:
            raise InvalidXml('Should not be namespace "%s" here\n' % (ns))
        else:
            sys.stdout.write('</%s>' % el_name)


if __name__ == '__main__':
    # The raw content is escaped (&lt; ... &gt;), as rst2xml.py writes it.
    the_string = """
<document source="test.rst"><paragraph>text</paragraph><raw format="xml"
xml:space="preserve">&lt;ns:root
xmlns:ns="http://www.ns.org"/&gt;</raw></document>
"""
    read_obj = StringIO(the_string)
    the_handle = convertRaw()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, 1)
    parser.setContentHandler(the_handle)
    parser.setFeature(
        "http://xml.org/sax/features/external-general-entities", True)
    try:
        parser.parse(read_obj)
    except xml.sax.SAXParseException as error:
        # Parser errors are re-raised as InvalidXml; an InvalidXml raised
        # by the handler itself propagates unchanged.
        raise InvalidXml(error.args[0])
    finally:
        read_obj.close()