From: Paul T. <pau...@gm...> - 2011-10-24 20:21:02
|
Can someone point me to how to write elements that mix elements and text in a docutils tree, using docutils/nodes.py? For example, take this fragment:

    <paragraph>text <emphasis>word</emphasis> text</paragraph>

The nodes.Element class can create a node with no text, and nodes.TextElement can create a node with text, but I can't find the method for just adding text to an element:

    >>> from docutils import nodes
    >>> element = nodes.Element()
    >>> element.tagname = 'p'
    >>> print(element)
    <p/>
    >>> element2 = nodes.TextElement(text='word')
    >>> element2.tagname = 'emphasis'
    >>> element.append(element2)
    >>> print element
    <p><emphasis>word</emphasis></p>

What I need to do is first append the text to element, then element2, then more text. I can see the method for doing this in minidom, but not in nodes.

In case you are wondering why I need to do this: for the math.py patch, I need to convert an XML string to a docutils tree. The code I have already suggested works, but only because MathML does not mix text and elements. I would like to write a more complete function that could convert any XML string to a docutils node.

Thanks
Paul |
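For comparison with the minidom approach Paul mentions, here is a minimal sketch using the standard library's `xml.dom.minidom`. In a DOM, mixed content is simply a sequence of sibling text and element nodes appended in order. This only illustrates minidom; it is not the docutils API.

```python
from xml.dom import minidom

# Build <paragraph>text <emphasis>word</emphasis> text</paragraph>:
# mixed content is a sequence of sibling text and element nodes.
doc = minidom.Document()
paragraph = doc.createElement('paragraph')
doc.appendChild(paragraph)

paragraph.appendChild(doc.createTextNode('text '))   # text before the element
emphasis = doc.createElement('emphasis')
emphasis.appendChild(doc.createTextNode('word'))
paragraph.appendChild(emphasis)                      # the inline element
paragraph.appendChild(doc.createTextNode(' text'))   # text after the element

print(paragraph.toxml())
```

Printing `paragraph.toxml()` returns the fragment from the question.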
From: David G. <go...@py...> - 2011-10-24 20:41:37
|
On Mon, Oct 24, 2011 at 16:20, Paul Tremblay <pau...@gm...> wrote:
> Can someone point me to how to write elements that mix elements and text
> in a docutils tree, using docutils/nodes.py?

You're doing it wrong.

    >>> p = nodes.paragraph(text='Some initial text, followed by ')
    >>> p.append(nodes.emphasis(text='some emphasized text'))
    >>> p.append(nodes.Text('.'))
    >>> print p
    <paragraph>Some initial text, followed by <emphasis>some emphasized text</emphasis>.</paragraph>
    >>> print p.pformat()
    <paragraph>
        Some initial text, followed by
        <emphasis>
            some emphasized text
        .

Note that text-containing elements (like paragraph or emphasis) require ``text=`` (because it's the second parameter; the first is rawsource). But Text (used for text without a surrounding element) doesn't use ``text=``.

The source and the tests (e.g. docutils/test/test_nodes.py) are good sources of "how-to" info. Use the source, Luke!

--
David Goodger <http://python.net/~goodger> |
From: Paul T. <pau...@gm...> - 2011-10-24 21:38:14
|
On 10/24/11 4:41 PM, David Goodger wrote:
> On Mon, Oct 24, 2011 at 16:20, Paul Tremblay <pau...@gm...> wrote:
>> Can someone point me to how to write elements that mix elements and text
>> in a docutils tree, using docutils/nodes.py?
> You're doing it wrong.
...
> The source and the tests (e.g. docutils/test/test_nodes.py) are good
> sources of "how-to" info.
> Use the source, Luke!

Thank you, Obi-Wan Kenobi! Yes, nodes.Text() was what I was looking for.

However, since I am adding elements that are not part of the normal docutils tree (that is, MathML elements, such as <mrow>), I assume I still have to use::

    e = nodes.Element()
    e.tagname = 'mrow'

??

(I did look in docutils/test/test_nodes.py, as well as nodes.py, of course.)

Luke (AKA Paul) |
From: David G. <go...@py...> - 2011-10-25 02:04:41
|
On Mon, Oct 24, 2011 at 17:38, Paul Tremblay <pau...@gm...> wrote:
> However, since I am adding elements that are not part of the normal docutils
> tree (that is, MathML elements, such as <mrow>), I assume I still have to
> use::
>
>     e = nodes.Element()
>     e.tagname='mrow'
>
> ??
>
> (I did look in docutils/test/test_nodes.py, as well as nodes.py, of course.)

You *could* do it that way, but I wouldn't. I would make subclasses of element classes from docutils.nodes. The tagname attribute is taken automatically from the class name, and you can have custom functionality (see docutils.nodes for examples).

--
David Goodger <http://python.net/~goodger> |
From: Paul T. <pau...@gm...> - 2011-10-25 03:22:33
|
On 10/24/11 10:04 PM, David Goodger wrote:
> On Mon, Oct 24, 2011 at 17:38, Paul Tremblay <pau...@gm...> wrote:
...
> You *could* do it that way, but I wouldn't. I would make subclasses of
> element classes from docutils.nodes. The tagname attribute is taken
> automatically from the class name, and you can have custom
> functionality (see docutils.nodes for examples).

I see. So to add an "mi" element::

    class mi(nodes.TextElement):
        pass

    element = mi(text='5.66')

The problem is that I am trying to come up with a general function to convert any XML string to a docutils node, so I don't always know which elements I will encounter. For example::

    .. raw:: xml

       <custonElement1>
       <customElement2>

produces elements of unknown name, so I cannot create an Element class for each.

Paul |
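A minimal sketch of the ordering problem behind such a general converter. It uses only the standard library's `xml.etree.ElementTree`, which is an assumption: the thread does not say which parser the math.py patch uses. ElementTree keeps text in an element's `.text` and in each child's `.tail`, so a converter has to re-emit those runs in document order. The hypothetical helper `to_tree` returns nested `(tag, attributes, children)` tuples whose children interleave strings and sub-elements. This is the same shape as a docutils element whose children mix `nodes.Text` and element nodes.

Mapping each string to `nodes.Text(...)` and each tuple to an element node is left untested here. For an unknown tag name, the element could perhaps be an instance of a class made at run time with `type(tag, (nodes.Element,), {})`, relying on the tagname being taken from the class name as David describes. Namespaced tags come back in ElementTree's `{uri}local` form and would need extra handling.

```python
import xml.etree.ElementTree as ET


def to_tree(elem):
    """Convert an ElementTree element into (tag, attributes, children).

    `children` preserves document order and mixes plain strings (text runs)
    with nested (tag, attributes, children) tuples.
    """
    children = []
    if elem.text:
        children.append(elem.text)          # text before the first child
    for child in elem:
        children.append(to_tree(child))     # the child element itself
        if child.tail:
            children.append(child.tail)     # text between this child and the next
    return (elem.tag, dict(elem.attrib), children)


if __name__ == '__main__':
    root = ET.fromstring(
        '<paragraph>text <emphasis>word</emphasis> text</paragraph>')
    print(to_tree(root))
    # ('paragraph', {}, ['text ', ('emphasis', {}, ['word']), ' text'])
```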
From: Guenter M. <mi...@us...> - 2011-11-01 09:56:40
|
On 2011-10-25, Paul Tremblay wrote:
> On 10/24/11 10:04 PM, David Goodger wrote:
>> On Mon, Oct 24, 2011 at 17:38, Paul Tremblay <pau...@gm...> wrote:
...
> The problem is I am trying to come up with a general function to convert
> any XML string to docutils node, so I don't always know what elements I
> will encounter. For example,
> .. raw:: xml
>     <custonElement1>
>     <customElement2>
> Produces elements of unknown name, so I cannot create an Element class
> for each.

However, this conversion is only required because the current XML writer does not support raw content::

    <raw format="xml" xml:space="preserve">
        <custonElement1>
        <customElement2>
    </raw>

I'd prefer a more consistent way to deal with the problem: add this capability to the XML writer. I know this means rather big changes to the current XML writer, however

* this would bring it in line with the other docutils writers,
* a lot of the required code could be taken from other writers.

This would also solve the "math" problem (without the need to convert an XML string to a doctree and back to an XML string): either the "math" transform would generate raw XML (if so required), or the XML writer would be made to handle "math" nodes (convert to MathML and insert as raw).

I'd like to wait for David's opinion on this issue before the actual implementation. |
From: Paul T. <pau...@gm...> - 2011-11-01 16:42:07
|
On Tue, Nov 1, 2011 at 5:56 AM, Guenter Milde <mi...@us...> wrote:
> On 2011-10-25, Paul Tremblay wrote:
> > On 10/24/11 10:04 PM, David Goodger wrote:
> >> On Mon, Oct 24, 2011 at 17:38, Paul Tremblay <pau...@gm...> wrote:
> ...
>
> However, this conversion is only required because the current XML writer
> does not support raw content::
>
>     <raw format="xml" xml:space="preserve">
>         <custonElement1>
>         <customElement2>
>     </raw>
>
> I'd prefer a more consistent way to deal with the problem: add this
> capability to the XML writer. I know this means rather big changes to the
> current XML writer, however
>
> * this would bring it in a line with the other docutils writers,
> * a lot of required code could be taken from other writers.

Completely agree!

> This would also solve the "math" problem (without the need to convert an
> XML string to a doctree and back to an XML string): Either the "math
> transform" would generate raw XML (if so required) or the XML writer be
> made to handle "math" nodes (convert to MathML and insert as raw).

It seems better to have the math transform do the conversion, since the conversion turns a string into XML, something more appropriate for a transform.

As far as requiring a lot more code in the writer, I'm not so sure that is the case. I'm at work right now so I can't test this out, but I think you can simply iterate through the DOM and print each element. When a math node is found, unescape it with the xml.sax.saxutils.unescape function, and then print out that string. I'll have to see if this will work at home. If so, it would require very little additional code.

I like your suggestion because it kills two birds with one stone: allowing for raw XML (my next suggested patch) and MathML, without the waste of converting to a node and then to a string.

Paul |
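A rough sketch of the DOM-walking idea Paul describes. It assumes `xml.dom.minidom` as the DOM (the thread does not fix one) and handles only element and text nodes. Once the parser has built the DOM, entities such as `&lt;` are already decoded in the text nodes. Passing raw XML through therefore amounts to writing that text without re-escaping it. `xml.sax.saxutils.unescape` would only be needed when working on the still-escaped string. The function name `serialize` is hypothetical.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr


def serialize(node, out):
    """Append an XML serialization of `node` to the list `out`.

    Only element and text nodes are handled. The text content of
    <raw format="xml"> is written verbatim instead of escaped.
    """
    if node.nodeType == node.TEXT_NODE:
        out.append(escape(node.data))
        return
    if node.nodeType != node.ELEMENT_NODE:
        return  # comments, processing instructions etc. are skipped
    attrs = ''.join(' %s=%s' % (name, quoteattr(value))
                    for name, value in node.attributes.items())
    out.append('<%s%s>' % (node.tagName, attrs))
    if node.tagName == 'raw' and node.getAttribute('format') == 'xml':
        # The DOM already holds the decoded payload; write it unescaped.
        out.append(''.join(child.data for child in node.childNodes
                           if child.nodeType == child.TEXT_NODE))
    else:
        for child in node.childNodes:
            serialize(child, out)
    out.append('</%s>' % node.tagName)


if __name__ == '__main__':
    doc = minidom.parseString(
        '<document><paragraph>a &lt; b</paragraph>'
        '<raw format="xml">&lt;mrow&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;/mrow&gt;</raw>'
        '</document>')
    out = []
    serialize(doc.documentElement, out)
    print(''.join(out))
```

Ordinary text stays escaped (`a &lt; b`), while the payload of the `raw` node comes out as real `<mrow>` markup. Raw content in any other format is escaped as usual.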
From: Guenter M. <mi...@us...> - 2011-12-16 20:20:28
|
Dear Paul,

On 2011-11-01, Paul Tremblay wrote:
> On Tue, Nov 1, 2011 at 5:56 AM, Guenter Milde <mi...@us...> wrote:
...
> As far as requiring a lot more code in the writer, I'm not so sure that is
> the case. I'm at work right now so I can't test this out, but I think you
> can simply iterate through the DOM, and print each element. When a math
> node is found, unescape this with the SAX.utils. unescpape function, and
> then print out that string. I'll have to see if this will work at home. If
> so, it would require very little additional code.

> I like your suggestion because it kills two birds with one stone--allowing
> for raw XML (my next suggested patch) and mathML, without the waste of
> conversion to a node and then to a string.

How far did you get with this approach?

Günter |
From: Paul T. <pau...@gm...> - 2011-12-16 21:25:19
|
Guenter,

I assumed from subsequent emails that docutils did not want this feature. I would have to go back and see how far I got, but if I remember correctly, this idea was very easy to implement and I had more or less completed it.

Paul

On Fri, Dec 16, 2011 at 3:16 PM, Guenter Milde <mi...@us...> wrote:
> How far did you come with this approach?
>
> Günter |
From: Guenter M. <mi...@us...> - 2011-12-17 09:37:20
|
Dear Paul,

On 2011-12-16, Paul Tremblay wrote:
>> > I like your suggestion because it kills two birds with one
>> > stone--allowing for raw XML (my next suggested patch) and mathML,
>> > without the waste of conversion to a node and then to a string.
...
> I assumed from subsequent emails that docutils did not want this feature. I
> would have to go back and see show far I got, but if I remember correctly,
> this idea was very easy to implement and I had more or less completed it.

There are two parts that need separate consideration:

I would like to see "raw XML" properly implemented/supported in Docutils, and I am ready to help achieve this.

MathML input for math is trickier: to get it into the core, all output formats should be supported, which means we need a MathML->LaTeX converter. Therefore I suggested starting this feature as a sandbox project.

Sorry for the confusion.

Günter |
From: Paul T. <pau...@gm...> - 2011-12-17 17:51:17
|
On 12/17/11 4:36 AM, Guenter Milde wrote:
> There are two parts that need separate consideration:
>
> I would like to see "raw XML" properly implemented/supported in Docutils and
> I am ready to help to achieve this.
>
> MathML input for math is more tricky: to get it into the core, all output
> formats should be supported, that means we need a MathML->LaTeX converter.
> Therefore I suggested to start this feature as a sandbox project.
>
> Sorry for the confusion.

Hi Guenter,

Here is some code that converts a docutils XML string to XML in which the raw XML content is written out unescaped. See the code at the very bottom for what the string looks like; I generated it with rst2xml.py.

I'm a bit rusty on how to implement things in docutils, but if I remember correctly, there should not be much additional code needed. We just have to convert the docutils DOM to a string (which is done already) and feed the string to convertRaw.

Issues:

1. I haven't checked this code with different encodings. I believe the docutils string is always UTF-8 just before it is printed, so that should not be a problem.
2. The code writes to stdout. It would probably be better to have the conversion saved as a string and then returned.
Paul

===========

#!/usr/bin/python
"""Write docutils XML back out, passing <raw format="xml"> content through
unescaped so that it appears in the output as real XML."""
import sys
from io import StringIO

import xml.sax
from xml.sax.handler import feature_namespaces
from xml.sax.saxutils import escape, quoteattr


class InvalidXml(Exception):
    pass


class convertRaw(xml.sax.ContentHandler):

    def __init__(self, mathml=False, raw_xml=True, out=None):
        xml.sax.ContentHandler.__init__(self)
        self.__characters = ''
        self.__mathml = mathml          # reserved: MathML output is not implemented yet
        self.__raw_xml = raw_xml
        self.__write_raw = False
        self.__out = out if out is not None else sys.stdout
        self.__ns_dict = {'http://www.w3.org/XML/1998/namespace': 'xml'}

    def characters(self, characters):
        # Buffer text; it is flushed when the next start or end tag arrives.
        self.__characters += characters

    def startElementNS(self, name, qname, attrs):
        self.__write_text()
        ns, el_name = name
        if (el_name == 'raw' and self.__raw_xml
                and attrs.get((None, 'format')) == 'xml'):
            self.__write_raw = True
        if ns:
            self.__out.write('<ns1:%s xmlns:ns1=%s' % (el_name, quoteattr(ns)))
        else:
            self.__out.write('<%s' % el_name)
        for (ns_att, att_name), value in attrs.items():
            if ns_att:
                ns_prefix = self.__ns_dict.get(ns_att)
                if not ns_prefix:
                    raise InvalidXml('No namespace prefix for "%s"' % ns_att)
                att_name = '%s:%s' % (ns_prefix, att_name)
            # quoteattr() escapes &, <, > and quotes in attribute values.
            self.__out.write(' %s=%s' % (att_name, quoteattr(value)))
        self.__out.write('>')

    def __write_text(self, raw=False):
        text = self.__characters if raw else escape(self.__characters)
        self.__out.write(text)
        self.__characters = ''

    def endElementNS(self, name, qname):
        ns, el_name = name
        if el_name == 'raw' and self.__write_raw:
            # The parser has already decoded &lt; etc., so the raw payload
            # is written as-is, i.e. as real markup.
            self.__write_text(raw=True)
            self.__write_raw = False
        else:
            # math/math_block nodes are not converted to MathML yet;
            # their content is escaped like any other text.
            self.__write_text()
        # Always close the element, mirroring the start tag.
        if ns:
            self.__out.write('</ns1:%s>' % el_name)
        else:
            self.__out.write('</%s>' % el_name)


if __name__ == '__main__':
    # rst2xml.py escapes the content of raw nodes, hence &lt; and &gt;.
    the_string = ('<document source="test.rst"><paragraph>text</paragraph>'
                  '<raw format="xml" xml:space="preserve">'
                  '&lt;ns:root xmlns:ns="http://www.ns.org"/&gt;'
                  '</raw></document>')
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(convertRaw())
    try:
        parser.parse(StringIO(the_string))
    except xml.sax.SAXParseException as error:
        raise InvalidXml(str(error))
|