Thread: [Docutils-develop] how to write mixed elements with nodes.py

[Docutils-develop] how to write mixed elements with nodes.py

From: Paul T. <pau...@gm...> - 2011-10-24 20:21:02

Can someone point me to how to write elements that mix elements and text 
in a docutils tree, using docutils/nodes.py? For example, take this 
fragment:

<paragraph>text <emphasis>word</emphasis> text</paragraph>

The nodes.Element class can create a node with no text, and the 
nodes.TextElement can add nodes with text, but I can't find the method 
for just adding text to an element.

 >>> from docutils import nodes
 >>> element = nodes.Element()
 >>> element.tagname = 'p'
 >>> print(element)
<p/>
 >>> element2 = nodes.TextElement(text='word')
 >>> element2.tagname = 'emphasis'
 >>> element.append(element2)
 >>> print(element)
<p><emphasis>word</emphasis></p>

What I need to do is first append the text to element, then the 
element2, then more text. I can see the method for doing this in 
minidom, but not in nodes.

In case you are wondering why I need to do this, consider that for the 
math.py patch, I need to convert an XML string to a docutils tree. The 
code I have already suggested works, but only because MathML does not 
mix text and elements. I would like to write a more complete function 
that could convert any string to a docutils node.

Thanks

Paul

Re: [Docutils-develop] how to write mixed elements with nodes.py

From: David G. <go...@py...> - 2011-10-24 20:41:37

On Mon, Oct 24, 2011 at 16:20, Paul Tremblay <pau...@gm...> wrote:
> Can someone point me to how to write elements that mix elements and text
> in a docutils tree, using docutils/nodes.py?

You're doing it wrong.

>>> p = nodes.paragraph(text='Some initial text, followed by ')
>>> p.append(nodes.emphasis(text='some emphasized text'))
>>> p.append(nodes.Text('.'))
>>> print p
<paragraph>Some initial text, followed by <emphasis>some emphasized
text</emphasis>.</paragraph>
>>> print p.pformat()
<paragraph>
    Some initial text, followed by
    <emphasis>
        some emphasized text
    .

Note that text-containing elements (like paragraph or emphasis)
require ``text=`` (because it's the second parameter; the first is
rawsource). But Text (used for text without a surrounding element)
doesn't use ``text=``.

The source and the tests (e.g. docutils/test/test_nodes.py) are good
sources of "how-to" info.
Use the source, Luke!

-- 
David Goodger <http://python.net/~goodger>

Re: [Docutils-develop] how to write mixed elements with nodes.py

From: Paul T. <pau...@gm...> - 2011-10-24 21:38:14

On 10/24/11 4:41 PM, David Goodger wrote:
> On Mon, Oct 24, 2011 at 16:20, Paul Tremblay<pau...@gm...>  wrote:
>> Can someone point me to how to write elements that mix elements and text
>> in a docutils tree, using docutils/nodes.py?
> You're doing it wrong.
>
>>>> p = nodes.paragraph(text='Some initial text, followed by ')
>>>> p.append(nodes.emphasis(text='some emphasized text'))
>>>> p.append(nodes.Text('.'))
>>>> print p
> <paragraph>Some initial text, followed by<emphasis>some emphasized
> text</emphasis>.</paragraph>
>>>> print p.pformat()
> <paragraph>
>      Some initial text, followed by
>      <emphasis>
>          some emphasized text
>      .
>
> Note that text-containing elements (like paragraph or emphasis)
> require ``text=`` (because it's the second parameter; the first is
> rawsource). But Text (used for text without a surrounding element)
> doesn't use ``text=``.
>
> The source and the tests (e.g. docutils/test/test_nodes.py) are good
> sources of "how-to" info.
> Use the source, Luke!

Thank you Obi-Wan Kenobi! Yes, nodes.Text() was what I was looking for.

However, since I am adding elements that are not part of the normal 
docutils tree (that is, MathML elements, such as <mrow>), I assume I 
still have to use::

e = nodes.Element()
e.tagname='mrow'

??

(I did look in docutils/test/test_nodes.py, as well as nodes.py, of course.)

Luke (AKA Paul)

Re: [Docutils-develop] how to write mixed elements with nodes.py

From: David G. <go...@py...> - 2011-10-25 02:04:41

On Mon, Oct 24, 2011 at 17:38, Paul Tremblay <pau...@gm...> wrote:
> However, since I am adding elements that are not part of the normal docutils
> tree (that is, MathML elements, such as <mrow>), I assume I still have to
> use::
>
> e = nodes.Element()
> e.tagname='mrow'
>
> ??
>
> (I did look in docutils/test/test_nodes.py, as well as nodes.py, of course.)

You *could* do it that way, but I wouldn't. I would make subclasses of
element classes from docutils.nodes. The tagname attribute is taken
automatically from the class name, and you can have custom
functionality (see docutils.nodes for examples).

-- 
David Goodger <http://python.net/~goodger>

Re: [Docutils-develop] how to write mixed elements with nodes.py

From: Paul T. <pau...@gm...> - 2011-10-25 03:22:33

On 10/24/11 10:04 PM, David Goodger wrote:
> On Mon, Oct 24, 2011 at 17:38, Paul Tremblay<pau...@gm...>  wrote:
>> However, since I am adding elements that are not part of the normal docutils
>> tree (that is, MathML elements, such as<mrow>), I assume I still have to
>> use::
>>
>> e = nodes.Element()
>> e.tagname='mrow'
>>
>> ??
>>
>> (I did look in docutils/test/test_nodes.py, as well as nodes.py, of course.)
> You *could* do it that way, but I wouldn't. I would make subclasses of
> element classes from docutils.nodes. The tagname attribute is taken
> automatically from the class name, and you can have custom
> functionality (see docutils.nodes for examples).
>
I see. So to add a "mi" element:

class mi(nodes.TextElement): pass

element = mi(text='5.66')

The problem is I am trying to come up with a general function to convert 
any XML string to docutils node, so I don't always know what elements I 
will encounter. For example,

.. raw:: xml

    <custonElement1>
     <customElement2>



Produces elements of unknown name, so I cannot create an Element class 
for each.
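
One possible way around this, offered only as a minimal sketch: since the tagname is taken from the class name, element classes can be created on the fly with type(). The helper names below (element_class, dom_to_node, xml_to_node) are hypothetical and not part of docutils; parsing is done with the standard library's minidom, and text between child elements is kept as nodes.Text children.

```python
from xml.dom import minidom

from docutils import nodes

# Cache of element classes created on the fly (hypothetical helper state).
_element_classes = {}


def element_class(tag):
    """Return a nodes.Element subclass whose class name is `tag`.

    docutils takes the tagname from the class name, so the class only
    needs the right name.
    """
    if tag not in _element_classes:
        _element_classes[tag] = type(tag, (nodes.Element,), {})
    return _element_classes[tag]


def dom_to_node(dom_element):
    """Recursively convert a minidom element into a docutils node."""
    node = element_class(dom_element.tagName)()
    for name, value in dom_element.attributes.items():
        node[name] = value
    for child in dom_element.childNodes:
        if child.nodeType == child.TEXT_NODE:
            # Text between child elements becomes a nodes.Text child.
            node.append(nodes.Text(child.data))
        elif child.nodeType == child.ELEMENT_NODE:
            node.append(dom_to_node(child))
    return node


def xml_to_node(xml_string):
    """Parse an XML string and return the corresponding docutils node."""
    return dom_to_node(minidom.parseString(xml_string).documentElement)


node = xml_to_node('<mrow>text <mi>x</mi> text</mrow>')
print(node)
```

Comments, processing instructions and namespaces are ignored in this sketch; a complete converter would have to decide how to handle them.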

Paul

Re: [Docutils-develop] how to write mixed elements with nodes.py

From: Guenter M. <mi...@us...> - 2011-11-01 09:56:40

On 2011-10-25, Paul Tremblay wrote:
> On 10/24/11 10:04 PM, David Goodger wrote:
>> On Mon, Oct 24, 2011 at 17:38, Paul Tremblay<pau...@gm...>  wrote:

...

> The problem is I am trying to come up with a general function to convert 
> any XML string to docutils node, so I don't always know what elements I 
> will encounter. For example,

> .. raw:: xml

>     <custonElement1>
>      <customElement2>

> Produces elements of unknown name, so I cannot create an Element class 
> for each.

However, this conversion is only required because the current XML writer
does not support raw content::

    <raw format="xml" xml:space="preserve">
        &lt;custonElement1&gt;
         &lt;customElement2&gt;
    </raw>

I'd prefer a more consistent way to deal with the problem: add this
capability to the XML writer. I know this means rather big changes to the 
current XML writer, however 

* this would bring it in line with the other docutils writers,
* a lot of required code could be taken from other writers.

This would also solve the "math" problem (without the need to convert an
XML string to a doctree and back to an XML string): Either the "math
transform" would generate raw XML (if so required) or the XML writer be
made to handle "math" nodes (convert to MathML and insert as raw).

I'd like to wait for David's opinion on this issue before the actual
implementation.

Re: [Docutils-develop] how to write mixed elements with nodes.py

From: Paul T. <pau...@gm...> - 2011-11-01 16:42:07

On Tue, Nov 1, 2011 at 5:56 AM, Guenter Milde <mi...@us...>wrote:

> On 2011-10-25, Paul Tremblay wrote:
> > On 10/24/11 10:04 PM, David Goodger wrote:
> >> On Mon, Oct 24, 2011 at 17:38, Paul Tremblay<pau...@gm...>
>  wrote:
>
> ...
>
>
>
> However, this conversion is only required because the current XML writer
> does not support raw content::
>
>    <raw format="xml" xml:space="preserve">
>        &lt;custonElement1&gt;
>  &lt;customElement2&gt;
>    </raw>
>
> I'd prefer a more consistent way to deal with the problem: add this
> capability to the XML writer. I know this means rather big changes to the
> current XML writer, however
>
> * this would bring it in line with the other docutils writers,
> * a lot of required code could be taken from other writers.
>

Completely agree!

>
> This would also solve the "math" problem (without the need to convert an
> XML string to a doctree and back to an XML string): Either the "math
> transform" would generate raw XML (if so required) or the XML writer be
> made to handle "math" nodes (convert to MathML and insert as raw).
>
>
>

It seems better to have the math transform do the conversion, since it
converts a string to XML, which is more appropriate for a transform.

As far as requiring a lot more code in the writer, I'm not so sure that is
the case. I'm at work right now so I can't test this out, but I think you
can simply iterate through the DOM, and print each element. When a math
node is found, unescape it with the xml.sax.saxutils.unescape function, and
then print out that string. I'll have to see if this will work at home. If
so, it would require very little additional code.
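
The unescaping step described above can be tried with the standard library alone. This is only a sketch of that one step: the sample string is an illustrative value, not output taken from an actual run of the writer.

```python
from xml.sax.saxutils import unescape

# Escaped content of a <raw format="xml"> node (illustrative sample value).
escaped = '&lt;ns:root xmlns:ns=&quot;http://www.ns.org&quot;/&gt;'

# unescape() handles &lt;, &gt; and &amp; by default; &quot; has to be
# passed explicitly through the optional entities mapping.
raw_xml = unescape(escaped, {'&quot;': '"'})
print(raw_xml)
```

Note that the entities mapping is needed here because attribute quotes inside the raw content are escaped as &quot;.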

I like your suggestion because it kills two birds with one stone--allowing
for raw XML (my next suggested patch) and mathML, without the waste of
conversion to a node and then to a string.

Paul

Re: [Docutils-develop] how to write mixed elements with nodes.py

From: Guenter M. <mi...@us...> - 2011-12-16 20:20:28

Dear Paul,

On 2011-11-01, Paul Tremblay wrote:
> On Tue, Nov 1, 2011 at 5:56 AM, Guenter Milde <mi...@us...>wrote:


>> However, this conversion is only required because the current XML writer
>> does not support raw content::

>>    <raw format="xml" xml:space="preserve">
>>        &lt;custonElement1&gt;
>>  &lt;customElement2&gt;
>>    </raw>

>> I'd prefer a more consistent way to deal with the problem: add this
>> capability to the XML writer. I know this means rather big changes to the
>> current XML writer, however

>> * this would bring it in line with the other docutils writers,
>> * a lot of required code could be taken from other writers.

...

> As far as requiring a lot more code in the writer, I'm not so sure that is
> the case. I'm at work right now so I can't test this out, but I think you
> can simply iterate through the DOM, and print each element. When a math
> node is found, unescape it with the xml.sax.saxutils.unescape function, and
> then print out that string. I'll have to see if this will work at home. If
> so, it would require very little additional code.

> I like your suggestion because it kills two birds with one stone--allowing
> for raw XML (my next suggested patch) and mathML, without the waste of
> conversion to a node and then to a string.

How far did you come with this approach?

Günter

Re: [Docutils-develop] how to write mixed elements with nodes.py

From: Paul T. <pau...@gm...> - 2011-12-16 21:25:19

Guenter,

I assumed from subsequent emails that docutils did not want this feature. I
would have to go back and see how far I got, but if I remember correctly,
this idea was very easy to implement and I had more or less completed it.

Paul

On Fri, Dec 16, 2011 at 3:16 PM, Guenter Milde <mi...@us...> wrote:

> Dear Paul,
>
> On 2011-11-01, Paul Tremblay wrote:
> > On Tue, Nov 1, 2011 at 5:56 AM, Guenter Milde <mi...@us...
> >wrote:
>
>
> >> However, this conversion is only required because the current XML writer
> >> does not support raw content::
>
> >>    <raw format="xml" xml:space="preserve">
> >>        &lt;custonElement1&gt;
> >>  &lt;customElement2&gt;
> >>    </raw>
>
> >> I'd prefer a more consistent way to deal with the problem: add this
> >> capability to the XML writer. I know this means rather big changes to
> the
> >> current XML writer, however
>
> >> * this would bring it in line with the other docutils writers,
> >> * a lot of required code could be taken from other writers.
>
> ...
>
> > As far as requiring a lot more code in the writer, I'm not so sure that
> is
> > the case. I'm at work right now so I can't test this out, but I think you
> > can simply iterate through the DOM, and print each element. When a math
> > node is found, unescape it with the xml.sax.saxutils.unescape function, and
> > then print out that string. I'll have to see if this will work at home.
> If
> > so, it would require very little additional code.
>
> > I like your suggestion because it kills two birds with one
> stone--allowing
> > for raw XML (my next suggested patch) and mathML, without the waste of
> > conversion to a node and then to a string.
>
> How far did you come with this approach?
>
> Günter

Re: [Docutils-develop] how to write mixed elements with nodes.py

From: Guenter M. <mi...@us...> - 2011-12-17 09:37:20

Dear Paul,

On 2011-12-16, Paul Tremblay wrote:

>> > I like your suggestion because it kills two birds with one
>> > stone--allowing for raw XML (my next suggested patch) and mathML,
>> > without the waste of conversion to a node and then to a string.

...

> I assumed from subsequent emails that docutils did not want this feature. I
> would have to go back and see how far I got, but if I remember correctly,
> this idea was very easy to implement and I had more or less completed it.

There are two parts that need separate consideration:

I would like to see "raw XML" properly implemented/supported in Docutils and
I am ready to help to achieve this.

MathML input for math is more tricky: to get it into the core, all output
formats should be supported, that means we need a MathML->LaTeX converter.
Therefore I suggested to start this feature as a sandbox project.

Sorry for the confusion.

Günter

Re: [Docutils-develop] how to write mixed elements with nodes.py

From: Paul T. <pau...@gm...> - 2011-12-17 17:51:17

On 12/17/11 4:36 AM, Guenter Milde wrote:
> Dear Paul,
>
> On 2011-12-16, Paul Tremblay wrote:
>
>>>> I like your suggestion because it kills two birds with one
>>>> stone--allowing for raw XML (my next suggested patch) and mathML,
>>>> without the waste of conversion to a node and then to a string.
> ...
>
>> I assumed from subsequent emails that docutils did not want this feature. I
>> would have to go back and see how far I got, but if I remember correctly,
>> this idea was very easy to implement and I had more or less completed it.
> There are two parts that need separate consideration:
>
> I would like to see "raw XML" properly implemented/supported in Docutils and
> I am ready to help to achieve this.
>
> MathML input for math is more tricky: to get it into the core, all output
> formats should be supported, that means we need a MathML->LaTeX converter.
> Therefore I suggested to start this feature as a sandbox project.
>
> Sorry for the confusion.

Hi Guenter,

Here is some code that converts a docutils string to raw xml. See the 
code at the very bottom for what the string looks like. I generated the 
string with rst2xml.py.

I'm a bit rusty on how to implement things in docutils, but if I am 
remembering correctly, there should not be much additional code needed. We 
just have to convert the docutils dom to a string (which is done 
already) and feed the string to convertRaw.

Issues:

1. I haven't checked this code with different encodings. I believe the 
docutils string is always utf8 just before it prints out, so that should 
not be a problem.

2. The code outputs to sys.stdout. It probably would be better to have the 
conversion saved as a string and then returned.
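
Issue 2 could be handled along the following lines. This is a simplified sketch, not the code posted below: the names RawToString and convert_raw_to_string are hypothetical, namespaces are left switched off, and the math handling is omitted. The handler collects output fragments in a list and the function returns them joined as one string.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class RawToString(xml.sax.ContentHandler):
    """Copy the document, writing <raw format="xml"> content unescaped."""

    def __init__(self):
        super().__init__()
        self.parts = []            # output fragments, joined at the end
        self._in_raw_xml = False

    def startElement(self, name, attrs):
        attr_text = ''.join(
            ' %s=%s' % (key, quoteattr(value)) for key, value in attrs.items())
        self.parts.append('<%s%s>' % (name, attr_text))
        if name == 'raw' and attrs.get('format') == 'xml':
            self._in_raw_xml = True

    def endElement(self, name):
        self.parts.append('</%s>' % name)
        if name == 'raw':
            self._in_raw_xml = False

    def characters(self, content):
        # The parser has already resolved entities, so raw content is
        # passed through as-is and everything else is escaped again.
        self.parts.append(content if self._in_raw_xml else escape(content))


def convert_raw_to_string(xml_text):
    """Return the converted document instead of printing it."""
    handler = RawToString()
    xml.sax.parseString(xml_text.encode('utf-8'), handler)
    return ''.join(handler.parts)


sample = ('<document source="test.rst"><paragraph>a &amp; b</paragraph>'
          '<raw format="xml" xml:space="preserve">&lt;ns:root/&gt;</raw>'
          '</document>')
print(convert_raw_to_string(sample))
```

Returning a string keeps the handler usable from a writer, which can then encode and emit the result itself.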

Paul

===========
#!/usr/bin/python

import sys
import xml.sax
import xml.sax.saxutils
from io import StringIO
from xml.sax.handler import feature_namespaces


class InvalidXml(Exception):
    pass


class convertRaw(xml.sax.ContentHandler):

    def __init__(self, mathml=False, raw_xml=True):
        xml.sax.ContentHandler.__init__(self)
        self.__characters = ''
        self.__mathml = mathml
        self.__raw_xml = raw_xml
        self.__write_raw = False
        self.__ns_dict = {'http://www.w3.org/XML/1998/namespace': 'xml'}

    def startDocument(self):
        pass

    def characters(self, characters):
        # Buffer text until the next start or end tag is seen.
        self.__characters += characters

    def startElementNS(self, name, qname, attrs):
        # Flush any text that preceded this start tag.
        self.__write_text()
        ns, el_name = name
        sys.stdout.write('<')
        if el_name == 'raw':
            if attrs.get((None, 'format')) == 'xml' and self.__raw_xml:
                self.__write_raw = True
        if ns:
            sys.stdout.write('ns1:%s' % el_name)
            sys.stdout.write(' xmlns:ns1="%s"' % ns)
        else:
            sys.stdout.write(el_name)

        for the_key in attrs.keys():
            ns_att, att_name = the_key
            # quoteattr() escapes the value and adds the surrounding quotes.
            value = xml.sax.saxutils.quoteattr(attrs[the_key])
            ns_prefix = self.__ns_dict.get(ns_att)
            if ns_att and not ns_prefix:
                raise InvalidXml('No namespace for "%s"\n' % ns_att)
            if ns_att and ns_prefix == 'xml':
                sys.stdout.write(' xml:%s=%s' % (att_name, value))
            elif ns_att:
                raise InvalidXml(
                    'Sorry, but don\'t know what to do with ns "%s"\n'
                    % ns_prefix)
            else:
                sys.stdout.write(' %s=%s' % (att_name, value))
        sys.stdout.write('>')

    def __write_text(self, raw=False):
        # Raw XML is written as-is; all other text is escaped again.
        if raw:
            text = self.__characters
        else:
            text = xml.sax.saxutils.escape(self.__characters)
        sys.stdout.write(text)
        self.__characters = ''

    def endElementNS(self, name, qname):
        ns, el_name = name
        if el_name in ('math_block', 'math') and self.__mathml:
            pass
            # not implemented yet (though I have code to implement this)
        elif el_name == 'raw' and self.__write_raw:
            self.__write_text(raw=True)
            self.__write_raw = False
        else:
            self.__write_text()
        if ns:
            raise InvalidXml('Should not be namespace "%s" here\n' % ns)
        else:
            sys.stdout.write('</%s>' % el_name)


if __name__ == '__main__':
    the_string = (
        '<document source="test.rst"><paragraph>text</paragraph>'
        '<raw format="xml" xml:space="preserve">'
        '&lt;ns:root xmlns:ns=&quot;http://www.ns.org&quot;/&gt;'
        '</raw></document>'
    )
    read_obj = StringIO(the_string)
    the_handle = convertRaw()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, 1)
    parser.setContentHandler(the_handle)
    parser.setFeature(
        'http://xml.org/sax/features/external-general-entities', True)
    try:
        parser.parse(read_obj)
    except xml.sax.SAXParseException as error:
        raise InvalidXml(error.getMessage())
    finally:
        read_obj.close()