On 10/22/11 9:43 AM, Guenter Milde wrote:
> On 2011-10-20, Paul Tremblay wrote:
>> On 10/20/11 6:14 AM, Guenter Milde wrote:
>>>>>>>> MathML is pretty essential for XML. Can it be put in the XML
>>>>>>>> writer?
> ...
>
>>> The idea would be to define a new transform
>>> (transforms.math.latex2mathml, in a file
>>> docutils/docutils/transforms/math.py say) that would
>>> replace the content of math and mathblock nodes.
>>> The code would be a mixture of examples from other transforms and the
>>> visit_math() method in the html writer. (to avoid duplicating code,
>>> once it is in place and tested, the html writer should be modified to
>>> use it as well)
>> Following your directions, I created math.py in the docutils/transforms
>> directory. To the __init__.py in writers, I added:
>> from docutils.transforms import math
> ...
>
>> def get_transforms(self):
>>     return Component.get_transforms(self) + [
>>         universal.Messages,
>>         universal.FilterMessages,
>>         universal.StripClassesAndElements,
>>         math.Math_Block,
> this should go into /docutils/writers/docutils_xml.py, otherwise it
> affects all writers (inheriting from docutils.writers.Writer).
>
>> math.py looks like this;
>> """
>> math used by writers
> I'd use something like """math related transformations""" as docstring
> for the transforms.math module.
>
>> from docutils import writers
>> from docutils.transforms import writers
>> """
> What is this for?
>
>> __docformat__ = 'reStructuredText'
>> from docutils import nodes, utils, languages
>> from docutils.transforms import Transform
>> from docutils.math.latex2mathml import parse_latex_math
>> class Math_Block(Transform):
> Do we need separate classes for Math_Block vs. Math_Role or could these be
> put into one class?
>
> Considering that `transforms.math` might be used for several math-related
> transforms (equation numbering comes to my mind), I'd use a more telling
> name, `LaTeXmath2MathML`, say.
>
>>     """
>>     Change the text in the math_block from plain text in LaTeX to
>>     a MathML tree
>>     """
>>     # not sure if this needs to be loaded earlier or not
>>     default_priority = 910
>>     def apply(self):
>>         for math_block in self.document.traverse(nodes.math_block):
>>             math_code = math_block.astext()
>>             mathml_tree = parse_latex_math(math_code, inline=False)
>>             # need to append the mathml_tree to math_block
>> I have a few questions.
>> (1) How do you get just the text from a nodes.Element? In my code,
>> math_block.astext() actually returns a text representation of the node,
>> including the element tags, etc. I looked everywhere in
>> docutils/nodes.py for a method to get just the text, but could not find
>> one. Somehow, feeding the string with the tags to parse_latex_math worked
>> anyway (following the example in the html writer).
> Strange. How can I reproduce this?
>
> I did a small test inserting
>
> print node.astext().encode('utf8')
>
> in the visit_math_block() method of the html writer and did get just the
> content, no tags.
>
>> (2) How do I append the resulting tree to the math_block? I tried
>> math_block.append() and other methods, but it seems latex2mathml.py
>> returns a different type of tree than the one already created.
> I think so. Remember that latex2mathml is taken from a user-contributed
> add-on in the sandbox and is only intended to produce a MathML
> representation to put into HTML pages.
>
>> I could convert the mathml tree to an XML string and then create a tree
>> from that, and then append the tree? I'm just not sure how to do this.
> I see several ways forward from here:
>
> * your proposal (convert to string and parse this to a compatible tree).
> Is there an XML parser in the minidom module?
>
> * modify latex2mathml to use "compatible" tree nodes based on Docutils'
> nodes.
>
>> (3) How do I make this transformation optional, depending on an option
>> set by the user? The user might have put asciimath in the math_block
>> element, in which case it should not be transformed by the
>> latex2mathml.py module.
> Here, you can look at examples for customizable transforms. E.g. the
> sectnum_xform setting is defined in frontend.py and works on the
> SectNum(Transform) in transforms/parts.py.
>
> Günter
>
>
>
Okay, I've followed all of your suggestions.
docutils/writers/docutils_xml.py now has the following changes:

from docutils.transforms import math    # <-- new: import math

class Writer(writers.Writer, Component):
    # <-- new: subclassing Component in order to add the transformation

    supported = ('xml',)
    """Formats this writer supports."""

    settings_spec = (
        '"Docutils XML" Writer Options',
        'Warning: the --newlines and --indents options may adversely affect '
        'whitespace; use them only for reading convenience.',
        (('Generate XML with newlines before and after tags.',
          ['--newlines'],
          {'action': 'store_true', 'validator': frontend.validate_boolean}),
         ('Generate XML with indents and newlines.',
          ['--indents'],
          {'action': 'store_true', 'validator': frontend.validate_boolean}),
         ('Omit the XML declaration. Use with caution.',
          ['--no-xml-declaration'],
          {'dest': 'xml_declaration', 'default': 1, 'action': 'store_false',
           'validator': frontend.validate_boolean}),
         ('Omit the DOCTYPE declaration.',
          ['--no-doctype'],
          {'dest': 'doctype_declaration', 'default': 1,
           'action': 'store_false', 'validator': frontend.validate_boolean}),
         # <-- new: add an option for latexmathml
         ('Convert LaTeX math in math_block and math to MathML',
          ['--latex-mathml'],
          {'dest': 'latex_mathml', 'default': False,
           'action': 'store_true', 'validator': frontend.validate_boolean}),
         # <-- new: add an option for ASCII math
         ('Convert ASCII math in math_block and math to MathML',
          ['--ascii-mathml'],
          {'dest': 'ascii_mathml', 'default': False,
           'action': 'store_true', 'validator': frontend.validate_boolean}),
        ))

    def get_transforms(self):
        return Component.get_transforms(self) + [
            universal.Messages,
            universal.FilterMessages,
            universal.StripClassesAndElements,
            math.LaTeXmath2MathML,   # <-- new
            math.Asciimath2MathML,   # <-- new: add 2 new transforms
            ]
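
For anyone following along, here is a minimal sketch of how a store_true
option ends up as the boolean that each transform checks before doing any
work. It uses plain optparse from the standard library rather than the
Docutils frontend, and the option spellings (--latex-mathml,
--ascii-mathml), the FakeTransform class, and its return values are
illustrative assumptions only, not Docutils API.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser with two off-by-default flags (assumed spellings)."""
    parser = OptionParser()
    parser.add_option('--latex-mathml', dest='latex_mathml',
                      default=False, action='store_true')
    parser.add_option('--ascii-mathml', dest='ascii_mathml',
                      default=False, action='store_true')
    return parser


class FakeTransform(object):
    """Hypothetical stand-in for a Transform; only the gating is shown."""

    def __init__(self, settings, setting_name):
        self.settings = settings
        self.setting_name = setting_name

    def apply(self):
        # Same early return as in the transforms: do nothing unless enabled.
        if not getattr(self.settings, self.setting_name):
            return 'skipped'
        return 'converted'


def run(argv):
    """Parse argv and report what each stand-in transform would do."""
    settings, _args = build_parser().parse_args(argv)
    return (FakeTransform(settings, 'latex_mathml').apply(),
            FakeTransform(settings, 'ascii_mathml').apply())
```

With both flags defaulting to False, a document processed without either
option passes through both transforms untouched, which is the behaviour
asked for in question (3).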
=============================================
The file docutils/transforms/math.py looks like this:

# $Id: writer_aux.py 6433 2010-09-28 08:21:25Z milde $
# Author: Lea Wiemann <LeWiemann@...>
# Copyright: This module has been placed in the public domain.

"""
math used by writers
"""

__docformat__ = 'reStructuredText'

from docutils import nodes, utils, languages
from docutils.transforms import Transform
from docutils.math.latex2mathml import parse_latex_math
from xml.dom.minidom import parse, parseString, Node
import sys

class LaTeXmath2MathML(Transform):

    """
    Change the text in the math_block and math from plain text in LaTeX to
    a MathML tree
    """

    # not sure if this needs to be loaded earlier or not
    default_priority = 910

    def apply(self):
        latex_mathml = self.document.settings.latex_mathml
        if not latex_mathml:
            return
        for math_block in self.document.traverse(nodes.math_block):
            math_code = math_block.astext()
            try:
                mathml_tree = parse_latex_math(math_code, inline=False)
                math_xml = ''.join(mathml_tree.xml())
            except SyntaxError, err:
                err_node = self.document.reporter.error(err,
                                                        base_node=math_block)
                math_block.append(err_node)
                return
            new_math_block = nodes.Element(rawsource=math_code)
            new_math_block.tagname = 'math_block'
            math_block.replace_self(new_math_block)
            convert_string_to_docutils_tree(math_xml, new_math_block)
        for math in self.document.traverse(nodes.math):
            math_code = math.astext()
            try:
                mathml_tree = parse_latex_math(math_code, inline=True)
                math_xml = ''.join(mathml_tree.xml())
            except SyntaxError, err:
                err_node = self.document.reporter.error(err,
                                                        base_node=math)
                math.append(err_node)
                return
            new_math = nodes.Element(rawsource=math_code)
            new_math.tagname = 'math'
            math.replace_self(new_math)
            convert_string_to_docutils_tree(math_xml, new_math)

class Asciimath2MathML(Transform):

    """
    Change the text in the math_block and math from plain text in ASCII to
    a MathML tree
    """

    # not sure if this needs to be loaded earlier or not
    default_priority = 910

    def apply(self):
        ascii_mathml = self.document.settings.ascii_mathml
        if not ascii_mathml:
            return
        try:
            import asciimathml
            from xml.etree.ElementTree import Element, tostring
        except ImportError as msg:
            err_node = self.document.reporter.error(msg,
                                                    base_node=self.document)
            return
        for math_block in self.document.traverse(nodes.math_block):
            math_code = math_block.astext()
            math_tree = asciimathml.parse(math_code)
            math_tree.set('xmlns', 'http://www.w3.org/1998/Math/MathML')
            math_xml = tostring(math_tree, encoding="utf8")
            math_xml = math_xml.decode('utf8')
            new_math_block = nodes.Element(rawsource=math_code)
            new_math_block.tagname = 'math_block'
            math_block.replace_self(new_math_block)
            convert_string_to_docutils_tree(math_xml, new_math_block)
        for math in self.document.traverse(nodes.math):
            math_code = math.astext()
            math_tree = asciimathml.parse(math_code)
            math_tree.set('xmlns', 'http://www.w3.org/1998/Math/MathML')
            math_xml = tostring(math_tree, encoding="utf8")
            math_xml = math_xml.decode('utf8')
            new_math = nodes.Element(rawsource=math_code)
            new_math.tagname = 'math'
            math.replace_self(new_math)
            convert_string_to_docutils_tree(math_xml, new_math)

def convert_string_to_docutils_tree(xml_string, docutils_node):
    minidom_dom = parseString(xml_string.encode('utf8'))
    _convert_tree(minidom_dom, docutils_node)

def _convert_tree(minidom_node, docutils_node):
    for child_node in minidom_node.childNodes:
        if child_node.nodeType == Node.ELEMENT_NODE:
            tag_name = child_node.nodeName
            # collect the text that sits directly inside this element
            node_text = ''
            for grand_child in child_node.childNodes:
                if grand_child.nodeType == Node.TEXT_NODE:
                    node_text += grand_child.nodeValue
            if node_text.strip() != '':
                Element = nodes.TextElement(text=node_text)
            else:
                Element = nodes.Element()
            Element.tagname = tag_name
            # copy the XML attributes onto the new docutils node
            attrs = child_node.attributes
            if attrs:
                for attrName in attrs.keys():
                    attrNode = attrs.get(attrName)
                    attrValue = attrNode.nodeValue
                    attr_string_name = attrNode.nodeName
                    Element[attr_string_name] = attrValue
            docutils_node.append(Element)
            if len(child_node.childNodes) != 0:
                _convert_tree(child_node, Element)
==============================
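
On the earlier question about minidom: yes, xml.dom.minidom.parseString
gives a DOM that can be walked recursively. The sketch below runs the same
kind of walk as _convert_tree without needing Docutils installed. The
function name walk, the nested-tuple result, and the SAMPLE string are all
made up for this illustration.

```python
from xml.dom.minidom import parseString, Node


def walk(dom_node):
    """Return a list of (tag, attributes, own_text, children) tuples."""
    result = []
    for child in dom_node.childNodes:
        if child.nodeType != Node.ELEMENT_NODE:
            continue
        # Text that sits directly inside this element (not in descendants).
        own_text = ''.join(gc.nodeValue for gc in child.childNodes
                           if gc.nodeType == Node.TEXT_NODE)
        attrs = dict(child.attributes.items())
        result.append((child.nodeName, attrs, own_text.strip(), walk(child)))
    return result


# Small hand-written MathML sample, standing in for the converter output.
SAMPLE = ('<math xmlns="http://www.w3.org/1998/Math/MathML">'
          '<mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow></math>')

tree = walk(parseString(SAMPLE.encode('utf8')))
```

Each tuple corresponds to one element that _convert_tree would turn into a
docutils node, so printing tree is a quick way to see what the transform
will append.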
I've done simple tests with test/functional/input/data/math.txt, as well
as with my own math_ascii.rst file, and the code seems to work.

It obviously needs some documentation. Also, there is apparently a bug
with minidom when using Python 3. I could write another simple function
to supplement _convert_tree(minidom_node, docutils_node) that uses the
xml.etree module instead. xml.etree is considered more up-to-date than
minidom, but it does not work with Python older than 2.5.
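
A rough idea of what that ElementTree version might look like is below.
StubNode is only a hypothetical stand-in for nodes.Element and
nodes.TextElement so that the sketch runs without Docutils, and
_convert_etree and convert_string_to_stub_tree are names invented here. The
real function would build Docutils nodes instead.

```python
from xml.etree.ElementTree import fromstring


class StubNode(object):
    """Hypothetical stand-in for a Docutils node (not Docutils API)."""

    def __init__(self, tagname='', text=''):
        self.tagname = tagname
        self.text = text
        self.attributes = {}
        self.children = []

    def append(self, child):
        self.children.append(child)


def _local_name(tag):
    # ElementTree spells namespaced names as '{uri}local'; keep 'local'.
    return tag.rsplit('}', 1)[-1]


def _convert_etree(etree_elem, parent):
    """Copy etree_elem and all of its descendants underneath parent."""
    node = StubNode(_local_name(etree_elem.tag),
                    (etree_elem.text or '').strip())
    for name, value in etree_elem.attrib.items():
        node.attributes[_local_name(name)] = value
    parent.append(node)
    for child in etree_elem:
        _convert_etree(child, node)


def convert_string_to_stub_tree(xml_string, parent):
    """Parse xml_string and attach the resulting tree to parent."""
    _convert_etree(fromstring(xml_string.encode('utf8')), parent)
```

Two differences from the minidom version are worth keeping in mind.
ElementTree folds the xmlns declaration into the tag name rather than
exposing it as an attribute, so it would have to be re-added by hand if the
output needs it. Also, only the text before an element's first child is
copied here; tail text is ignored, which should be harmless for MathML
token elements but is an assumption.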
Paul
