From: Paul Tremblay <paulhtremblay@gm...>  2011-10-23 22:04:15

On 10/22/11 9:43 AM, Guenter Milde wrote:
> On 2011-10-20, Paul Tremblay wrote:
>> On 10/20/11 6:14 AM, Guenter Milde wrote:
>
>>>>>>>> MathML is pretty essential for XML. Can it be put in the XML
>>>>>>>> writer?
> ...
>
>>> The idea would be to define a new transform
>>> (transforms.math.latex2mathml, in a file
>>> docutils/docutils/transforms/math.py say) that would
>>> replace the content of math and math-block nodes.
>>> The code would be a mixture of examples from other transforms and the
>>> visit_math() method in the html writer. (to avoid duplicating code,
>>> once it is in place and tested, the html writer should be modified to
>>> use it as well)
>
>> Following your directions, I created math.py in the docutils/transforms
>> directory. To the __init__.py in writers, I added:
>
>> from docutils.transforms import math
> ...
>
>> def get_transforms(self):
>>     return Component.get_transforms(self) + [
>>         universal.Messages,
>>         universal.FilterMessages,
>>         universal.StripClassesAndElements,
>>         math.Math_Block,
>
> this should go into /docutils/writers/docutils_xml.py, otherwise it
> affects all writers (inheriting from docutils.writers.Writer).
>
>> math.py looks like this;
>
>> """
>> math used by writers
>
> I'd use something like """math related transformations""" as docstring
> for the transforms.math module.
>
>> from docutils import writers
>> from docutils.transforms import writers
>> """
>
> What is this for?
>
>> __docformat__ = 'reStructuredText'
>
>> from docutils import nodes, utils, languages
>> from docutils.transforms import Transform
>> from docutils.math.latex2mathml import parse_latex_math
>
>> class Math_Block(Transform):
>
> Do we need separate classes for Math_Block vs. Math_Role or could these be
> put into one class?
>
> Considering that `transforms.math` might be used for several math-related
> transforms (equation numbering comes to my mind), I'd use a more telling
> name, `LaTeXmath2MathML`, say.
>
>>     """
>>     Change the text in the math_block from plain text in LaTeX to
>>     a MathML tree
>>     """
>>     default_priority = 910 # not sure if this needs to be loaded
>>                            # earlier or not
>>     def apply(self):
>>         for math_block in self.document.traverse(nodes.math_block):
>>             math_code = math_block.astext()
>>             mathml_tree = parse_latex_math(math_code, inline=False)
>>             # need to append the mathml_tree to math_block
>
>> I have a few questions.
>
>> (1) How do you get just the text from a node.Element? In my code, the
>> math_block.astext actually returns a text representation of the node,
>> including the elements tags, etc. I looked everywhere in
>> docutils/nodes.py for a method to get just text, but could not find one.
>> Somehow, feeding the string with the tags to parse_latex_math worked
>> anyway (following the example in the html writer).
>
> Strange. How can I reproduce this?
>
> I did a small test inserting
>
>   print node.astext().encode('utf8')
>
> in the visit_math_block() method of the html writer and did get just the
> content, no tags.
>
>> (2) How do I append the resulting tree to the math_block? I tried
>> math_block.append() and other methods, but it seems the latex2mathml.py
>> returns a different type of tree than that already created.
>
> I think so. Remember that latex2mathml is taken from a user-contributed
> add-on in the sandbox and is only intended to produce a MathML
> representation to put into HTML pages.
>
>> I could convert the mathml tree to an XML string and then create a tree
>> from that, and then append the tree? I'm just not sure how to do this.
>
> I see several ways forward from here:
>
> * your proposal (convert to string and parse this to a compatible tree).
>   Is there an XML parser in the minidom module?
>
> * modify latex2mathml to use "compatible" tree nodes based on Docutils'
>   nodes.
>
>> (3) How do I make this transformation optional, depending on an option
>> by the user.
>> The user might have put asciimath in the math_block
>> element, in which case it should not be transformed by the
>> latex2mathml.py module.
>
> Here, you can look at examples for customizable transforms. E.g. the
> sectnum_xform setting is defined in frontend.py and works on the
> SectNum(Transform) in transforms/parts.py.
>
> Günter
>
>

Okay, I've followed all of your suggestions.

docutils/writers/docutils_xml.py now has the following changes:

from docutils.transforms import math
                         ^^^^^^^^^^^^ import math

class Writer(writers.Writer, Component):
                             ^^^^^ subclassing Component in order to add the transformation

    supported = ('xml',)
    """Formats this writer supports."""

    settings_spec = (
        '"Docutils XML" Writer Options',
        'Warning: the --newlines and --indents options may adversely affect '
        'whitespace; use them only for reading convenience.',
        (('Generate XML with newlines before and after tags.',
          ['--newlines'],
          {'action': 'store_true', 'validator': frontend.validate_boolean}),
         ('Generate XML with indents and newlines.',
          ['--indents'],
          {'action': 'store_true', 'validator': frontend.validate_boolean}),
         ('Omit the XML declaration.  Use with caution.',
          ['--no-xml-declaration'],
          {'dest': 'xml_declaration', 'default': 1, 'action': 'store_false',
           'validator': frontend.validate_boolean}),
         ('Omit the DOCTYPE declaration.',
          ['--no-doctype'],
          {'dest': 'doctype_declaration', 'default': 1,
           'action': 'store_false', 'validator': frontend.validate_boolean}),
         ('Convert LaTeX math in math_block and math to MathML',
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Add an option for latex-mathml
          ['--latex-mathml'],
          {'dest': 'latex_mathml', 'default': False, 'action': 'store_true',
           'validator': frontend.validate_boolean}),
         ('Convert ASCII math in math_block and math to MathML',
          ^^^^^^^^^^^^^^^^^^^ Add an option for ASCII math
          ['--ascii-mathml'],
          {'dest': 'ascii_mathml', 'default': False, 'action': 'store_true',
           'validator': frontend.validate_boolean}),
         ))

    def get_transforms(self):
        return Component.get_transforms(self) + [
            universal.Messages,
            universal.FilterMessages,
            universal.StripClassesAndElements,
            math.LaTeXmath2MathML,
            math.Asciimath2MathML,
            ^^^^^^^^^^^^^^^^ add 2 new transforms
            ]

=============================================

The file docutils/transforms/math.py looks like this:

# $Id: writer_aux.py 6433 2010-09-28 08:21:25Z milde $
# Author: Lea Wiemann <LeWiemann@...>
# Copyright: This module has been placed in the public domain.

"""
math used by writers
"""

__docformat__ = 'reStructuredText'

from docutils import nodes, utils, languages
from docutils.transforms import Transform
from docutils.math.latex2mathml import parse_latex_math
from xml.dom.minidom import parse, parseString, Node
import sys

class LaTeXmath2MathML(Transform):

    """
    Change the text in the math_block and math from plain text in LaTeX to
    a MathML tree
    """

    default_priority = 910 # not sure if this needs to be loaded earlier or not

    def apply(self):
        latex_mathml = self.document.settings.latex_mathml
        if not latex_mathml:
            return
        for math_block in self.document.traverse(nodes.math_block):
            math_code = math_block.astext()
            try:
                mathml_tree = parse_latex_math(math_code, inline=False)
                math_xml = ''.join(mathml_tree.xml())
            except SyntaxError as err:
                err_node = self.document.reporter.error(err, base_node=math_block)
                math_block.append(err_node)
                return
            new_math_block = nodes.Element(rawsource=math_code)
            new_math_block.tagname = 'math_block'
            math_block.replace_self(new_math_block)
            convert_string_to_docutils_tree(math_xml, new_math_block)
        for math in self.document.traverse(nodes.math):
            math_code = math.astext()
            try:
                mathml_tree = parse_latex_math(math_code, inline=True)
                math_xml = ''.join(mathml_tree.xml())
            except SyntaxError as err:
                err_node = self.document.reporter.error(err, base_node=math)
                math.append(err_node)
                return
            new_math = nodes.Element(rawsource=math_code)
            new_math.tagname = 'math'
            math.replace_self(new_math)
            convert_string_to_docutils_tree(math_xml, new_math)

class Asciimath2MathML(Transform):

    """
    Change the text in the math_block and math from plain text in ASCII to
    a MathML tree
    """

    default_priority = 910 # not sure if this needs to be loaded earlier or not

    def apply(self):
        ascii_mathml = self.document.settings.ascii_mathml
        if not ascii_mathml:
            return
        try:
            import asciimathml
            from xml.etree.ElementTree import Element, tostring
        except ImportError as msg:
            err_node = self.document.reporter.error(msg, base_node=self.document)
            return
        for math_block in self.document.traverse(nodes.math_block):
            math_code = math_block.astext()
            math_tree = asciimathml.parse(math_code)
            math_tree.set('xmlns', 'http://www.w3.org/1998/Math/MathML')
            math_xml = tostring(math_tree, encoding="utf8")
            math_xml = math_xml.decode('utf8')
            new_math_block = nodes.Element(rawsource=math_code)
            new_math_block.tagname = 'math_block'
            math_block.replace_self(new_math_block)
            convert_string_to_docutils_tree(math_xml, new_math_block)
        for math in self.document.traverse(nodes.math):
            math_code = math.astext()
            math_tree = asciimathml.parse(math_code)
            math_tree.set('xmlns', 'http://www.w3.org/1998/Math/MathML')
            math_xml = tostring(math_tree, encoding="utf8")
            math_xml = math_xml.decode('utf8')
            new_math = nodes.Element(rawsource=math_code)
            new_math.tagname = 'math'
            math.replace_self(new_math)
            convert_string_to_docutils_tree(math_xml, new_math)

def convert_string_to_docutils_tree(xml_string, docutils_node):
    minidom_dom = parseString(xml_string.encode('utf8'))
    _convert_tree(minidom_dom, docutils_node)

def _convert_tree(minidom_node, docutils_node):
    for child_node in minidom_node.childNodes:
        if child_node.nodeType == Node.ELEMENT_NODE:
            tag_name = child_node.nodeName
            node_text = ''
            for grand_child in child_node.childNodes:
                if grand_child.nodeType == Node.TEXT_NODE:
                    node_text += grand_child.nodeValue
            if node_text.strip() != '':
                Element = nodes.TextElement(text=node_text)
            else:
                Element = nodes.Element()
            Element.tagname = tag_name
            attrs = child_node.attributes
            if attrs:
                for attrName in attrs.keys():
                    attrNode = attrs.get(attrName)
                    attrValue = attrNode.nodeValue
                    attr_string_name = attrNode.nodeName
                    Element[attr_string_name] = attrValue
            docutils_node.append(Element)
            if len(child_node.childNodes) != 0:
                _convert_tree(child_node, Element)

==============================

I've done simple tests with the math.txt in
test/functional/input/data/math.txt, as well as with my own
math_ascii.rst file, and the code seems to work. It obviously needs some
documentation.
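
As a starting point for that documentation, the behaviour of the two new
switches can be shown outside Docutils. This is only a sketch under a
few assumptions: the option strings are taken to be --latex-mathml and
--ascii-mathml, plain optparse stands in for the Docutils front end
(which builds on optparse), the Docutils-specific 'validator' key is
left out, and build_parser() and active_transforms() are hypothetical
helper names that do not exist in Docutils.

```python
from optparse import OptionParser


def build_parser():
    """Plain-optparse stand-in for the two new entries in settings_spec."""
    parser = OptionParser()
    # Assumed option strings; dest, default and action follow settings_spec.
    parser.add_option('--latex-mathml', dest='latex_mathml', default=False,
                      action='store_true',
                      help='Convert LaTeX math in math_block and math to MathML')
    parser.add_option('--ascii-mathml', dest='ascii_mathml', default=False,
                      action='store_true',
                      help='Convert ASCII math in math_block and math to MathML')
    return parser


def active_transforms(settings):
    """Return the names of the transforms that would do any work.

    This mirrors the early ``return`` at the top of each apply() method:
    a transform is always registered but is a no-op unless its flag is set.
    """
    names = []
    if settings.latex_mathml:
        names.append('LaTeXmath2MathML')
    if settings.ascii_mathml:
        names.append('Asciimath2MathML')
    return names


if __name__ == '__main__':
    settings, _args = build_parser().parse_args(['--latex-mathml'])
    print(active_transforms(settings))
```

Nothing in this sketch prevents both switches from being given at once;
whether that combination should be reported as an error is left open.
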
Also, there is apparently a bug with minidom when using Python 3. I
could write another simple function to supplement
_convert_tree(minidom_node, docutils_node) that uses the xml.etree
module instead. xml.etree is considered more up-to-date than minidom,
but it is not available in Python older than 2.5.

Paul
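
A minimal sketch of what such an xml.etree-based variant might look
like follows. It is not the code from the message above. So that it
runs without Docutils installed, it builds the tree from StubNode, a
hypothetical stand-in for nodes.Element / nodes.TextElement. The names
local_name(), convert_etree(), convert_string_to_tree() and the
make_node parameter are likewise assumptions made for illustration.

```python
from xml.etree import ElementTree


class StubNode(object):
    """Hypothetical stand-in for docutils.nodes.Element / TextElement."""

    def __init__(self, tagname='', text=''):
        self.tagname = tagname
        self.text = text
        self.attributes = {}
        self.children = []

    def __setitem__(self, key, value):
        self.attributes[key] = value

    def append(self, child):
        self.children.append(child)


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit('}', 1)[-1]


def convert_etree(etree_elem, parent_node, make_node=StubNode):
    """Mirror etree_elem and its descendants as children of parent_node.

    Only the leading text of each element is kept; tail text (mixed
    content) is ignored, which is enough for MathML token elements.
    """
    node = make_node(tagname=local_name(etree_elem.tag),
                     text=(etree_elem.text or '').strip())
    for name, value in etree_elem.attrib.items():
        node[name] = value
    parent_node.append(node)
    for child in etree_elem:
        convert_etree(child, node, make_node)
    return node


def convert_string_to_tree(xml_string, parent_node, make_node=StubNode):
    """Parse xml_string and attach the converted tree to parent_node."""
    root = ElementTree.fromstring(xml_string)
    node = convert_etree(root, parent_node, make_node)
    # ElementTree folds a default namespace into the tag name instead of
    # keeping an 'xmlns' attribute, so restore it on the root node.
    if root.tag.startswith('{'):
        node['xmlns'] = root.tag[1:].split('}', 1)[0]
    return node


if __name__ == '__main__':
    parent = StubNode('math_block')
    xml = ('<math xmlns="http://www.w3.org/1998/Math/MathML">'
           '<mi>x</mi><mo>=</mo><mn>1</mn></math>')
    root = convert_string_to_tree(xml, parent)
    print(root.tagname, [child.tagname for child in root.children])
```

The main difference from the minidom version is namespace handling:
minidom exposes xmlns as an ordinary attribute, while ElementTree folds
it into each tag name, so the sketch strips it from the tags and puts it
back on the root node only.
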