## docutils-develop

Re: [Docutils-develop] Using RST for technical textbook
From: Guenter Milde - 2011-10-22 13:43:53

On 2011-10-20, Paul Tremblay wrote:
> On 10/20/11 6:14 AM, Guenter Milde wrote:

>>>>>>> MathML is pretty essential for XML. Can it be put in the XML
>>>>>>> writer?

...

>> The idea would be to define a new transform
>> (transforms.math.latex2mathml, in a file
>> docutils/docutils/transforms/math.py say) that would
>> replace the content of math and math-block nodes.

>> The code would be a mixture of examples from other transforms and the
>> visit_math() method in the html writer. (to avoid duplicating code,
>> once it is in place and tested, the html writer should be modified to
>> use it as well)

> Following your directions, I created math.py in the docutils/transforms
> directory. To the __init__.py in writers, I added:

> from docutils.transforms import math
...
>     def get_transforms(self):
>         return Component.get_transforms(self) + [
>             universal.Messages,
>             universal.FilterMessages,
>             universal.StripClassesAndElements,
>             math.Math_Block,

This should go into /docutils/writers/docutils_xml.py, otherwise it
affects all writers (inheriting from docutils.writers.Writer).

> math.py looks like this;

> """
> math used by writers

I'd use something like """math related transformations""" as docstring
for the transforms.math module.

> from docutils import writers
> from docutils.transforms import writers
> """

What is this for?

> __docformat__ = 'reStructuredText'

> from docutils import nodes, utils, languages
> from docutils.transforms import Transform
> from docutils.math.latex2mathml import parse_latex_math

> class Math_Block(Transform):

Do we need separate classes for Math_Block vs. Math_Role or could these be
put into one class?

Considering that transforms.math might be used for several math-related
transforms (equation numbering comes to my mind), I'd use a more telling
name, LaTeXmath2MathML, say.

>     """
>     Change the text in the math_block from plain text in LaTeX to
>     a MathML tree
>     """

>     default_priority = 910 # not sure if this needs to be loaded earlier or not

>     def apply(self):
>         for math_block in self.document.traverse(nodes.math_block):
>             math_code = math_block.astext()
>             mathml_tree = parse_latex_math(math_code, inline=False)
>             # need to append the mathml_tree to math_block

> I have a few questions.

> (1) How do you get just the text from a node.Element? In my code, the
> math_block.astext actually returns a text representation of the node,
> including the element's tags, etc. I looked everywhere in
> docutils/nodes.py for a method to get just text, but could not find one.
> Somehow, feeding the string with the tags to parse_latex_math worked
> anyway (following the example in the html writer).

Strange. How can I reproduce this?

I did a small test inserting

    print node.astext().encode('utf8')

in the visit_math_block() method of the html writer and did get just the
content, no tags.

> (2) How do I append the resulting tree to the math_block? I tried
> math_block.append() and other methods, but it seems latex2mathml.py
> returns a different type of tree than the one already created.

I think so. Remember that latex2mathml is taken from a user-contributed
add-on in the sandbox and is only intended to produce a MathML
representation to put into HTML pages.

> I could convert the mathml tree to an XML string and then create a tree
> from that, and then append the tree? I'm just not sure how to do this.

I see several ways forward from here:

* your proposal (convert to string and parse this to a compatible tree).
  Is there an XML parser in the minidom module?

* modify latex2mathml to use "compatible" tree nodes based on Docutils'
  nodes.

> (3) How do I make this transformation optional, depending on an option
> set by the user? The user might have put asciimath in the math_block
> element, in which case it should not be transformed by the
> latex2mathml.py module.

Here, you can look at examples for customizable transforms. E.g. the
sectnum_xform setting is defined in frontend.py and works on the
SectNum(Transform) in transforms/parts.py.

Günter
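
On the question of whether minidom includes an XML parser: the standard library's xml.dom.minidom does provide parseString(). The following is a minimal sketch, not code from this thread. The sample MathML string and the helper name outline_elements are made up for illustration. It parses a MathML string and walks the element nodes, which is the first half of the "convert to string and re-parse" route discussed above.

```python
from xml.dom.minidom import parseString, Node

# Hypothetical sample input; any well-formed MathML string would do.
MATHML = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow>'
    '</math>'
)


def outline_elements(node, depth=0):
    """Yield (depth, tag name, direct text) for every element below node."""
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            # Collect only the text that sits directly inside this element.
            text = ''.join(c.nodeValue for c in child.childNodes
                           if c.nodeType == Node.TEXT_NODE)
            yield depth, child.nodeName, text
            # Recurse into nested elements.
            for item in outline_elements(child, depth + 1):
                yield item


dom = parseString(MATHML)
outline = list(outline_elements(dom))
for depth, name, text in outline:
    print('  ' * depth + name, repr(text))
```

Each tuple carries what a compatible tree node would need: a tag name and, for leaf elements, the text content. Attributes are left out here to keep the sketch short.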
Re: [Docutils-develop] Using RST for technical textbook
From: Paul Tremblay - 2011-10-23 03:49:16

On 10/22/11 9:43 AM, Guenter Milde wrote:
> On 2011-10-20, Paul Tremblay wrote:

...

>> (1) How do you get just the text from a node.Element? In my code, the
>> math_block.astext actually returns a text representation of the node,
>> including the element's tags, etc. I looked everywhere in
>> docutils/nodes.py for a method to get just text, but could not find one.
>> Somehow, feeding the string with the tags to parse_latex_math worked
>> anyway (following the example in the html writer).

> Strange. How can I reproduce this?
>
> I did a small test inserting
>
>     print node.astext().encode('utf8')
>
> in the visit_math_block() method of the html writer and did get just the
> content, no tags.

Thanks for your detailed reply. I'll try to work through many of these
problems. However, I just want to quickly reply that I was wrong;
node.astext() does return just the string. My mistake!

...
Re: [Docutils-develop] Using RST for technical textbook
From: Paul Tremblay - 2011-10-23 22:04:15

On 10/22/11 9:43 AM, Guenter Milde wrote:
> On 2011-10-20, Paul Tremblay wrote:
>> On 10/20/11 6:14 AM, Guenter Milde wrote:

>>>>>>>> MathML is pretty essential for XML. Can it be put in the XML
>>>>>>>> writer?

>>> The idea would be to define a new transform
>>> (transforms.math.latex2mathml, in a file
>>> docutils/docutils/transforms/math.py say) that would
>>> replace the content of math and math-block nodes.

...

Okay, I've followed all of your suggestions.

docutils/writers/docutils_xml.py now has the following changes:

    from docutils.transforms import math
    ^^^^^^^^^^^^
    import math

    class Writer(writers.Writer, Component):
    ^^^^^
    subclassing Component in order to add the transformation

        supported = ('xml',)
        """Formats this writer supports."""

        settings_spec = (
            '"Docutils XML" Writer Options',
            'Warning: the --newlines and --indents options may adversely affect '
            'whitespace; use them only for reading convenience.',
            (('Generate XML with newlines before and after tags.',
              ['--newlines'],
              {'action': 'store_true', 'validator': frontend.validate_boolean}),
             ('Generate XML with indents and newlines.',
              ['--indents'],
              {'action': 'store_true', 'validator': frontend.validate_boolean}),
             ('Omit the XML declaration.  Use with caution.',
              ['--no-xml-declaration'],
              {'dest': 'xml_declaration', 'default': 1, 'action': 'store_false',
               'validator': frontend.validate_boolean}),
             ('Omit the DOCTYPE declaration.',
              ['--no-doctype'],
              {'dest': 'doctype_declaration', 'default': 1,
               'action': 'store_false', 'validator': frontend.validate_boolean}),
             ('Convert LaTeX math in math_block and math to MathML',
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
             Add an option for --latex-mathml
              ['--latex-mathml'],
              {'dest': 'latex_mathml', 'default':False,
               'action': 'store_true', 'validator': frontend.validate_boolean}),
             ('Convert ASCII math in math_block and math to MathML',
             ^^^^^^^^^^^^^^^^^^^
             Add an option for ASCII math
              ['--ascii-mathml'],
              {'dest': 'ascii_mathml', 'default':False,
               'action': 'store_true', 'validator': frontend.validate_boolean}),
            ))

        def get_transforms(self):
            return Component.get_transforms(self) + [
                universal.Messages,
                universal.FilterMessages,
                universal.StripClassesAndElements,
                math.LaTeXmath2MathML,
                math.Asciimath2MathML,
                ^^^^^^^^^^^^^^^^
                add 2 new transforms
                ]

=============================================

The file docutils/transforms/math.py looks like this:

    # $Id: writer_aux.py 6433 2010-09-28 08:21:25Z milde$
    # Author: Lea Wiemann
    # Copyright: This module has been placed in the public domain.

    """
    math used by writers
    """

    __docformat__ = 'reStructuredText'

    from docutils import nodes, utils, languages
    from docutils.transforms import Transform
    from docutils.math.latex2mathml import parse_latex_math
    from xml.dom.minidom import parse, parseString, Node
    import sys

    class LaTeXmath2MathML(Transform):
        """
        Change the text in the math_block and math from plain text in LaTeX to
        a MathML tree
        """

        default_priority = 910 # not sure if this needs to be loaded earlier or not

        def apply(self):
            latex_mathml = self.document.settings.latex_mathml
            if not latex_mathml:
                return
            for math_block in self.document.traverse(nodes.math_block):
                math_code = math_block.astext()
                try:
                    mathml_tree = parse_latex_math(math_code, inline=False)
                    math_xml = ''.join(mathml_tree.xml())
                except SyntaxError, err:
                    err_node = self.document.reporter.error(err, base_node=math_block)
                    math_block.append(err_node)
                    return
                new_math_block = nodes.Element(rawsource=math_code)
                new_math_block.tagname = 'math_block'
                math_block.replace_self(new_math_block)
                convert_string_to_docutils_tree(math_xml, new_math_block)
            for math in self.document.traverse(nodes.math):
                math_code = math.astext()
                try:
                    mathml_tree = parse_latex_math(math_code, inline=True)
                    math_xml = ''.join(mathml_tree.xml())
                except SyntaxError, err:
                    err_node = self.document.reporter.error(err, base_node=math)
                    math.append(err_node)
                    return
                new_math = nodes.Element(rawsource=math_code)
                new_math.tagname = 'math'
                math.replace_self(new_math)
                convert_string_to_docutils_tree(math_xml, new_math)

    class Asciimath2MathML(Transform):
        """
        Change the text in the math_block and math from plain text in ASCII to
        a MathML tree
        """

        default_priority = 910 # not sure if this needs to be loaded earlier or not

        def apply(self):
            ascii_mathml = self.document.settings.ascii_mathml
            if not ascii_mathml:
                return
            try:
                import asciimathml
                from xml.etree.ElementTree import Element, tostring
            except ImportError as msg:
                err_node = self.document.reporter.error(msg,
                        base_node=self.document)
                return
            for math_block in self.document.traverse(nodes.math_block):
                math_code = math_block.astext()
                math_tree = asciimathml.parse(math_code)
                math_tree.set('xmlns', 'http://www.w3.org/1998/Math/MathML')
                math_xml = tostring(math_tree, encoding="utf-8")
                math_xml = math_xml.decode('utf8')
                new_math_block = nodes.Element(rawsource=math_code)
                new_math_block.tagname = 'math_block'
                math_block.replace_self(new_math_block)
                convert_string_to_docutils_tree(math_xml, new_math_block)
            for math in self.document.traverse(nodes.math):
                math_code = math.astext()
                math_tree = asciimathml.parse(math_code)
                math_tree.set('xmlns', 'http://www.w3.org/1998/Math/MathML')
                math_xml = tostring(math_tree, encoding="utf-8")
                math_xml = math_xml.decode('utf8')
                new_math = nodes.Element(rawsource=math_code)
                new_math.tagname = 'math'
                math.replace_self(new_math)
                convert_string_to_docutils_tree(math_xml, new_math)

    def convert_string_to_docutils_tree(xml_string, docutils_node):
        minidom_dom = parseString(xml_string.encode('utf8'))
        _convert_tree(minidom_dom, docutils_node)

    def _convert_tree(minidom_node, docutils_node):
        for child_node in minidom_node.childNodes:
            if child_node.nodeType == Node.ELEMENT_NODE:
                tag_name = child_node.nodeName
                node_text = ''
                for grand_child in child_node.childNodes:
                    if grand_child.nodeType == Node.TEXT_NODE:
                        node_text += grand_child.nodeValue
                if node_text.strip() != '':
                    Element = nodes.TextElement(text=node_text)
                else:
                    Element = nodes.Element()
                Element.tagname = tag_name
                attrs = child_node.attributes
                if attrs:
                    for attrName in attrs.keys():
                        attrNode = attrs.get(attrName)
                        attrValue = attrNode.nodeValue
                        attr_string_name = attrNode.nodeName
                        Element[attr_string_name] = attrValue
                docutils_node.append(Element)
                if len(child_node.childNodes) != 0:
                    _convert_tree(child_node, Element)

==============================

I've done simple tests with the math.txt in
test/functional/input/data/math.txt, as well as with my own
math_ascii.rst file, and the code seems to work.

It obviously needs some documentation. Also, there is apparently a bug
with minidom when using Python 3. I could write another simple function
to supplement _convert_tree(minidom_node, docutils_node) that uses the
xml.etree module instead, which is considered more up-to-date than
minidom but does not work with Python older than 2.5.

Paul
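
The xml.etree route mentioned above could look roughly like the following. This is a sketch under stated assumptions, not code from the thread: plain dicts stand in for Docutils nodes so that it runs without Docutils installed, and the names local_name and to_plain_tree as well as the sample MathML string are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input standing in for converter output.
MATHML = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mfrac linethickness="1"><mi>a</mi><mn>2</mn></mfrac>'
    '</math>'
)


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit('}', 1)[-1]


def to_plain_tree(element):
    """Convert an ElementTree element into nested dicts.

    The dicts are only a stand-in for Docutils nodes. Tail text (text
    following a child's closing tag) is ignored in this sketch.
    """
    return {
        'tagname': local_name(element.tag),
        # ElementTree consumes xmlns declarations, so they are not listed here.
        'attributes': dict(element.attrib),
        'text': (element.text or '').strip(),
        'children': [to_plain_tree(child) for child in element],
    }


tree = to_plain_tree(ET.fromstring(MATHML))
print(tree['tagname'], [child['tagname'] for child in tree['children']])
```

One difference from the minidom version is worth noting: ElementTree folds the xmlns declaration into the tag name, so a real implementation would have to decide whether to re-emit the namespace on the root element.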
Re: [Docutils-develop] Using RST for technical textbook
From: Guenter Milde - 2011-10-27 12:28:20

On 2011-10-23, Paul Tremblay wrote:
> On 10/22/11 9:43 AM, Guenter Milde wrote:
>> On 2011-10-20, Paul Tremblay wrote:
>>> On 10/20/11 6:14 AM, Guenter Milde wrote:

>>>>>>>>> MathML is pretty essential for XML. Can it be put in the XML
>>>>>>>>> writer?

>>>> The idea would be to define a new transform
>>>> (transforms.math.latex2mathml, in a file
>>>> docutils/docutils/transforms/math.py say) that would
>>>> replace the content of math and math-block nodes.

...

> Okay, I've followed all of your suggestions.

> docutils/writers/docutils_xml.py now has the following changes:

Could you post the changes as a unified diff (diff -u old new or svn diff)?
This would make for easier reading and also allows applying the changes as
a patch.

...
> class Writer(writers.Writer, Component):
...
>          ('Convert LaTeX math in math_block and math to MathML',
>          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>          Add an option for --latex-mathml
>           ['--latex-mathml'],
>           {'dest': 'latex_mathml', 'default':False,
>            'action': 'store_true', 'validator':
>            frontend.validate_boolean}),
>          ('Convert ASCII math in math_block and math to MathML',
>          ^^^^^^^^^^^^^^^^^^^
>          Add an option for ASCII math
>           ['--ascii-mathml'],
>           {'dest': 'ascii_mathml', 'default':False,
>            'action': 'store_true', 'validator':
>            frontend.validate_boolean}),

I would prefer a --math-output option similar to the HTML writer: this is
extensible and unifies the work with the rst2* front ends.
As default, I'd use 'LaTeX' or the empty string (signifying "no
conversion")::

    ('Math output format, one of "MathML", "ASCIIMathML" '
     'or "LaTeX". Default: "LaTeX"',
     ['--math-output'],
     {'default': 'LaTeX'}),

Then we can also merge the two transforms.

The documentation would explain that "ASCIIMathML" is "MathML generated by
the ASCIImath -> MathML conversion script."
(AFAIK, the ASCIIMathML script also accepts LaTeX as input, so this might
be an interesting addition also to the HTML writer.)

The most complete and best supported LaTeX-math to MathML conversion I know
of is MathJax (a system for client-side conversion of LaTeX math markup in
web pages into either MathML or some other suitable representation). It is
actively supported by the AMS and other bodies. Do you think it might be of
interest also for XML output?

...
> The file docutils/transforms/math.py, looks like this:
...
>         try:
>             import asciimathml
>             from xml.etree.ElementTree import Element, tostring
>         except ImportError as msg:
>             err_node = self.document.reporter.error(msg,
>                     base_node=self.document)
>             return

Do you think we should ship asciimathml.py with Docutils or just provide a
download link in the documentation? What are the license terms?

Attention: the keyword "as" does not work in Python 2.3 (and maybe some
later versions)!

...
> def convert_string_to_docutils_tree(xml_string, docutils_node):
>     minidom_dom = parseString(xml_string.encode('utf8'))
>     _convert_tree(minidom_dom, docutils_node)

> def _convert_tree(minidom_node, docutils_node):

These functions should possibly be moved to the utils or math sub-modules.

> I've done simple tests with the math.txt in
> test/functional/input/data/math.txt, as well as with my own
> math_ascii.rst file, and the code seems to work.

Have a look into the test directory and at the testing docutils document to
see how you can add tests to the test base. Working tests are a prerequisite
for inclusion in the Docutils core.

> It obviously needs some documentation.

Obvious places are config.txt (the new config option(s)) and directives.txt
(math directive).

> Also, there is apparently a bug
> with minidom when using python 3. I could write another simple function
> to supplement _convert_tree(minidom_node, docutils_node):, except use
> the xml.etree module, which is considered more up-to-date than minidom,
> but which does not work with python older than 2.5.

If xml.etree is the future and works seamlessly with Docutils,
_convert_tree() could contain a version check, or two versions of
_convert_tree() could be defined for Python < 2.5 vs. Python >= 2.5.

Thanks for your work,

Günter
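
A single --math-output setting, as proposed above, lets one code path pick the conversion instead of two separate transforms. The following is a minimal sketch of that dispatch and not the Docutils implementation. The converter functions are hypothetical placeholders that only tag their input; they do not call latex2mathml or asciimathml. The names CONVERTERS and convert_math are likewise made up for illustration.

```python
def convert_latex(code, inline):
    """Hypothetical placeholder for a LaTeX -> MathML converter."""
    return 'MathML(latex: %s)' % code


def convert_asciimath(code, inline):
    """Hypothetical placeholder for an ASCIImath -> MathML converter."""
    return 'MathML(asciimath: %s)' % code


# Map setting values (lower-cased) to converters; None means "no conversion".
CONVERTERS = {
    'mathml': convert_latex,
    'asciimathml': convert_asciimath,
    'latex': None,
    '': None,
}


def convert_math(code, math_output='LaTeX', inline=False):
    """Convert math source according to a --math-output style setting."""
    key = math_output.strip().lower()
    if key not in CONVERTERS:
        raise ValueError('unknown math output format: %r' % math_output)
    converter = CONVERTERS[key]
    if converter is None:
        # Default: leave the LaTeX source untouched.
        return code
    return converter(code, inline)


print(convert_math('x^2'))
print(convert_math('x^2', 'MathML'))
```

Adding another output format then only means adding one entry to the table, which is the extensibility argument made for --math-output.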