docutils-develop — For developer discussions of the implementation.

 Re: [Docutils-develop] problems with name "math" used for modules From: David Goodger - 2011-10-31 21:12:05 On Sat, Oct 29, 2011 at 11:55, Paul Tremblay wrote: > Using the name math for a directory, or for a module, can lead to > unexpected results, namely because Python itself has a module named math. > > If I am in trunk/docutils/docutils (which has a directory called math), > and type:: > > python3 # points to Python 3.2 Don't do that. Never launch Python from inside a package. Removing "." from your PYTHONPATH might help, but it's a band-aid solution. Within Docutils, we avoid issues by using absolute imports. All imports should be rooted at "docutils": of the form "import docutils.whatever.etc" or "from docutils.whatever import etc". > I suggest we rename the math directory. > > Likewise, I think I should rename the module math.py > (trunk/docutils/docutils/transforms/math.py) to something different, > such as math_rst.py And io.py as well? Not gonna happen. Please adjust your behavior first. -- David Goodger ; 
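A minimal sketch of Goodger's point, assuming Python 3, where absolute imports are the default (`from __future__ import absolute_import` enables them from Python 2.5 on). It builds a throwaway package with the hypothetical name `demo_pkg`, which contains its own `math.py`. A plain `import math` inside the package still reaches the standard library, while `from demo_pkg import math` reaches the package's module. The package's *parent* directory goes on sys.path, which matches launching Python from outside the package.

```python
import importlib
import os
import sys
import tempfile


def build_and_import():
    """Create demo_pkg/{__init__,math,calc}.py in a temp dir and import calc."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "demo_pkg")
    os.mkdir(pkg_dir)
    sources = {
        "__init__.py": "",
        # A package-local module that deliberately reuses a stdlib name.
        "math.py": "ORIGIN = 'demo_pkg.math'\n",
        # Absolute imports: 'math' is looked up on sys.path (the stdlib);
        # the package's own module is named explicitly, rooted at the package.
        "calc.py": (
            "from __future__ import absolute_import\n"
            "import math\n"
            "from demo_pkg import math as pkg_math\n"
        ),
    }
    for name, text in sources.items():
        with open(os.path.join(pkg_dir, name), "w") as handle:
            handle.write(text)
    sys.path.insert(0, root)  # the parent of the package, not the package itself
    importlib.invalidate_caches()
    try:
        return importlib.import_module("demo_pkg.calc")
    finally:
        sys.path.remove(root)


calc = build_and_import()
print(calc.math.pi)          # attribute of the standard library's math module
print(calc.pkg_math.ORIGIN)  # attribute of demo_pkg/math.py
```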
 Re: [Docutils-develop] problems with name "math" used for modules From: Guenter Milde - 2011-10-31 20:40:06 On 2011-10-29, Paul Tremblay wrote: > Using the name math for a directory, or for a module, can lead to > unexpected results, namely because Python itself has a module named math. > If I am in trunk/docutils/docutils (which has a directory called math), > and type:: > python3 # points to Python 3.2 > I get the following error > Fatal Python error: Py_Initialize: can't initialize sys standard streams > File "io.py", line 96 > return decoded.replace(u'\ufeff', u'') > ^ > SyntaxError: invalid syntax > Abort trap: 6 I cannot reproduce this:: ~/Code/Python/docutils-svn/docutils/docutils > python3 Python 3.2.1rc1 (default, May 18 2011, 11:01:17) [GCC 4.6.1 20110507 (prerelease)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> > However, even with python 2.7, I can create a seemingly bizarre problem: > python > >>> import tempfile > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File > "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tempfile.py", > line 34, in <module> > from random import Random as _Random > File > "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/random.py", > line 45, in <module> > from math import log as _log, exp as _exp, pi as _pi, e as _e, ceil > as _ceil > ImportError: cannot import name log > Python is looking for log in math, but sees the math directory first, > and cannot import it. > I suggest we rename the math directory. > Likewise, I think I should rename the module math.py > (trunk/docutils/docutils/transforms/math.py) to something different, > such as math_rst.py I am not sure about the need for this change. What is the general recommendation for module/package names in Python? I am not aware of a "do not use standard module names for your sub-packages/sub-modules" rule. On the contrary, I've seen a recommendation to use "generic" names inside packages. 
If you really need to start a Python executable from this directory, you might need to delete the pwd (current directory) from the first position in sys.path (either just delete it, or append it instead of prepending it). The jury is still out (see http://stackoverflow.com/questions/1959188/absolute-import-failing-in-subpackage-that-shadows-a-stdlib-package-name). The problem is handled in http://www.python.org/dev/peps/pep-0328/ and the solution is "absolute imports" (available since Python 2.5). However, it states: As Python's library expands, more and more existing package internal modules suddenly shadow standard library modules by accident. It's a particularly difficult problem inside packages because there's no way to specify which module is meant. This implies that it is not wise to use the name of a standard module/package inside a custom package, and we might e.g. use "mathematics" instead. Günter 
 Re: [Docutils-develop] two patches From: Paul Tremblay - 2011-10-30 17:20:52 On 10/30/11 10:41 AM, Aahz wrote: > On Sat, Oct 29, 2011, Paul Tremblay wrote: >> I've written two patches. One is for utils.py. It includes two new >> classes and two new functions. >> >> The classes are needed to make the SAX parser work and should not be >> invoked directly. > Why are you using SAX? Instead you probably want > xml.etree.ElementTree.iterparse() etree is not compatible with Python 2.3. SAX is robust, standard, and fast for this task. The most complex part of the code involves handling namespaces. By using SAX, I can control the formatting of the namespaces. For example, I can allow for a default namespace so that the namespace isn't written for each element, making the XML easier to read. Paul 
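To illustrate Paul's point about controlling namespaces with SAX, here is a standard-library sketch of a ContentHandler with namespace processing switched on. Prefix declarations arrive through `startPrefixMapping`, and every element arrives through `startElementNS` as a `(uri, localname)` pair, which is the information a writer needs to emit a default `xmlns` once instead of on every element. The class and function names (`NamespaceRecorder`, `record`) are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NamespaceRecorder(ContentHandler):
    """Record namespace declarations and namespaced element names."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.prefixes = []   # (prefix, uri); no prefix for a default xmlns
        self.elements = []   # (uri, localname)

    def startPrefixMapping(self, prefix, uri):
        self.prefixes.append((prefix, uri))

    def startElementNS(self, name, qname, attrs):
        self.elements.append(name)


def record(xml_bytes):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # report (uri, localname) pairs
    handler = NamespaceRecorder()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler


MATHML = "http://www.w3.org/1998/Math/MathML"
h = record(b'<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>')
print(h.prefixes)
print(h.elements)
```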
 Re: [Docutils-develop] two patches From: Aahz - 2011-10-30 14:41:12 On Sat, Oct 29, 2011, Paul Tremblay wrote: > > I've written two patches. One is for utils.py. It includes two new > classes and two new functions. > > The classes are needed to make the SAX parser work and should not be > invoked directly. Why are you using SAX? Instead you probably want xml.etree.ElementTree.iterparse() -- Aahz (aahz@...) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair 
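For comparison, here is a minimal sketch of Aahz's suggestion. ElementTree ships with Python 2.5 and later, which is why Paul avoids it for 2.3 compatibility. `iterparse()` yields events while reading, much like SAX. With the `"start-ns"` event it also reports namespace declarations, and element tags arrive in `{uri}local` form. The helper name `scan` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

DATA = b'<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi><mn>2</mn></math>'


def scan(xml_bytes):
    """Return namespace declarations and (tag, text) of leaf elements, in order."""
    namespaces, leaves = [], []
    for event, item in ET.iterparse(io.BytesIO(xml_bytes),
                                    events=("start-ns", "end")):
        if event == "start-ns":
            namespaces.append(item)           # (prefix, uri); '' = default xmlns
        elif len(item) == 0:                  # 'end' of an element with no children
            leaves.append((item.tag, item.text))
    return namespaces, leaves


ns, leaves = scan(DATA)
print(ns)
print(leaves)
```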
 [Docutils-develop] problems with name "math" used for modules From: Paul Tremblay - 2011-10-29 15:56:00 Using the name math for a directory, or for a module, can lead to unexpected results, namely because Python itself has a module named math. If I am in trunk/docutils/docutils (which has a directory called math), and type:: python3 # points to Python 3.2 I get the following error: Fatal Python error: Py_Initialize: can't initialize sys standard streams File "io.py", line 96 return decoded.replace(u'\ufeff', u'') ^ SyntaxError: invalid syntax Abort trap: 6 However, even with python 2.7, I can create a seemingly bizarre problem: python >>> import tempfile Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tempfile.py", line 34, in <module> from random import Random as _Random File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/random.py", line 45, in <module> from math import log as _log, exp as _exp, pi as _pi, e as _e, ceil as _ceil ImportError: cannot import name log Python is looking for log in math, but sees the math directory first, and cannot import it. I suggest we rename the math directory. Likewise, I think I should rename the module math.py (trunk/docutils/docutils/transforms/math.py) to something different, such as math_rst.py Paul 
 [Docutils-develop] two patches From: Paul Tremblay - 2011-10-29 15:24:00 Attachments: utils.diff     test_utils.diff Hi, I've written two patches. One is for utils.py. It includes two new classes and two new functions. The classes are needed to make the SAX parser work and should not be invoked directly. The function XmlStringToDocutilsNodes copies an XML string to a docutils node set. The function xml_copy_tree makes an identical copy of an XML string for testing purposes. (It now occurs to me that this function and the class that goes with it should go in test_utils.py?) The other patch is for test_utils.py. I've included tests to test the new functions. The code is documented. Paul 
 Re: [Docutils-develop] help getting encoding to work From: Guenter Milde - 2011-10-28 22:17:12 On 2011-10-28, Paul Tremblay wrote: > I have written a function to convert an XML string to docutils node. I > used SAX because it is universal and fast. However, I don't understand > how to handle the encodings. ... > Here is the error message: > File > "/Users/cejohnsonlouisville/Documents/docutils/trunk/docutils/docutils/utils.py", > line 784, in XmlStringToDocutilsNodes > parser.parse(read_obj) > UnicodeEncodeError: 'ascii' codec can't encode character u'\u03c0' in > position 111: ordinal not in range(128) > Of course, the problem comes from the XML string passed to the function. > It is encoded in UTF-8. The error looks like a failure to encode a "unicode" string to a "bytes" string using the fallback "ascii/strict" encoding. > I chanced upon this workaround: > if type(xml_string) == type(unicode('x')): > xml_string = xml_string.encode('utf8') > elif type(xml_string) == type('x'): > xml_string = xml_string.decode(encoding) > xml_string = xml_string.encode('utf8') > read_obj = StringIO(xml_string) > Why does this work? In Python 2, parser.parse() expects a "bytes" string. Explicitly encoding "unicode" strings with 'utf-8' bypasses Python's "dumb" implicit type conversion (which uses the "ascii" encoding). > In the elif statement, why do I have to decode and > then encode? Because encoding may be different from utf8? Docutils uses Unicode internally, so your converter should not need to decode the input (as long as it is fed from a Docutils doctree node). Just document that the xml_string should be a unicode instance. > Also, I don't believe this code will work in python 3, since python 3 > doesn't have unicode objects. Python 3 only has string and bytes. The str -> bytes and unicode -> str conversions are handled by the 2to3 conversion script that processes the source files when "setup.py" is called with Python 3. 
>I believe this code will work: > if sys.version_info < (3,): > if type(xml_string) == type(unicode('x')): > xml_string = xml_string.encode('utf8') > elif type(xml_string) == type('x'): > xml_string = xml_string.decode(encoding) > xml_string = xml_string.encode('utf8') > #else just keep xml_string the same, since the Sax parser should > be able to handle bytes or a string A simpler typecheck can be done with: if isinstance(xml_string, unicode): ... if isinstance(xml_string, str): ... which should be converted to valid Python3 code by 2to3. Günter 
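Günter's explanation boils down to this: hand the SAX parser bytes, and do the encoding explicitly yourself. A sketch for Python 3, where `str` plays the role of Python 2's `unicode`, as his `isinstance()` check anticipates after 2to3. The names `to_parser_bytes`, `TextCollector` and `collect_text` are hypothetical. Text is encoded to UTF-8. Bytes are passed through on the assumption that they are already-encoded XML, whose declaration (or the UTF-8 default) tells the parser how to decode them. A text string that carries its own non-UTF-8 encoding declaration is not handled here.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


def to_parser_bytes(xml_string):
    """Return bytes suitable for xml.sax: encode text, pass bytes through."""
    if isinstance(xml_string, str):        # Python 2: isinstance(..., unicode)
        return xml_string.encode("utf-8")
    if isinstance(xml_string, bytes):
        return xml_string                  # already encoded; expat decodes it
    raise TypeError("expected str or bytes, got %r" % type(xml_string))


class TextCollector(ContentHandler):
    """Collect all character data, to show what the parser decoded."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def collect_text(xml_string):
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(to_parser_bytes(xml_string)))
    return "".join(handler.chunks)


print(collect_text(u"<mi>\u03c0</mi>"))   # the character from Paul's traceback
```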
 [Docutils-develop] help getting encoding to work From: Paul Tremblay - 2011-10-28 04:34:32 I have written a function to convert an XML string to a docutils node. I used SAX because it is universal and fast. However, I don't understand how to handle the encodings. Here are the lines of the function:

    def XmlStringToDocutilsNodes(xml_string):
        """
        Converts an XML String into a docutils node tree, and returns that tree
        """
        read_obj = StringIO(xml_string)
        the_handle=CopyTree()
        parser = xml.sax.make_parser()
        parser.setFeature(feature_namespaces, 1)
        parser.setContentHandler(the_handle)
        parser.setFeature("http://xml.org/sax/features/external-general-entities", True)
        parser.parse(read_obj)
        ^^^^^^^^

Here is the error message:

    File "/Users/cejohnsonlouisville/Documents/docutils/trunk/docutils/docutils/utils.py", line 784, in XmlStringToDocutilsNodes
        parser.parse(read_obj)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u03c0' in position 111: ordinal not in range(128)

Of course, the problem comes from the XML string passed to the function. It is encoded in UTF-8. I chanced upon this workaround:

    if type(xml_string) == type(unicode('x')):
        xml_string = xml_string.encode('utf8')
    elif type(xml_string) == type('x'):
        xml_string = xml_string.decode(encoding)
        xml_string = xml_string.encode('utf8')
    read_obj = StringIO(xml_string)

Why does this work? In the elif statement, why do I have to decode and then encode? Also, I don't believe this code will work in Python 3, since Python 3 doesn't have unicode objects. Python 3 only has string and bytes. I believe this code will work:

    if sys.version_info < (3,):
        if type(xml_string) == type(unicode('x')):
            xml_string = xml_string.encode('utf8')
        elif type(xml_string) == type('x'):
            xml_string = xml_string.decode(encoding)
            xml_string = xml_string.encode('utf8')
    #else just keep xml_string the same, since the Sax parser should be able to handle bytes or a string

Paul 
 Re: [Docutils-develop] Using RST for technical textbook From: Guenter Milde - 2011-10-27 12:28:20 On 2011-10-23, Paul Tremblay wrote: > On 10/22/11 9:43 AM, Guenter Milde wrote: >> On 2011-10-20, Paul Tremblay wrote: >>> On 10/20/11 6:14 AM, Guenter Milde wrote: >>>>>>>>> MathML is pretty essential for XML. Can it be put in the XML >>>>>>>>> writer? >>>> The idea would be to define a new transform >>>> (transforms.math.latex2mathml, in a file >>>> docutils/docutils/transforms/math.py say) that would >>>> replace the content of math and math-block nodes. ... > Okay, I've followed all of your suggestions. > docutils/writers/docutils.xml now has the following changes: Could you post the changes as a unified diff (diff -u old new or svn diff)? This would make for easier reading and would also allow the changes to be applied as a patch. ... > class Writer(writers.Writer, Component): ... > ('Convert LaTeX math in math_block and math to MathML', > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > Add an option for --latex-mathml > ['--latex-mathml'], > {'dest': 'latex_mathml', 'default':False, > 'action': 'store_true', 'validator': > frontend.validate_boolean}), > ('Convert ASCII math in math_block and math to MathML', > ^^^^^^^^^^^^^^^^^^^ > Add an option for ASCII math > ['--ascii-mathml'], > {'dest': 'ascii_mathml', 'default':False, > 'action': 'store_true', 'validator': > frontend.validate_boolean}), I would prefer a --math-output option similar to the HTML writer: this is extensible and unifies the work with the rst2* front ends. As default, I'd use 'LaTeX' or the empty string (signifying "no conversion"):: ('Math output format, one of "MathML", "ASCIIMathML" ' 'or "LaTeX". Default: "LaTeX"', ['--math-output'], {'default': 'LaTeX'}), Then we can also merge the two transforms. The documentation would explain that "ASCIIMathML" is "MathML generated by the ASCIImath -> MathML conversion script." 
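Docutils hands settings_spec entries on to optparse (docutils.frontend.OptionParser subclasses optparse.OptionParser), so Günter's proposed option can be tried out with plain optparse. A sketch, assuming the three values he lists. The `choices=` restriction is an addition not in his snippet. The hypothetical `TRANSFORM_FOR` mapping reuses the transform class names from Paul's patch purely as labels, to show how one setting could select a single merged transform.

```python
from optparse import OptionParser

MATH_FORMATS = ("MathML", "ASCIIMathML", "LaTeX")

# Hypothetical mapping from the setting's value to the transform that would run.
TRANSFORM_FOR = {
    "MathML": "LaTeXmath2MathML",       # LaTeX source converted to MathML
    "ASCIIMathML": "Asciimath2MathML",  # ASCIImath source converted to MathML
    "LaTeX": None,                      # default: leave math content untouched
}


def make_parser():
    parser = OptionParser()
    parser.add_option(
        "--math-output", dest="math_output", type="choice",
        choices=list(MATH_FORMATS), default="LaTeX",
        help='Math output format, one of "MathML", "ASCIIMathML" or "LaTeX". '
             'Default: "LaTeX"')
    return parser


def selected_transform(argv):
    """Parse `argv` and return (setting value, name of transform to apply)."""
    options, _args = make_parser().parse_args(argv)
    return options.math_output, TRANSFORM_FOR[options.math_output]


print(selected_transform([]))
print(selected_transform(["--math-output", "MathML"]))
```

An unknown value such as `--math-output mathml` makes optparse print an error and exit, instead of silently doing no conversion.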
(AFAIK, the ASCIIMathML script also accepts LaTeX as input, so this might be an interesting addition also to the HTML writer.) The most complete and best supported LaTeX-math to MathML conversion I know of is MathJax (a system for client-side conversion of LaTeX math markup in web pages into either MathML or some other suitable representation). It is actively supported by the AMS and other bodies. Do you think it might be of interest also for XML output? ... > The file docutils/transforms/math.py, looks like this: ... > try: > import asciimathml > from xml.etree.ElementTree import Element, tostring > except ImportError as msg: > err_node = self.document.reporter.error(msg, > base_node=self.document) > return Do you think we should ship asciimathml.py with Docutils or just provide a download link in the documentation? What are the license terms? Attention: the keyword "as" in except clauses does not work in Python 2.3; it is only available from Python 2.6 on! ... > def convert_string_to_docutils_tree(xml_string, docutils_node): > minidom_dom = parseString(xml_string.encode('utf8')) > _convert_tree(minidom_dom, docutils_node) > def _convert_tree(minidom_node, docutils_node): These functions should possibly be moved to the utils or math sub-modules. > I've done simple tests with the math.txt in > test/functional/input/data/math.txt, as well as with my own > math_ascii.rst file, and the code seems to work. Have a look at the test directory and at the Docutils testing document to see how you can add tests to the test base. Working tests are a prerequisite for inclusion in the Docutils core. > It obviously needs some documentation. Obvious places are config.txt (the new config option(s)) and directives.txt (the math directive). > Also, there is apparently a bug > with minidom when using python 3.
I could write another simple function > to supplement _convert_tree(minidom_node, docutils_node):, except use > the xml.etree module, which is considered more up-to-date than minidom, > but which does not work with python older than 2.5. If xml.etree is the future and works seamlessly with Docutils, _convert_tree() could contain a version check or two versions of _convert_tree() be defined for Python < 2.5 vs. Python >= 2.5. Thanks for your work, Günter 
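Two compatibility points from this exchange, sketched together under stated assumptions. First, `except ImportError as err` only parses from Python 2.6 on, so the sketch fetches the exception with `sys.exc_info()`, which works on 2.3 as well as 3.x. Second, following Günter's suggestion, the ElementTree-based implementation is chosen by a version check at import time, with a minidom fallback for Python < 2.5. The names `optional_import` and `element_names` are hypothetical, and the missing module name in the usage line is made up.

```python
import sys


def optional_import(module_name):
    """Return (module, None), or (None, message) if the import fails.

    sys.exc_info() replaces 'except ImportError as err', which needs 2.6+.
    """
    try:
        module = __import__(module_name)
    except ImportError:
        return None, str(sys.exc_info()[1])
    return module, None


if sys.version_info >= (2, 5):
    from xml.etree import ElementTree

    def element_names(xml_bytes):
        """Tag names in document order (ElementTree, Python 2.5+)."""
        names = []

        def walk(element):
            names.append(element.tag)
            for child in element:      # iterating an Element yields its children
                walk(child)

        walk(ElementTree.fromstring(xml_bytes))
        return names
else:
    from xml.dom import minidom

    def element_names(xml_bytes):
        """Tag names in document order (minidom fallback, Python < 2.5)."""
        names = []

        def walk(node):
            for child in node.childNodes:
                if child.nodeType == child.ELEMENT_NODE:
                    names.append(child.nodeName)
                    walk(child)

        walk(minidom.parseString(xml_bytes))
        return names


print(optional_import("asciimathml_demo_missing"))
print(element_names(b"<mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow>"))
```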
 Re: [Docutils-develop] how to write mixed elements with nodes.py From: Paul Tremblay - 2011-10-25 03:22:33 On 10/24/11 10:04 PM, David Goodger wrote: > On Mon, Oct 24, 2011 at 17:38, Paul Tremblay wrote: >> However, since I am adding elements that are not part of the normal docutils >> tree (that is, MathML elements, such as <mrow>), I assume I still have to >> use:: >> >> e = nodes.Element() >> e.tagname='mrow' >> >> ?? >> >> (I did look in docutils/test/test_nodes.py, as well as nodes.py, of course.) > You *could* do it that way, but I wouldn't. I would make subclasses of > element classes from docutils.nodes. The tagname attribute is taken > automatically from the class name, and you can have custom > functionality (see docutils.nodes for examples). > I see. So to add a "mi" element:

    class mi(nodes.TextElement):
        pass

    element = mi(text='5.66')

The problem is I am trying to come up with a general function to convert any XML string to a docutils node, so I don't always know what elements I will encounter. For example, .. raw:: xml produces elements of unknown name, so I cannot create an Element class for each. Paul 
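One possible answer to the unknown-tag problem (not proposed in the thread itself) is to build the subclasses on demand. `type()` creates a class whose name becomes the node's tagname, so Goodger's subclassing advice also covers tags that are only known at parse time. A sketch assuming docutils is installed; `node_class_for` and its cache are hypothetical helpers.

```python
from docutils import nodes

_node_classes = {}


def node_class_for(tagname, base=nodes.Element):
    """Return (and cache) a docutils node class named after `tagname`."""
    key = (tagname, base)
    if key not in _node_classes:
        # nodes.Element sets an instance's tagname from its class name.
        _node_classes[key] = type(tagname, (base,), {})
    return _node_classes[key]


mrow = node_class_for("mrow")()
mrow.append(node_class_for("mi", nodes.TextElement)("", "x"))  # rawsource, text
mrow.append(node_class_for("mo", nodes.TextElement)("", "+"))
print(mrow)
```

Caching keeps one class per tag name, so repeated elements share a class and `isinstance()` checks stay meaningful.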
 Re: [Docutils-develop] how to write mixed elements with nodes.py From: David Goodger - 2011-10-25 02:04:41 On Mon, Oct 24, 2011 at 17:38, Paul Tremblay wrote: > However, since I am adding elements that are not part of the normal docutils > tree (that is, MathML elements, such as <mrow>), I assume I still have to > use:: > > e = nodes.Element() > e.tagname='mrow' > > ?? > > (I did look in docutils/test/test_nodes.py, as well as nodes.py, of course.) You *could* do it that way, but I wouldn't. I would make subclasses of element classes from docutils.nodes. The tagname attribute is taken automatically from the class name, and you can have custom functionality (see docutils.nodes for examples). -- David Goodger ; 
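Goodger's suggestion can be written out as a sketch, assuming docutils is installed. MathML classes named after their tags (`mrow`, `mi`, `mn`) inherit from docutils.nodes, and each instance's tagname is taken from its class name. Choosing TextElement for the token elements and Element for the row container is an assumption about which base fits best.

```python
from docutils import nodes


class mrow(nodes.Element):
    """MathML row: a container for other MathML elements."""


class mi(nodes.TextElement):
    """MathML identifier (a token element holding text)."""


class mn(nodes.TextElement):
    """MathML number (a token element holding text)."""


row = mrow()
row += mi(text="r")        # text= is the second parameter; the first is rawsource
row += mn(text="5.66")
print(row.tagname, [child.tagname for child in row])
print(row)
```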
 Re: [Docutils-develop] how to write mixed elements with nodes.py From: Paul Tremblay - 2011-10-24 21:38:14 On 10/24/11 4:41 PM, David Goodger wrote: > On Mon, Oct 24, 2011 at 16:20, Paul Tremblay wrote: >> Can someone point me to how to write elements that mix elements and text >> in a docutils tree, using docutils/nodes.py? > You're doing it wrong. > >>>> p = nodes.paragraph(text='Some initial text, followed by ') >>>> p.append(nodes.emphasis(text='some emphasized text')) >>>> p.append(nodes.Text('.')) >>>> print p > <paragraph>Some initial text, followed by <emphasis>some emphasized > text</emphasis>.</paragraph> >>>> print p.pformat() > <paragraph> > Some initial text, followed by > <emphasis> > some emphasized text > . > > Note that text-containing elements (like paragraph or emphasis) > require text= (because it's the second parameter; the first is > rawsource). But Text (used for text without a surrounding element) > doesn't use text=. > > The source and the tests (e.g. docutils/test/test_nodes.py) are good > sources of "how-to" info. > > Use the source, Luke! Thank you Obi-Wan Kenobi! Yes, nodes.Text() was what I was looking for. However, since I am adding elements that are not part of the normal docutils tree (that is, MathML elements, such as <mrow>), I assume I still have to use:: e = nodes.Element() e.tagname='mrow' ?? (I did look in docutils/test/test_nodes.py, as well as nodes.py, of course.) Luke (AKA Paul)
 Re: [Docutils-develop] how to write mixed elements with nodes.py From: David Goodger - 2011-10-24 20:41:37 On Mon, Oct 24, 2011 at 16:20, Paul Tremblay wrote: > Can someone point me to how to write elements that mix elements and text > in a docutils tree, using docutils/nodes.py? You're doing it wrong.

    >>> p = nodes.paragraph(text='Some initial text, followed by ')
    >>> p.append(nodes.emphasis(text='some emphasized text'))
    >>> p.append(nodes.Text('.'))
    >>> print p
    <paragraph>Some initial text, followed by <emphasis>some emphasized text</emphasis>.</paragraph>
    >>> print p.pformat()
    <paragraph>
        Some initial text, followed by
        <emphasis>
            some emphasized text
        .

Note that text-containing elements (like paragraph or emphasis) require text= (because it's the second parameter; the first is rawsource). But Text (used for text without a surrounding element) doesn't use text=. The source and the tests (e.g. docutils/test/test_nodes.py) are good sources of "how-to" info. Use the source, Luke! -- David Goodger ; 
 [Docutils-develop] how to write mixed elements with nodes.py From: Paul Tremblay - 2011-10-24 20:21:02 Can someone point me to how to write elements that mix elements and text in a docutils tree, using docutils/nodes.py? For example, take this fragment:

    <p>text <emphasis>word</emphasis> text</p>

The nodes.Element class can create a node with no text, and the nodes.TextElement can add nodes with text, but I can't find the method for just adding text to an element.

    >>> from docutils import nodes
    >>> element = nodes.Element()
    >>> element.tagname = 'p'
    >>> print(element)
    <p/>
    >>> element2 = nodes.TextElement(text='word')
    >>> element2.tagname = 'emphasis'
    >>> element.append(element2)
    >>> print element
    <p><emphasis>word</emphasis></p>

What I need to do is first append the text to element, then the element2, then more text. I can see the method for doing this in minidom, but not in nodes. In case you are wondering why I need to do this, consider that for the math.py patch, I need to convert an XML string to a docutils tree. The code I have already suggested works, but only because MathML does not mix text and elements. I would like to write a more complete function that could convert any string to a docutils node. Thanks Paul 
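Goodger's answer (bare text goes in as nodes.Text) matches how a mixed-content fragment decomposes. Here is a standard-library sketch, assuming Paul's fragment is `<p>text <emphasis>word</emphasis> text</p>`. ElementTree keeps the leading text in `.text` and the text after each child in that child's `.tail`. Reading them in order yields the sequence of text pieces and child elements that a converter would append. `mixed_content` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def mixed_content(element):
    """Flatten an element's content into ('text', str) / ('element', tag) items.

    In docutils terms, each 'text' item would become nodes.Text(...) and each
    'element' item a child node, appended to the parent in this order.
    """
    items = []
    if element.text:                      # text before the first child
        items.append(("text", element.text))
    for child in element:
        items.append(("element", child.tag))
        if child.tail:                    # text that follows this child
            items.append(("text", child.tail))
    return items


fragment = ET.fromstring("<p>text <emphasis>word</emphasis> text</p>")
print(mixed_content(fragment))
```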
 Re: [Docutils-develop] Using RST for technical textbook From: Paul Tremblay - 2011-10-23 22:04:15 On 10/22/11 9:43 AM, Guenter Milde wrote: > On 2011-10-20, Paul Tremblay wrote: >> On 10/20/11 6:14 AM, Guenter Milde wrote: >>>>>>>> MathML is pretty essential for XML. Can it be put in the XML >>>>>>>> writer? > ... > >>> The idea would be to define a new transform >>> (transforms.math.latex2mathml, in a file >>> docutils/docutils/transforms/math.py say) that would >>> replace the content of math and math-block nodes. >>> The code would be a mixture of examples from other transforms and the >>> visit_math() method in the html writer. (to avoid duplicating code, >>> once it is in place and tested, the html writer should be modified to >>> use it as well) >> Following your directions, I created math.py in the docutils/transforms >> directory. To the __init__.py in writers, I added: >> from docutils.transforms import math > ... > >> def get_transforms(self): >> return Component.get_transforms(self) + [ >> universal.Messages, >> universal.FilterMessages, >> universal.StripClassesAndElements, >> math.Math_Block, > this should go into /docutils/writers/docutils_xml.py, otherwise it > affects all writers (inheriting from docutils.writers.Writer). > >> math.py looks like this: >> """ >> math used by writers > I'd use something like """math related transformations""" as docstring > for the transforms.math module. > >> from docutils import writers >> from docutils.transforms import writers >> """ > What is this for? > >> __docformat__ = 'reStructuredText' >> from docutils import nodes, utils, languages >> from docutils.transforms import Transform >> from docutils.math.latex2mathml import parse_latex_math >> class Math_Block(Transform): > Do we need separate classes for Math_Block vs. Math_Role or could these be > put into one class? 
> > Considering that transforms.math might be used for several math-related > transforms (equation numbering comes to my mind), I'd use a more telling > name, LaTeXmath2MathML, say. > >> """ >> Change the text in the math_block from plain text in LaTeX to >> a MathML tree >> """ >> default_priority = 910 # not sure if this needs to be loaded >> earlier or not >> def apply(self): >> for math_block in self.document.traverse(nodes.math_block): >> math_code = math_block.astext() >> mathml_tree = parse_latex_math(math_code, inline=False) >> # need to append the mathml_tree to math_block >> I have a few questions. >> (1) How do you get just the text from a node.Element? In my code, the >> math_block.astext actually returns a text representation of the node, >> including the element tags, etc. I looked everywhere in >> docutils/nodes.py for a method to get just text, but could not find one. >> Somehow, feeding the string with the tags to parse_latex_math worked >> anyway (following the example in the html writer). > Strange. How can I reproduce this? > > I did a small test inserting > > print node.astext().encode('utf8') > > in the visit_math_block() method of the html writer and did get just the > content, no tags. > >> (2) How do I append the resulting tree to the math_block? I tried >> math_block.append() and other methods, but it seems the latex2mathml.py >> returns a different type of tree than the one already created. > I think so. Remember that latex2mathml is taken from a user-contributed > add-on in the sandbox and is only intended to produce a MathML > representation to put into HTML pages. > >> I could convert the mathml tree to an XML string and then create a tree >> from that, and then append the tree? I'm just not sure how to do this. > I see several ways forward from here: > > * your proposal (convert to string and parse this to a compatible tree). > Is there an XML parser in the minidom module? 
> > * modify latex2mathml to use "compatible" tree nodes based on Docutils' > nodes. > >> (3) How do I make this transformation optional, depending on an option >> set by the user? The user might have put asciimath in the math_block >> element, in which case it should not be transformed by the >> latex2mathml.py module. > Here, you can look at examples for customizable transforms. E.g. the > sectnum_xform setting is defined in frontend.py and works on the > SectNum(Transform) in transforms/parts.py. > > Günter > > > Okay, I've followed all of your suggestions. docutils/writers/docutils.xml now has the following changes:

    from docutils.transforms import math
                             ^^^^^^^^^^^^ import math

    class Writer(writers.Writer, Component):
                                 ^^^^^ subclassing Component in order to add the transformation

        supported = ('xml',)
        """Formats this writer supports."""

        settings_spec = (
            '"Docutils XML" Writer Options',
            'Warning: the --newlines and --indents options may adversely affect '
            'whitespace; use them only for reading convenience.',
            (('Generate XML with newlines before and after tags.',
              ['--newlines'],
              {'action': 'store_true', 'validator': frontend.validate_boolean}),
             ('Generate XML with indents and newlines.',
              ['--indents'],
              {'action': 'store_true', 'validator': frontend.validate_boolean}),
             ('Omit the XML declaration. Use with caution.',
              ['--no-xml-declaration'],
              {'dest': 'xml_declaration', 'default': 1,
               'action': 'store_false', 'validator': frontend.validate_boolean}),
             ('Omit the DOCTYPE declaration.',
              ['--no-doctype'],
              {'dest': 'doctype_declaration', 'default': 1,
               'action': 'store_false', 'validator': frontend.validate_boolean}),
             ('Convert LaTeX math in math_block and math to MathML',
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Add an option for --latex-mathml
              ['--latex-mathml'],
              {'dest': 'latex_mathml', 'default': False,
               'action': 'store_true', 'validator': frontend.validate_boolean}),
             ('Convert ASCII math in math_block and math to MathML',
              ^^^^^^^^^^^^^^^^^^^ Add an option for ASCII math
              ['--ascii-mathml'],
              {'dest': 'ascii_mathml', 'default': False,
               'action': 'store_true', 'validator': frontend.validate_boolean}),
            ))

        def get_transforms(self):
            return Component.get_transforms(self) + [
                universal.Messages,
                universal.FilterMessages,
                universal.StripClassesAndElements,
                math.LaTeXmath2MathML,
                math.Asciimath2MathML,
                ^^^^^^^^^^^^^^^^ add 2 new transforms
            ]

=============================================

The file docutils/transforms/math.py looks like this:

    # $Id: writer_aux.py 6433 2010-09-28 08:21:25Z milde$
    # Author: Lea Wiemann
    # Copyright: This module has been placed in the public domain.

    """
    math used by writers
    """

    __docformat__ = 'reStructuredText'

    from docutils import nodes, utils, languages
    from docutils.transforms import Transform
    from docutils.math.latex2mathml import parse_latex_math
    from xml.dom.minidom import parse, parseString, Node
    import sys

    class LaTeXmath2MathML(Transform):
        """
        Change the text in the math_block and math from plain text in LaTeX to
        a MathML tree
        """

        default_priority = 910  # not sure if this needs to be loaded earlier or not

        def apply(self):
            latex_mathml = self.document.settings.latex_mathml
            if not latex_mathml:
                return
            for math_block in self.document.traverse(nodes.math_block):
                math_code = math_block.astext()
                try:
                    mathml_tree = parse_latex_math(math_code, inline=False)
                    math_xml = ''.join(mathml_tree.xml())
                except SyntaxError, err:
                    err_node = self.document.reporter.error(err, base_node=math_block)
                    math_block.append(err_node)
                    return
                new_math_block = nodes.Element(rawsource=math_code)
                new_math_block.tagname = 'math_block'
                math_block.replace_self(new_math_block)
                convert_string_to_docutils_tree(math_xml, new_math_block)
            for math in self.document.traverse(nodes.math):
                math_code = math.astext()
                try:
                    mathml_tree = parse_latex_math(math_code, inline=True)
                    math_xml = ''.join(mathml_tree.xml())
                except SyntaxError, err:
                    err_node = self.document.reporter.error(err, base_node=math)
                    math.append(err_node)
                    return
                new_math = nodes.Element(rawsource=math_code)
                new_math.tagname = 'math'
                math.replace_self(new_math)
                convert_string_to_docutils_tree(math_xml, new_math)

    class Asciimath2MathML(Transform):
        """
        Change the text in the math_block and math from plain text in ASCII to
        a MathML tree
        """

        default_priority = 910  # not sure if this needs to be loaded earlier or not

        def apply(self):
            ascii_mathml = self.document.settings.ascii_mathml
            if not ascii_mathml:
                return
            try:
                import asciimathml
                from xml.etree.ElementTree import Element, tostring
            except ImportError as msg:
                err_node = self.document.reporter.error(msg, base_node=self.document)
                return
            for math_block in self.document.traverse(nodes.math_block):
                math_code = math_block.astext()
                math_tree = asciimathml.parse(math_code)
                math_tree.set('xmlns', 'http://www.w3.org/1998/Math/MathML')
                math_xml = tostring(math_tree, encoding="utf-8")
                math_xml = math_xml.decode('utf8')
                new_math_block = nodes.Element(rawsource=math_code)
                new_math_block.tagname = 'math_block'
                math_block.replace_self(new_math_block)
                convert_string_to_docutils_tree(math_xml, new_math_block)
            for math in self.document.traverse(nodes.math):
                math_code = math.astext()
                math_tree = asciimathml.parse(math_code)
                math_tree.set('xmlns', 'http://www.w3.org/1998/Math/MathML')
                math_xml = tostring(math_tree, encoding="utf-8")
                math_xml = math_xml.decode('utf8')
                new_math = nodes.Element(rawsource=math_code)
                new_math.tagname = 'math'
                math.replace_self(new_math)
                convert_string_to_docutils_tree(math_xml, new_math)

    def convert_string_to_docutils_tree(xml_string, docutils_node):
        minidom_dom = parseString(xml_string.encode('utf8'))
        _convert_tree(minidom_dom, docutils_node)

    def _convert_tree(minidom_node, docutils_node):
        for child_node in minidom_node.childNodes:
            if child_node.nodeType == Node.ELEMENT_NODE:
                tag_name = child_node.nodeName
                node_text = ''
                for grand_child in child_node.childNodes:
                    if grand_child.nodeType == Node.TEXT_NODE:
                        node_text += grand_child.nodeValue
                if node_text.strip() != '':
                    Element = nodes.TextElement(text=node_text)
                else:
                    Element = nodes.Element()
                Element.tagname = tag_name
                attrs = child_node.attributes
                if attrs:
                    for attrName in attrs.keys():
                        attrNode = attrs.get(attrName)
                        attrValue = attrNode.nodeValue
                        attr_string_name = attrNode.nodeName
                        Element[attr_string_name] = attrValue
                docutils_node.append(Element)
                if len(child_node.childNodes) != 0:
                    _convert_tree(child_node, Element)

==============================

I've done simple tests with the math.txt in test/functional/input/data/math.txt, as well as with my own math_ascii.rst file, and the code seems to work. It obviously needs some documentation.
Also, there is apparently a bug with minidom when using python 3. I could write another simple function to supplement _convert_tree(minidom_node, docutils_node):, except use the xml.etree module, which is considered more up-to-date than minidom, but which does not work with python older than 2.5. Paul 
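A sketch of the ElementTree-based variant Paul mentions, assuming Python 3 and docutils (`etree_to_docutils`, `_class_for`, `_split` and `_convert` are hypothetical names). Unlike the minidom `_convert_tree()` above, it keeps mixed content by appending nodes.Text for `.text` and `.tail`. It also creates node classes with `type()` instead of assigning tagname on plain Element instances. ElementTree reports namespaced tags as `{uri}local`. The sketch keeps the local name and writes an `xmlns` attribute only where the namespace changes, which is an assumption about the desired XML output.

```python
import xml.etree.ElementTree as ET

from docutils import nodes

_classes = {}


def _class_for(tag):
    """Node class named after `tag`; Element derives tagname from the class name."""
    if tag not in _classes:
        _classes[tag] = type(tag, (nodes.Element,), {})
    return _classes[tag]


def _split(tag):
    """'{uri}local' -> ('uri', 'local'); 'local' -> (None, 'local')."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def _convert(el, parent_uri):
    uri, local = _split(el.tag)
    node = _class_for(local)()
    if uri is not None and uri != parent_uri:
        node["xmlns"] = uri              # declare the namespace where it starts
    for name, value in el.attrib.items():
        node[name] = value
    if el.text:                          # text before the first child
        node += nodes.Text(el.text)
    for child in el:
        node += _convert(child, uri)
        if child.tail:                   # text between/after children
            node += nodes.Text(child.tail)
    return node


def etree_to_docutils(xml_string, docutils_node):
    """Parse `xml_string` and append the converted tree to `docutils_node`."""
    docutils_node += _convert(ET.fromstring(xml_string), None)
    return docutils_node


MATHML = "http://www.w3.org/1998/Math/MathML"
math_holder = etree_to_docutils(
    '<math xmlns="%s"><mi>x</mi><mo>+</mo><mn>1</mn></math>' % MATHML,
    nodes.Element())
para_holder = etree_to_docutils(
    "<p>text <emphasis>word</emphasis> text</p>", nodes.Element())
print(math_holder[0])
print(para_holder[0])
```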
 Re: [Docutils-develop] Using RST for technical textbook From: Paul Tremblay - 2011-10-23 03:49:16 On 10/22/11 9:43 AM, Guenter Milde wrote: > On 2011-10-20, Paul Tremblay wrote: >> On 10/20/11 6:14 AM, Guenter Milde wrote: >>>>>>>> MathML is pretty essential for XML. Can it be put in the XML >>>>>>>> writer? > ... > >>> The idea would be to define a new transform >>> (transforms.math.latex2mathml, in a file >>> docutils/docutils/transforms/math.py say) that would >>> replace the content of math and math-block nodes. >>> The code would be a mixture of examples from other transforms and the >>> visit_math() method in the html writer. (to avoid duplicating code, >>> once it is in place and tested, the html writer should be modified to >>> use it as well) >> Following your directions, I created math.py in the docutils/transform >> directory. To the __init__ .py in writers, I added: >> from docutils.transforms import math > ... > >> def get_transforms(self): >> return Component.get_transforms(self) + [ >> universal.Messages, >> universal.FilterMessages, >> universal.StripClassesAndElements, >> math.Math_Block, > this should go into /docutils/writers/docutils_xml.py, otherwise it > affects all writers (inheriting from docutils.writers.Writer). > >> math.py looks like this; >> """ >> math used by writers > I'd use something like """math related transformations""" as docstring > for the transforms.math module. > >> from docutils import writers >> from docutils.transforms import writers >> """ > What is this for? > >> __docformat__ = 'reStructuredText' >> from docutils import nodes, utils, languages >> from docutils.transforms import Transform >> from docutils.math.latex2mathml import parse_latex_math >> class Math_Block(Transform): > Do we need separate classes for Math_Block vs. Math_Role or could these be > put into one class? 
> > Considering that transforms.math might be used for several math-related > transforms (equation numbering comes to my mind), I'd use a more telling > name, LaTeXmath2MathML, say. > >> """ >> Change the text in the math_block from plain text in LaTeX to >> a MathML tree >> """ >> default_priority = 910 # not sure if this needs to be loaded >> earlier or not >> def apply(self): >> for math_block in self.document.traverse(nodes.math_block): >> math_code = math_block.astext() >> mathml_tree = parse_latex_math(math_code, inline=False) >> # need to append the mathml_tree to math_block >> I have a few questions. >> (1) How do you get just the text from a node.Element? In my code, the >> math_block.astext actually returns a text representation of the node, >> including the element's tags, etc. I looked everywhere in >> docutils/nodes.py for a method to get just text, but could not find one. >> Somehow, feeding the string with the tags to parse_latex_math worked >> anyway (following the example in the html writer). > Strange. How can I reproduce this? > > I did a small test inserting > > print node.astext().encode('utf8') > > in the visit_math_block() method of the html writer and did get just the > content, no tags. Thanks for your detailed reply. I'll try to work through many of these problems. However, I just want to quickly reply that I was wrong; node.astext() does return just the string. My mistake! > >> (2) How do I append the resulting tree to the math_block? I tried >> math_block.append() and other methods, but it seems latex2mathml.py >> returns a different type of tree than the one already created. > I think so. Remember that latex2mathml is taken from a user-contributed > add-on in the sandbox and is only intended to produce a MathML > representation to put into HTML pages. > >> I could convert the mathml tree to an XML string and then create a tree >> from that, and then append the tree? I'm just not sure how to do this. 
> I see several ways forward from here: > > * your proposal (convert to string and parse this to a compatible tree). > Is there an XML parser in the minidom module? > > * modify latex2mathml to use "compatible" tree nodes based on Docutils' > nodes. > >> (3) How do I make this transformation optional, depending on an option set >> by the user? The user might have put asciimath in the math_block >> element, in which case it should not be transformed by the >> latex2mathml.py module. > Here, you can look at examples for customizable transforms. E.g. the > sectnum_xform setting is defined in frontend.py and works on the > SectNum(Transform) in transforms/parts.py. > > Günter 
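On Günter's question: yes, `xml.dom.minidom.parseString()` is minidom's parser, so the string round trip needs only the standard library (Paul's later `convert_string_to_docutils_tree()` relies on it). A minimal sketch, assuming the converter hands back an `xml.etree.ElementTree` element; the sample tree and the helper names are made up for illustration:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

MATHML_NS = 'http://www.w3.org/1998/Math/MathML'


def build_sample_mathml():
    """Hand-built stand-in for a converter's ElementTree output (x^2)."""
    math = ET.Element('math', xmlns=MATHML_NS)
    msup = ET.SubElement(math, 'msup')
    ET.SubElement(msup, 'mi').text = 'x'
    ET.SubElement(msup, 'mn').text = '2'
    return math


def etree_to_minidom(elem):
    """Serialise an ElementTree element and re-parse it with minidom."""
    xml_string = ET.tostring(elem, encoding='unicode')  # str, no XML declaration
    return minidom.parseString(xml_string).documentElement
```

The returned element can then be walked node by node, as `_convert_tree()` does, to build the matching Docutils nodes.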
 Re: [Docutils-develop] Using RST for technical textbook From: Guenter Milde - 2011-10-22 13:43:53 On 2011-10-20, Paul Tremblay wrote: > On 10/20/11 6:14 AM, Guenter Milde wrote: >>>>>>> MathML is pretty essential for XML. Can it be put in the XML >>>>>>> writer? ... >> The idea would be to define a new transform >> (transforms.math.latex2mathml, in a file >> docutils/docutils/transforms/math.py say) that would >> replace the content of math and math-block nodes. >> The code would be a mixture of examples from other transforms and the >> visit_math() method in the html writer. (to avoid duplicating code, >> once it is in place and tested, the html writer should be modified to >> use it as well) > Following your directions, I created math.py in the docutils/transform > directory. To the __init__ .py in writers, I added: > from docutils.transforms import math ... > def get_transforms(self): > return Component.get_transforms(self) + [ > universal.Messages, > universal.FilterMessages, > universal.StripClassesAndElements, > math.Math_Block, this should go into /docutils/writers/docutils_xml.py, otherwise it affects all writers (inheriting from docutils.writers.Writer). > math.py looks like this; > """ > math used by writers I'd use something like """math related transformations""" as docstring for the transforms.math module. > from docutils import writers > from docutils.transforms import writers > """ What is this for? > __docformat__ = 'reStructuredText' > from docutils import nodes, utils, languages > from docutils.transforms import Transform > from docutils.math.latex2mathml import parse_latex_math > class Math_Block(Transform): Do we need separate classes for Math_Block vs. Math_Role or could these be put into one class? Considering that transforms.math might be used for several math-related transforms (equation numbering comes to my mind), I'd use a more telling name, LaTeXmath2MathML, say. 
> """ > Change the text in the math_block from plain text in LaTeX to > a MathML tree > """ > default_priority = 910 # not sure if this needs to be loaded > earlier or not > def apply(self): > for math_block in self.document.traverse(nodes.math_block): > math_code = math_block.astext() > mathml_tree = parse_latex_math(math_code, inline=False) > # need to append the mathml_tree to math_block > I have a few questions. > (1) How do you get just the text from a node.Element? In my code, the > math_block.astext actually returns a text representation of the node, > including the element's tags, etc. I looked everywhere in > docutils/nodes.py for a method to get just text, but could not find one. > Somehow, feeding the string with the tags to parse_latex_math worked > anyway (following the example in the html writer). Strange. How can I reproduce this? I did a small test inserting print node.astext().encode('utf8') in the visit_math_block() method of the html writer and did get just the content, no tags. > (2) How do I append the resulting tree to the math_block? I tried > math_block.append() and other methods, but it seems latex2mathml.py > returns a different type of tree than the one already created. I think so. Remember that latex2mathml is taken from a user-contributed add-on in the sandbox and is only intended to produce a MathML representation to put into HTML pages. > I could convert the mathml tree to an XML string and then create a tree > from that, and then append the tree? I'm just not sure how to do this. I see several ways forward from here: * your proposal (convert to string and parse this to a compatible tree). Is there an XML parser in the minidom module? * modify latex2mathml to use "compatible" tree nodes based on Docutils' nodes. > (3) How do I make this transformation optional, depending on an option set > by the user? 
> The user might have put asciimath in the math_block > element, in which case it should not be transformed by the > latex2mathml.py module. Here, you can look at examples for customizable transforms. E.g. the sectnum_xform setting is defined in frontend.py and works on the SectNum(Transform) in transforms/parts.py. Günter 
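The three suggestions above (register the transform only in the XML writer's `get_transforms()`, let `default_priority` decide the order, and make the transform a no-op unless a user setting enables it) can be illustrated with a toy, plain-Python model. The classes below only mimic the shape of Docutils' `Transform` and `Writer`; they are not its API. The setting name `latex_to_mathml` and all class names are hypothetical.

```python
from types import SimpleNamespace


class Transform:
    """Toy stand-in for a transform: subclasses set a priority and apply()."""
    default_priority = 500

    def __init__(self, document):
        self.document = document

    def apply(self):
        raise NotImplementedError


class LateCleanup(Transform):
    """Hypothetical transform that every writer runs."""
    default_priority = 950

    def apply(self):
        self.document.log.append('cleanup')


class LaTeXMath2MathML(Transform):
    """Hypothetical math transform, switched on by a user setting."""
    default_priority = 910

    def apply(self):
        # No-op unless enabled, in the spirit of the sectnum_xform setting.
        if not getattr(self.document.settings, 'latex_to_mathml', False):
            return
        self.document.log.append('latex->mathml')


class Writer:
    def get_transforms(self):
        return [LateCleanup]


class XMLWriter(Writer):
    def get_transforms(self):
        # Extend only this writer's list, so other writers are unaffected.
        return Writer.get_transforms(self) + [LaTeXMath2MathML]


def run_transforms(document, writer):
    """Apply the writer's transforms, lowest default_priority first."""
    for cls in sorted(writer.get_transforms(),
                      key=lambda c: c.default_priority):
        cls(document).apply()
    return document.log


def make_document(**settings):
    """Minimal document stand-in with a settings namespace and a log."""
    return SimpleNamespace(settings=SimpleNamespace(**settings), log=[])
```

Although the math transform is appended last, it runs before `LateCleanup` because its priority number is lower. In Docutils itself the setting would be declared in a `settings_spec` and read from `self.document.settings`, as with `sectnum_xform`.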
 [Docutils-develop] [ docutils-Bugs-3395948 ] C locale + Python 3 -> UnicodeDecodeError From: SourceForge.net - 2011-10-20 23:05:08 Bugs item #3395948, was opened at 2011-08-21 22:41 Message generated for change (Comment added) made by milde You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=422030&aid=3395948&group_id=38414 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None >Status: Closed >Resolution: Fixed Priority: 5 Private: No Submitted By: Jakub Wilk (ubanus) Assigned to: Nobody/Anonymous (nobody) Summary: C locale + Python 3 -> UnicodeDecodeError Initial Comment: When using the C locale and Python 3.X, I cannot convert reST documents containing non-ASCII characters. It works fine when using Python 2.X:

$ printf '\303\263' > test.xml
$ rst2xml.py --version
rst2xml.py (Docutils 0.8 [release], Python 3.2.2rc1, on linux2)
$ LC_ALL=C python /usr/local/bin/rst2xml.py test.xml > /dev/null && echo OK
OK
$ LC_ALL=C python3 /usr/local/bin/rst2xml.py --traceback test.xml > /dev/null
Traceback (most recent call last):
  File "/usr/local/bin/rst2xml.py", line 23, in <module>
    publish_cmdline(writer_name='xml', description=description)
  File "/usr/local/lib/python3.2/dist-packages/docutils/core.py", line 339, in publish_cmdline
    config_section=config_section, enable_exit_status=enable_exit_status)
  File "/usr/local/lib/python3.2/dist-packages/docutils/core.py", line 211, in publish
    self.settings)
  File "/usr/local/lib/python3.2/dist-packages/docutils/readers/__init__.py", line 68, in read
    self.input = self.source.read()
  File "/usr/local/lib/python3.2/dist-packages/docutils/io.py", line 238, in read
    data = self.source.read()
  File "/usr/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in
range(128)

----------------------------------------------------------------------
>Comment By: Günter Milde (milde) Date: 2011-10-20 23:05 Message:

> Docutils is able to detect UTF-8 just fine when locale encoding is 8859-n:

I cannot reproduce this:

$ LC_ALL=en_US.ISO-8859-1 python3
Python 3.2.1rc1 (default, May 18 2011, 11:01:17)
[GCC 4.6.1 20110507 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open('umlauts.txt')
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

> Also, adding --input-encoding=utf8 doesn't really help (which might be
> another bug). rst2xml.py just dies with a very confusing error message:
>
> $ LC_ALL=C python3 /usr/local/bin/rst2xml.py --input-encoding=utf8 test.xml
> UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in
> position 263: ordinal not in range(128)

Up to here, the error message is more than clear. The problem is that Docutils finds sys.stdout already open with the encoding and error handler set, and hence ignores the settings reported in the remainder of the error message. This is indeed a bug. It can be worked around by

* specifying the expected input/output encoding in the LANG variable, or
* specifying --input-encoding and an output file (which is then opened with the given encoding).

Nonetheless, both problems should be solved with the latest SVN version.

----------------------------------------------------------------------
Comment By: Jakub Wilk (jakub-wilk) Date: 2011-10-16 13:27 Message: (FWIW, I upgraded to Docutils 0.8.1 in the meantime.) I don't buy the "we can't guess encodings in Python 3" argument.
In fact, Docutils is able to detect UTF-8 just fine when locale encoding is ISO-8859-n:

$ LC_ALL=en_US.ISO-8859-1 python3 /usr/local/bin/rst2xml.py test.xml | md5sum
2dfeff49a2ce2aa24d6217e0160a8326 -
$ LC_ALL=pl_PL.ISO-8859-2 python3 /usr/local/bin/rst2xml.py test.xml | md5sum
2dfeff49a2ce2aa24d6217e0160a8326 -
$ LC_ALL=en_US.UTF-8 python3 /usr/local/bin/rst2xml.py test.xml | md5sum
2dfeff49a2ce2aa24d6217e0160a8326 -

Also, adding --input-encoding=utf8 doesn't really help (which might be another bug). rst2xml.py just dies with a very confusing error message:

$ LC_ALL=C python3 /usr/local/bin/rst2xml.py --input-encoding=utf8 test.xml
UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in position 263: ordinal not in range(128)
The specified output encoding (utf-8) cannot handle all of the output.
Try setting "--output-encoding-error-handler" to

* "xmlcharrefreplace" (for HTML & XML output);
  the output will contain "b'&#243;'" and should be usable.
* "backslashreplace" (for other output formats);
  look for "b'\\xf3'" in the output.
* "replace"; look for "?" in the output.

"--output-encoding-error-handler" is currently set to "xmlcharrefreplace".

Exiting due to error. Use "--traceback" to diagnose.
If the advice above doesn't eliminate the error, please report it to .
Include "--traceback" output, Docutils version (0.8.1), Python version (3.2.2rc1), your OS type & version, and the command line used.

----------------------------------------------------------------------
Comment By: Günter Milde (milde) Date: 2011-08-22 13:21 Message: Thanks for the bug report -- however, I am not sure the behaviour is a bug. It is the standard Python 3 response to non-ASCII characters when no encoding is specified. With Python 2, Docutils does the input file decoding (including some guesswork); with Python 3, the standard file.read() method also decodes the result into a unicode string.
Using "binary" mode is not a sensible option:

* rst files are text, not binary data
* we would lose universal newline support (the NL vs. CR vs. CR/NL issue with different OSes)

Specify the input encoding, e.g.

  rst2xml.py --input-encoding=utf8

We might consider catching the error and writing a more helpful message, but this should be discussed on the docutils-devel list.

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=422030&aid=3395948&group_id=38414 
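The behaviour discussed in this report can be reproduced with the standard library alone. A minimal sketch (the helper names are made up for illustration): the bytes `\303\263` from the report are UTF-8 for "ó", so decoding them as ASCII, which is what a C locale implies, fails while an explicit encoding succeeds; a text-mode wrapper keeps the universal-newline translation that binary mode would lose; and the three error handlers listed in the Docutils message produce the replacements it describes.

```python
import io

# printf '\303\263' from the report (UTF-8 for 'ó'), plus CR/LF and CR newlines.
SOURCE = b'\xc3\xb3\r\nsecond line\rthird line\n'


def read_text(raw, encoding):
    """Decode bytes as a text-mode file with the given encoding would.

    newline=None (the default) turns on universal newlines, so both
    '\\r\\n' and '\\r' come back as '\\n'.
    """
    with io.TextIOWrapper(io.BytesIO(raw), encoding=encoding) as stream:
        return stream.read()


def ascii_decode_error(raw):
    """Return the error an ASCII (C locale) decode raises, or None."""
    try:
        read_text(raw, 'ascii')
    except UnicodeDecodeError as exc:
        return exc
    return None


def encode_output(text, encoding, errors):
    """Encode output text with an explicit error handler."""
    return text.encode(encoding, errors)
```

For comparison, `SOURCE.decode('utf-8')` decodes the bytes but leaves `'\r\n'` and `'\r'` untouched, which is the newline problem Günter mentions for binary mode.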
 [Docutils-develop] [ docutils-Bugs-3423983 ] test_writers.test_docutils_xml.DocutilsXMLTestCase fails From: SourceForge.net - 2011-10-20 21:22:39 Bugs item #3423983, was opened at 2011-10-15 02:18 Message generated for change (Comment added) made by jakub-wilk You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=422030&aid=3423983&group_id=38414 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Pending Resolution: None Priority: 5 Private: No Submitted By: Jakub Wilk (jakub-wilk) Assigned to: Nobody/Anonymous (nobody) Summary: test_writers.test_docutils_xml.DocutilsXMLTestCase fails Initial Comment: It looks like recent changes to xml.dom.minidom[0][1] broke test_writers.test_docutils_xml.DocutilsXMLTestCase: ====================================================================== FAIL: test_publish (test_writers.test_docutils_xml.DocutilsXMLTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/build/python-docutils-_daXVI/python-docutils-0.8.1/test/test_writers/test_docutils_xml.py", line 80, in test_publish expected) File "/build/python-docutils-_daXVI/python-docutils-0.8.1/test/DocutilsTestSupport.py", line 116, in failUnlessEqual (msg or '%s != %s' % _format_str(first, second)) AssertionError: '''\ Test Test. \xe4\xf6\xfc€ ''' != '''\ Test Test. \xe4\xf6\xfc€ ''' [0] http://bugs.python.org/issue4147 [1] http://hg.python.org/cpython/rev/fa0b1e50270f ---------------------------------------------------------------------- Comment By: Jakub Wilk (jakub-wilk) Date: 2011-10-20 23:22 Message: Indeed, when I originally wrote the bug report, I didn't realize that Python in Debian is so heavily patched (it's in fact based on Mercurial snapshot). Sorry for the noise. 
---------------------------------------------------------------------- Comment By: Günter Milde (milde) Date: 2011-10-20 23:13 Message: Looking at the Debian bugreport http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=645369 I get the impression that this affects only a patched xml.dom.minidom.toprettyxml in Debian's "python2.7" package, version 2.7.2-7. Putting this in "Pending" status. For a solution, either the patch needs to be revoked or we need a recipe to tell the patched module from the original one. ---------------------------------------------------------------------- Comment By: Günter Milde (milde) Date: 2011-10-17 12:30 Message: Thanks for your bug report. It looks like the test needs either to consider python versions or to be relaxed. Can you provide a version test line that could be used to define two variants of the bodyindents string used in test/test_writers/test_docutils_xml.py? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=422030&aid=3423983&group_id=38414 
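Instead of a version test, one possible "recipe to tell the patched module from the original one" is to probe the behaviour directly. A minimal sketch, assuming the relevant difference is the one from the linked Python issue 4147, i.e. whether `toprettyxml()` keeps a text-only element such as `<b>text</b>` on one line; the function name is hypothetical:

```python
from xml.dom import minidom


def toprettyxml_keeps_text_inline():
    """Probe minidom instead of checking version numbers.

    Returns True if toprettyxml() writes a text-only element as
    <b>text</b> on one line, False if it puts the text on its own
    indented line between the tags (the older behaviour).
    """
    doc = minidom.parseString('<a><b>text</b></a>')
    pretty = doc.toprettyxml(indent='  ')
    return '<b>text</b>' in pretty
```

The test module could then pick one of two expected `bodyindents` strings with a plain `if` on this result, which should work for patched distribution builds as well as for upstream releases.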
 [Docutils-develop] [ docutils-Bugs-3423983 ] test_writers.test_docutils_xml.DocutilsXMLTestCase fails From: SourceForge.net - 2011-10-20 21:13:48 Bugs item #3423983, was opened at 2011-10-15 00:18 Message generated for change (Comment added) made by milde You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=422030&aid=3423983&group_id=38414 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None >Status: Pending Resolution: None Priority: 5 Private: No Submitted By: Jakub Wilk (jakub-wilk) Assigned to: Nobody/Anonymous (nobody) Summary: test_writers.test_docutils_xml.DocutilsXMLTestCase fails Initial Comment: It looks like recent changes to xml.dom.minidom[0][1] broke test_writers.test_docutils_xml.DocutilsXMLTestCase: ====================================================================== FAIL: test_publish (test_writers.test_docutils_xml.DocutilsXMLTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/build/python-docutils-_daXVI/python-docutils-0.8.1/test/test_writers/test_docutils_xml.py", line 80, in test_publish expected) File "/build/python-docutils-_daXVI/python-docutils-0.8.1/test/DocutilsTestSupport.py", line 116, in failUnlessEqual (msg or '%s != %s' % _format_str(first, second)) AssertionError: '''\ Test Test. \xe4\xf6\xfc€ ''' != '''\ Test Test. \xe4\xf6\xfc€ ''' [0] http://bugs.python.org/issue4147 [1] http://hg.python.org/cpython/rev/fa0b1e50270f ---------------------------------------------------------------------- >Comment By: Günter Milde (milde) Date: 2011-10-20 21:13 Message: Looking at the Debian bugreport http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=645369 I get the impression that this affects only a patched xml.dom.minidom.toprettyxml in Debian's "python2.7" package, version 2.7.2-7. 
Putting this in "Pending" status. For a solution, either the patch needs to be revoked or we need a recipe to tell the patched module from the original one. ---------------------------------------------------------------------- Comment By: Günter Milde (milde) Date: 2011-10-17 10:30 Message: Thanks for your bug report. It looks like the test needs either to consider python versions or to be relaxed. Can you provide a version test line that could be used to define two variants of the bodyindents string used in test/test_writers/test_docutils_xml.py? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=422030&aid=3423983&group_id=38414 
 Re: [Docutils-develop] [Docutils-users] Using RST for technical textbook From: Paul Tremblay - 2011-10-20 16:18:59 On 10/20/11 6:14 AM, Guenter Milde wrote:
> On 19.10.11, Paul Tremblay wrote:
>> On 10/19/11 5:46 PM, Guenter Milde wrote:
>>>>>> MathML is pretty essential for XML. Can it be put in the XML
>>>>>> writer?
>>>> Any hints here would be welcome. The code in
>>>> writers/html4css1/__init__.py is much different from the code for the XML
>>>> writer. I can't figure out what code handles the writing of each element.
>>> I see. As there is no set of "visit_*/depart_*" functions in the XML writer,
>>> the conversion from LaTeX-math to MathML would have to be done in a
>>> "transform" (i.e. a doctree -> doctree mapping similar to section numbering
>>> etc.).
>> I had actually worked out an unacceptable hack in the
>> docutils/writers/docutils.xml:
> ...
>
>> However, I didn't think I wanted to change anything in the transform
>> stage, because that changes the whole tree and would not be suitable
>> for other transformations?
> The writer can configure which transformations are performed. See for
> example the html writer which adds
> docutils.transforms.writer_aux.Admonitions to the standard writer
> transforms by overriding the get_transforms() method::
>
>     def get_transforms(self):
>         return writers.Writer.get_transforms(self) + [writer_aux.Admonitions]
>
> For an introduction to the concept of transforms, see
> docs/ref/transforms.html. In short:
>
> * The parsing step is performed first (in one pass over the rst source)
>   and generates a doctree.
>
> * The transforms traverse the doctree and modify selected nodes
>   (sometimes also insert new ones) in several passes. This allows for
>   cross-linking, generating a toc, etc.
>
> * The writer again does a one-pass traversal over the doctree generating
>   the output document.
>
>
> The idea would be to define a new transform
> (transforms.math.latex2mathml, in a file
> docutils/docutils/transforms/math.py say) that would
> replace the content of math and math-block nodes.
>
> The code would be a mixture of examples from other transforms and the
> visit_math() method in the html writer. (to avoid duplicating code, once it
> is in place and tested, the html writer should be modified to use it as well)
>
>> For example, changing code in
>> docutils/parsers/rst/directives changes the code when it comes time
>> to transform the document to LaTeX, no?
> A new transform will not change the LaTeX (etc) output unless it is also
> added to the "transforms list" in the latex2e writer. (OTOH, changing
> code in the parser affects all writers.)
>
>

Following your directions, I created math.py in the docutils/transform directory. To the __init__.py in writers, I added:

from docutils.transforms import math
^^^^^^ added

... ..

    def get_transforms(self):
        return Component.get_transforms(self) + [
            universal.Messages,
            universal.FilterMessages,
            universal.StripClassesAndElements,
            math.Math_Block,
            ^^^^ added
        ]

math.py looks like this:

"""
math used by writers

from docutils import writers
from docutils.transforms import writers
"""

__docformat__ = 'reStructuredText'

from docutils import nodes, utils, languages
from docutils.transforms import Transform
from docutils.math.latex2mathml import parse_latex_math

class Math_Block(Transform):
    """
    Change the text in the math_block from plain text in LaTeX to
    a MathML tree
    """

    default_priority = 910 # not sure if this needs to be loaded earlier or not

    def apply(self):
        for math_block in self.document.traverse(nodes.math_block):
            math_code = math_block.astext()
            mathml_tree = parse_latex_math(math_code, inline=False)
            # need to append the mathml_tree to math_block

I have a few questions.

(1) How do you get just the text from a node.Element?
In my code, math_block.astext() actually returns a text representation of the node, including the element's tags, etc. I looked everywhere in docutils/nodes.py for a method to get just text, but could not find one. Somehow, feeding the string with the tags to parse_latex_math worked anyway (following the example in the html writer).

(2) How do I append the resulting tree to the math_block? I tried math_block.append() and other methods, but it seems latex2mathml.py returns a different type of tree than the one already created. I could convert the mathml tree to an XML string and then create a tree from that, and then append the tree? I'm just not sure how to do this.

(3) How do I make this transformation optional, depending on an option set by the user? The user might have put asciimath in the math_block element, in which case it should not be transformed by the latex2mathml.py module.

Thanks

Paul 
 Re: [Docutils-develop] bug in documentation (standard.txt)? From: Paul Tremblay - 2011-10-20 03:35:37 On 10/19/11 11:26 PM, David Goodger wrote: > On Wed, Oct 19, 2011 at 22:23, Paul Tremblay wrote: >> On 10/19/11 8:53 PM, David Goodger wrote: >>> On Wed, Oct 19, 2011 at 16:53, Paul Tremblay >>> wrote: >>>> On 10/19/11 4:38 PM, David Goodger wrote: >>>>> >>>>> >>>>> Footnotes may be numbered, either manually (as in >>>>> 1) or >>>>> automatically using a"#"-prefixed label. This footnote has a >>>>> label so it can be referred to from multiple places, both as a >>>>> footnote reference (>>>> refid="label">2) and as a>>>> name="hyperlink reference" refid="label">hyperlink >>>>> reference. >>>>> >>>>> >>>>> >>>>> y comes to mind, but still). >>>>> >>>>> The "[1]_" actually references *another* footnote, not the footnote >>>>> containing the reference itself. >>>> Thanks. That clears things up. >>>> >>>> Then I am processing the XML wrong. The element "footnote_reference" has >>>> the attribute "refid" of "label." The element "footnote" has the >>>> attribute >>>> "ids" of "label". I have been linking the "footnote_ref" to the >>>> "footnote" >>>> by these two attributes. >>> [xsl omitted. sorry, I don't speak xsl] >>> >>>> How are they linked? >>> Your understanding is right. Attribute refid links to attribute ids >>> (or one of them; there can be multiple, space-separated). I don't >>> understand your misunderstanding. >>> >> The parent footnote looks like this: >> >> >> >> >> The internal, child footnote reference looks like this: >> >> >> That's the "[#label]_" reference. > >> The internal footnote_reference refers back to its parent, or so it seems to >> me. In the last email I thought you said that the internal footnote reference >> points to a different footnote. > Different reference: "[1]_". Go back and read that message again more carefully: > >>>>> The "[1]_" actually references *another* footnote, not the footnote >>>>> containing the reference itself. 
>> But if I look at the footnote_reference >> element, and get the refid of "label" and then follow that to the ids, I >> find the parent itself. In other words, the reference points to its own >> parent. > There are two references inside that footnote. We're talking across each other. > >> I notice, too, that earlier in the document, the following XML occurs: >> >> 2 > That's from much earlier in the document, a "[#label]_" reference > inside the first paragraph in the "Inline Markup" section. > >> So it seems that two footnote references point to the same footnote. > True. > >> This is >> further confirmed by the backrefs of the footnote element, which has a value >> of "id3 id10", corresponding to ids="id3" of the first footnote_reference, >> and ids="id10" to the second. > Also true. > > Let's look at more context: > > """ > .. [1] A footnote contains body elements, consistently indented by at > least 3 spaces. > > This is the footnote's second paragraph. > > .. [#label] Footnotes may be numbered, either manually (as in [1]_) or > automatically using a "#"-prefixed label. This footnote has a > label so it can be referred to from multiple places, both as a > footnote reference ([#label]_) and as a hyperlink reference__. > > __ label_ > """ > > The "[1]_" reference (in the second footnote, beginning ".. [#label]") > refers to another footnote, namely the first one, beginning ".. [1]". > > The second footnote reference in the second footnote, "[#label]_", > does refer to its parent footnote. > > There are two (2) footnote references above. Each refers to a > different footnote. > > Clear? > Thanks. Yes. Paul 
 Re: [Docutils-develop] bug in documentation (standard.txt)? From: David Goodger - 2011-10-20 03:26:43 On Wed, Oct 19, 2011 at 22:23, Paul Tremblay wrote: > On 10/19/11 8:53 PM, David Goodger wrote: >> >> On Wed, Oct 19, 2011 at 16:53, Paul Tremblay >> wrote: >>> >>> On 10/19/11 4:38 PM, David Goodger wrote: >>>> >>>> >>>> >>>> Footnotes may be numbered, either manually (as in >>>> 1) or >>>> automatically using a"#"-prefixed label. This footnote has a >>>> label so it can be referred to from multiple places, both as a >>>> footnote reference (>>> refid="label">2) and as a>>> name="hyperlink reference" refid="label">hyperlink >>>> reference. >>>> >>>> >>>> >>>> y comes to mind, but still). >>>> >>>> The "[1]_" actually references *another* footnote, not the footnote >>>> containing the reference itself. >>> >>> Thanks. That clears things up. >>> >>> Then I am processing the XML wrong. The element "footnote_reference" has >>> the attribute "refid" of "label." The element "footnote" has the >>> attribute >>> "ids" of "label". I have been linking the "footnote_ref" to the >>> "footnote" >>> by these two attributes. >> >> [xsl omitted. sorry, I don't speak xsl] >> >>> How are they linked? >> >> Your understanding is right. Attribute refid links to attribute ids >> (or one of them; there can be multiple, space-separated). I don't >> understand your misunderstanding. >> > > The parent footnote looks like this: > > > > > The internal, child footnote reference looks like this: > > > The internal footnote_reference refers back to its parent, or so it seems to > me. In the last email I thought you said that the internal footnote reference > points to a different footnote. Different reference: "[1]_". Go back and read that message again more carefully: >>>> The "[1]_" actually references *another* footnote, not the footnote >>>> containing the reference itself. 
> But if I look at the footnote_reference > element, and get the refid of "label" and then follow that to the ids, I > find the parent itself. In other words, the reference points to its own > parent. There are two references inside that footnote. We're talking across each other. > I notice, too, that earlier in the document, the following XML occurs: > > 2 That's from much earlier in the document, a "[#label]_" reference inside the first paragraph in the "Inline Markup" section. > So it seems that two footnote references point to the same footnote. True. > This is > further confirmed by the backrefs of the footnote element, which has a value > of "id3 id10", corresponding to ids="id3" of the first footnote_reference, > and ids="id10" to the second. Also true. Let's look at more context:

"""
.. [1] A footnote contains body elements, consistently indented by at
   least 3 spaces.

   This is the footnote's second paragraph.

.. [#label] Footnotes may be numbered, either manually (as in [1]_) or
   automatically using a "#"-prefixed label. This footnote has a
   label so it can be referred to from multiple places, both as a
   footnote reference ([#label]_) and as a hyperlink reference__.

   __ label_
"""

The "[1]_" reference (in the second footnote, beginning ".. [#label]") refers to another footnote, namely the first one, beginning ".. [1]". The second footnote reference in the second footnote, "[#label]_", does refer to its parent footnote. There are two (2) footnote references above. Each refers to a different footnote. Clear? -- David Goodger ; 
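David's rule, that a `refid` names one of the space-separated values in some element's `ids`, can be sketched with ElementTree. The XML below is a hand-written, simplified approximation of the Docutils XML for the example above, not real writer output: `label`, `id3` and `id10` come from the thread, while `id7` and `id11` are made up. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for the Docutils XML of the example.
DOC = """\
<document>
  <paragraph>A paragraph with an auto-numbered footnote reference
    <footnote_reference auto="1" ids="id3" refid="label">2</footnote_reference>.
  </paragraph>
  <footnote ids="id7" names="1" backrefs="id11">
    <label>1</label>
    <paragraph>A footnote contains body elements.</paragraph>
  </footnote>
  <footnote auto="1" ids="label" names="label" backrefs="id3 id10">
    <label>2</label>
    <paragraph>Footnotes may be numbered, either manually (as in
      <footnote_reference ids="id11" refid="id7">1</footnote_reference>)
      or automatically. Footnote reference
      (<footnote_reference auto="1" ids="id10" refid="label">2</footnote_reference>).
    </paragraph>
  </footnote>
</document>
"""


def index_ids(root):
    """Map each id to its element; an ids value may hold several ids."""
    by_id = {}
    for elem in root.iter():
        for id_ in elem.get('ids', '').split():
            by_id[id_] = elem
    return by_id


def resolve_references(root):
    """Return (reference ids, target ids) pairs in document order."""
    by_id = index_ids(root)
    return [(ref.get('ids'), by_id[ref.get('refid')].get('ids'))
            for ref in root.iter('footnote_reference')]


def references_to(root, target_id):
    """List the ids of footnote references whose refid is target_id."""
    return [ref.get('ids') for ref in root.iter('footnote_reference')
            if ref.get('refid') == target_id]
```

Resolving the references shows both situations from the thread: `id11` (the `[1]_` reference) points at the other footnote, while `id3` and `id10` both point at `label`, matching that footnote's `backrefs`.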
 Re: [Docutils-develop] bug in documentation (standard.txt)? From: Paul Tremblay - 2011-10-20 02:23:47 On 10/19/11 8:53 PM, David Goodger wrote: > On Wed, Oct 19, 2011 at 16:53, Paul Tremblay wrote: >> On 10/19/11 4:38 PM, David Goodger wrote: >>> >>> >>> Footnotes may be numbered, either manually (as in >>> 1) or >>> automatically using a "#"-prefixed label. This footnote has a >>> label so it can be referred to from multiple places, both as a >>> footnote reference (>> refid="label">2) and as a>> name="hyperlink reference" refid="label">hyperlink >>> reference. >>> >>> >>> >>> y comes to mind, but still). >>> >>> The "[1]_" actually references *another* footnote, not the footnote >>> containing the reference itself. >> Thanks. That clears things up. >> >> Then I am processing the XML wrong. The element "footnote_reference" has >> the attribute "refid" of "label." The element "footnote" has the attribute >> "ids" of "label". I have been linking the "footnote_ref" to the "footnote" >> by these two attributes. > [xsl omitted. sorry, I don't speak xsl] > >> How are they linked? > Your understanding is right. Attribute refid links to attribute ids > (or one of them; there can be multiple, space-separated). I don't > understand your misunderstanding. > The parent footnote looks like this: The internal, child footnote reference looks like this: 2 So it seems that two footnote references point to the same footnote. This is further confirmed by the backrefs of the footnote element, which has a value of "id3 id10", corresponding to ids="id3" of the first footnote_reference, and ids="id10" to the second. Paul 
 Re: [Docutils-develop] bug in documentation (standard.txt)? From: David Goodger - 2011-10-20 00:54:37 On Wed, Oct 19, 2011 at 16:53, Paul Tremblay wrote: > On 10/19/11 4:38 PM, David Goodger wrote: >> >> >> >> Footnotes may be numbered, either manually (as in >> 1) or >> automatically using a "#"-prefixed label.  This footnote has a >> label so it can be referred to from multiple places, both as a >> footnote reference (> refid="label">2) and as a> name="hyperlink reference" refid="label">hyperlink >> reference. >> >> >> >> y comes to mind, but still). >> >> The "[1]_" actually references *another* footnote, not the footnote >> containing the reference itself. > > Thanks. That clears things up. > > Then I am processing the XML wrong. The element "footnote_reference" has > the attribute "refid" of "label." The element "footnote" has the attribute > "ids" of "label". I have been linking the "footnote_ref" to the "footnote" > by these two attributes. [xsl omitted. sorry, I don't speak xsl] > How are they linked? Your understanding is right. Attribute refid links to attribute ids (or one of them; there can be multiple, space-separated). I don't understand your misunderstanding. -- David Goodger ; 
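The point that `refid` may match just *one* of several space-separated `ids` is easy to get wrong when matching attributes by plain string equality. A minimal sketch with Python's standard `xml.etree.ElementTree` (not a Docutils API): the snippet is hypothetical, and the extra id `id8` on the footnote is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: the footnote carries two space-separated ids
# ("id8" is invented for illustration); the reference's refid names one of them.
SNIPPET = """<document>
  <footnote_reference ids="id3" refid="label">2</footnote_reference>
  <footnote ids="id8 label" names="label"><label>2</label></footnote>
</document>"""


def find_target(root, refid):
    """Return the first element whose space-separated ids attribute contains
    refid, or None. Splitting matters: a plain ids == refid test would miss
    ids="id8 label"."""
    for elem in root.iter():
        if refid in elem.get("ids", "").split():
            return elem
    return None


root = ET.fromstring(SNIPPET)
ref = root.find("footnote_reference")
target = find_target(root, ref.get("refid"))
print(target.tag, target.findtext("label"))
```

Here `refid="label"` resolves to the `footnote` element even though its `ids` value is `"id8 label"`, not `"label"`.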

