#199 Docutils incompatible with PyXML

Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-08-15
Created: 2012-07-31
Creator: Toshio Kuratomi
Private: No

When PyXML is installed, Docutils fails its test suite. PyXML is likely at fault here, as it is dead upstream and likely contains bugs that have since been fixed in the Python stdlib. Unfortunately, the Python stdlib replaces its own xml module with the PyXML module if PyXML is installed. I'll attach a patch that makes Python prefer the stdlib xml code over the PyXML code. Traceback follows (note: I've seen this traceback on the mailing list before; we just didn't have a reproducer at that time):

======================================================================
ERROR: test_invalid_raw_xml (test_writers.test_docutils_xml.DocutilsXMLTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/srv/git/python-docutils/docutils-0.10/test/test_writers/test_docutils_xml.py", line 181, in test_invalid_raw_xml
    result = publish_xml(settings, invalid_raw_xml_source)
  File "/srv/git/python-docutils/docutils-0.10/test/test_writers/test_docutils_xml.py", line 127, in publish_xml
    settings_overrides=settings)
  File "/srv/git/python-docutils/docutils-0.10/docutils/core.py", line 414, in publish_string
    enable_exit_status=enable_exit_status)
  File "/srv/git/python-docutils/docutils-0.10/docutils/core.py", line 662, in publish_programmatically
    output = pub.publish(enable_exit_status=enable_exit_status)
  File "/srv/git/python-docutils/docutils-0.10/docutils/core.py", line 219, in publish
    output = self.writer.write(self.document, self.destination)
  File "/srv/git/python-docutils/docutils-0.10/docutils/writers/__init__.py", line 80, in write
    self.translate()
  File "/srv/git/python-docutils/docutils-0.10/docutils/writers/docutils_xml.py", line 61, in translate
    self.document.walkabout(visitor)
  File "/srv/git/python-docutils/docutils-0.10/docutils/nodes.py", line 173, in walkabout
    if child.walkabout(visitor):
  File "/srv/git/python-docutils/docutils-0.10/docutils/nodes.py", line 165, in walkabout
    visitor.dispatch_visit(self)
  File "/srv/git/python-docutils/docutils-0.10/docutils/nodes.py", line 1611, in dispatch_visit
    return method(node)
  File "/srv/git/python-docutils/docutils-0.10/docutils/writers/docutils_xml.py", line 171, in visit_raw
    col_num, line_num, node.astext())
TypeError: %d format: a number is required, not NoneType

======================================================================
ERROR: test_publish (test_writers.test_docutils_xml.DocutilsXMLTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/srv/git/python-docutils/docutils-0.10/test/test_writers/test_docutils_xml.py", line 155, in test_publish
    result = publish_xml(settings, source)
  File "/srv/git/python-docutils/docutils-0.10/test/test_writers/test_docutils_xml.py", line 127, in publish_xml
    settings_overrides=settings)
  File "/srv/git/python-docutils/docutils-0.10/docutils/core.py", line 414, in publish_string
    enable_exit_status=enable_exit_status)
  File "/srv/git/python-docutils/docutils-0.10/docutils/core.py", line 662, in publish_programmatically
    output = pub.publish(enable_exit_status=enable_exit_status)
  File "/srv/git/python-docutils/docutils-0.10/docutils/core.py", line 219, in publish
    output = self.writer.write(self.document, self.destination)
  File "/srv/git/python-docutils/docutils-0.10/docutils/writers/__init__.py", line 80, in write
    self.translate()
  File "/srv/git/python-docutils/docutils-0.10/docutils/writers/docutils_xml.py", line 60, in translate
    self.visitor = visitor = self.translator_class(self.document)
  File "/srv/git/python-docutils/docutils-0.10/docutils/writers/docutils_xml.py", line 110, in __init__
    self.xmlparser.setContentHandler(self.the_handle)
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 128, in setContentHandler
    self._reset_cont_handler()
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 234, in _reset_cont_handler
    self._cont_handler.processingInstruction
AttributeError: 'NoneType' object has no attribute 'ProcessingInstructionHandler'

======================================================================
ERROR: test_publish_indents (test_writers.test_docutils_xml.DocutilsXMLTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/srv/git/python-docutils/docutils-0.10/test/test_writers/test_docutils_xml.py", line 161, in test_publish_indents
    result = publish_xml(settings, source)
  File "/srv/git/python-docutils/docutils-0.10/test/test_writers/test_docutils_xml.py", line 127, in publish_xml
    settings_overrides=settings)
  File "/srv/git/python-docutils/docutils-0.10/docutils/core.py", line 414, in publish_string
    enable_exit_status=enable_exit_status)
  File "/srv/git/python-docutils/docutils-0.10/docutils/core.py", line 662, in publish_programmatically
    output = pub.publish(enable_exit_status=enable_exit_status)
  File "/srv/git/python-docutils/docutils-0.10/docutils/core.py", line 219, in publish
    output = self.writer.write(self.document, self.destination)
  File "/srv/git/python-docutils/docutils-0.10/docutils/writers/__init__.py", line 80, in write
    self.translate()
  File "/srv/git/python-docutils/docutils-0.10/docutils/writers/docutils_xml.py", line 60, in translate
    self.visitor = visitor = self.translator_class(self.document)
  File "/srv/git/python-docutils/docutils-0.10/docutils/writers/docutils_xml.py", line 110, in __init__
    self.xmlparser.setContentHandler(self.the_handle)
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 128, in setContentHandler
    self._reset_cont_handler()
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 234, in _reset_cont_handler
    self._cont_handler.processingInstruction
AttributeError: 'NoneType' object has no attribute 'ProcessingInstructionHandler'

======================================================================
ERROR: test_publish_newlines (test_writers.test_docutils_xml.DocutilsXMLTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/srv/git/python-docutils/docutils-0.10/test/test_writers/test_docutils_xml.py", line 167, in test_publish_newlines
    result = publish_xml(settings, source)
  File "/srv/git/python-docutils/docutils-0.10/test/test_writers/test_docutils_xml.py", line 127, in publish_xml
    settings_overrides=settings)
  File "/srv/git/python-docutils/docutils-0.10/docutils/core.py", line 414, in publish_string
    enable_exit_status=enable_exit_status)
  File "/srv/git/python-docutils/docutils-0.10/docutils/core.py", line 662, in publish_programmatically
    output = pub.publish(enable_exit_status=enable_exit_status)
  File "/srv/git/python-docutils/docutils-0.10/docutils/core.py", line 219, in publish
    output = self.writer.write(self.document, self.destination)
  File "/srv/git/python-docutils/docutils-0.10/docutils/writers/__init__.py", line 80, in write
    self.translate()
  File "/srv/git/python-docutils/docutils-0.10/docutils/writers/docutils_xml.py", line 60, in translate
    self.visitor = visitor = self.translator_class(self.document)
  File "/srv/git/python-docutils/docutils-0.10/docutils/writers/docutils_xml.py", line 110, in __init__
    self.xmlparser.setContentHandler(self.the_handle)
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 128, in setContentHandler
    self._reset_cont_handler()
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 234, in _reset_cont_handler
    self._cont_handler.processingInstruction
AttributeError: 'NoneType' object has no attribute 'ProcessingInstructionHandler'

======================================================================
ERROR: test_raw_xml (test_writers.test_docutils_xml.DocutilsXMLTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/srv/git/python-docutils/docutils-0.10/test/test_writers/test_docutils_xml.py", line 172, in test_raw_xml
    result = publish_xml(self.settings, raw_xml_source)
  File "/srv/git/python-docutils/docutils-0.10/test/test_writers/test_docutils_xml.py", line 127, in publish_xml
    settings_overrides=settings)
  File "/srv/git/python-docutils/docutils-0.10/docutils/core.py", line 414, in publish_string
    enable_exit_status=enable_exit_status)
  File "/srv/git/python-docutils/docutils-0.10/docutils/core.py", line 662, in publish_programmatically
    output = pub.publish(enable_exit_status=enable_exit_status)
  File "/srv/git/python-docutils/docutils-0.10/docutils/core.py", line 219, in publish
    output = self.writer.write(self.document, self.destination)
  File "/srv/git/python-docutils/docutils-0.10/docutils/writers/__init__.py", line 80, in write
    self.translate()
  File "/srv/git/python-docutils/docutils-0.10/docutils/writers/docutils_xml.py", line 60, in translate
    self.visitor = visitor = self.translator_class(self.document)
  File "/srv/git/python-docutils/docutils-0.10/docutils/writers/docutils_xml.py", line 110, in __init__
    self.xmlparser.setContentHandler(self.the_handle)
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 128, in setContentHandler
    self._reset_cont_handler()
  File "/usr/lib64/python2.7/site-packages/_xmlplus/sax/expatreader.py", line 234, in _reset_cont_handler
    self._cont_handler.processingInstruction
AttributeError: 'NoneType' object has no attribute 'ProcessingInstructionHandler'

----------------------------------------------------------------------

Discussion

  • Patch to keep from using PyXML

     
  • Günter Milde
    2012-08-13

    Thanks for the diagnosis and the patch.

    I am hesitant to apply it to docutils/__init__.py, though.

    Is it sensible/required if PyXML is problematic and due to "vanish" anyway?

    Are you sure there are no side effects?

    Looking at the traceback, a more local patch to writers/docutils_xml.py should suffice.
    (The only other Docutils sub-module that uses xml is nodes.py, which loads xml.dom.minidom, but all tracebacks refer to the XML writer.)

     
  • > Is it sensible/required if PyXML is problematic and due to "vanish" anyway?

    If PyXML weren't supposed to vanish one day, I'd just try to get the PyXML upstream to fix their code :-). Unfortunately, PyXML is still required by some other codebases. I'm trying to get rid of all uses of PyXML in Fedora (https://fedoraproject.org/wiki/User:Toshio/Remove_PyXML), but there are still a few upstreams that have not ported and seemingly have no plan to, because they use things like xpath that aren't available in any form in the stdlib. Other times, upstream documentation lists a PyXML dependency even though the code would run fine with the stdlib version, and PyXML ends up installed on the user's system needlessly.

    So... some workaround seems necessary.

    > Are you sure there are no side effects?

    There could be side effects from this. What it does is reverse the order in which Python searches for the modules. Instead of looking in PyXML first and the stdlib second, it looks in the stdlib first and PyXML second. So, for instance, the xpath stuff from PyXML will still be found, since Python will first look in the stdlib, not find it, and then look in PyXML.

    However, if there is code that depends on PyXML behaviour that differs from the stdlib, there could be breakage. OTOH, it's a good opportunity to tell people to use the stdlib version instead of the PyXML version.

    Here's Nick Coghlan's post about it: http://lists.fedoraproject.org/pipermail/python-devel/2012-July/000406.html

    (And incidentally, he had one more revision to the patch mentioned in that post.)

    > Looking at the traceback, a more local patch to writers/docutils_xml.py should suffice.

    If you can find a more local or less intrusive fix that would be fine with me. I'm working on the other angle of getting people to port upstream code away from PyXML. That's likely to take years to resolve but seen in that light, whatever gets done in docutils is a workaround for a bug in other people's code.
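
The search-order reversal described in this comment can be tried out without PyXML installed. The following is only a sketch, written for a current Python 3: the package name `demo_xml`, the directories `addon` and `stdlib`, and the helpers `make_dir` and `origin_of` are all hypothetical stand-ins, not part of Docutils or PyXML. It relies on the fact that submodules of a package are looked up along the package's `__path__`, in order.

```python
import importlib
import os
import sys
import tempfile
import types


def make_dir(root, name, modules):
    """Create root/name containing one tiny .py file per entry in modules."""
    path = os.path.join(root, name)
    os.mkdir(path)
    for mod_name, origin in modules.items():
        with open(os.path.join(path, mod_name + ".py"), "w") as f:
            f.write("ORIGIN = %r\n" % origin)
    return path


root = tempfile.mkdtemp()
# Hypothetical stand-ins: "addon" plays the role of PyXML's _xmlplus,
# "stdlib" plays the role of the stdlib xml package.
addon = make_dir(root, "addon", {"sax": "addon", "xpath": "addon"})
stdlib = make_dir(root, "stdlib", {"sax": "stdlib"})

# A package whose search path lists the add-on directory first.
pkg = types.ModuleType("demo_xml")
pkg.__path__ = [addon, stdlib]
sys.modules["demo_xml"] = pkg


def origin_of(submodule):
    """Import demo_xml.<submodule> afresh and report which copy was loaded."""
    full_name = "demo_xml." + submodule
    sys.modules.pop(full_name, None)
    importlib.invalidate_caches()
    return importlib.import_module(full_name).ORIGIN


before = origin_of("sax")      # add-on copy wins while it is listed first
pkg.__path__.reverse()         # the same operation the proposed patch performs
after = origin_of("sax")       # stdlib stand-in wins after the reversal
fallback = origin_of("xpath")  # exists only in the add-on, so still found there
print(before, after, fallback)
```

Submodules that were already imported before the reversal stay in `sys.modules` and are not re-resolved, which is why the sketch removes the cached entry before each lookup.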

     
  • Günter Milde
    2012-08-14

    Thanks for the clarifications.
    Could you try the following patch?

    --- docutils/writers/docutils_xml.py (revision 7491)
    +++ docutils/writers/docutils_xml.py (working copy)
    @@ -11,6 +11,19 @@
     __docformat__ = 'reStructuredText'

     import sys
    +
    +# Work around broken PyXML and obsolete python stdlib behaviour (The
    +# stdlib replaces its own xml module with the unmaintained PyXML if PyXML
    +# is installed.) Reverse the order in which xml module and submodules are
    +# searched to import stdlib modules if they exist and PyXML modules if they
    +# do not exist in the stdlib.
    +#
    +# See http://sourceforge.net/tracker/index.php?func=detail&aid=3552403&group_id=38414&atid=422030
    +# and http://lists.fedoraproject.org/pipermail/python-devel/2012-July/000406.html
    +import xml
    +if "_xmlplus" in xml.__path__[0]: # PyXML sub-module
    +    xml.__path__.reverse() # If both are available, prefer stdlib over PyXML
    +
     import xml.sax.saxutils
     from StringIO import StringIO
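
One detail of the patch above that is easy to miss is the `"_xmlplus" in xml.__path__[0]` check: an unconditional `reverse()` would flip the order back if the same workaround ran a second time. The sketch below isolates that guard on a plain list so it can be run anywhere. The function name `prefer_stdlib` is hypothetical, and the stdlib directory shown is an assumed location (only the `_xmlplus` directory appears in the traceback).

```python
def prefer_stdlib(package_path, marker="_xmlplus"):
    """Reverse package_path in place if its first entry belongs to PyXML.

    Returns True when the order was changed, False otherwise.
    """
    if package_path and marker in package_path[0]:
        package_path.reverse()
        return True
    return False


# Hypothetical search path; the second entry is an assumed stdlib location.
search_path = [
    "/usr/lib64/python2.7/site-packages/_xmlplus",
    "/usr/lib64/python2.7/xml",
]

first_call = prefer_stdlib(search_path)   # PyXML was first, so it is reordered
second_call = prefer_stdlib(search_path)  # stdlib is first now, nothing to do
print(search_path, first_call, second_call)
```

Because the guard only fires while the PyXML directory is listed first, applying the workaround from more than one module leaves the stdlib in front.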

     
  • Thanks. The patch works here as well.

     
  • Günter Milde
    2012-08-16

    • status: open --> closed-fixed