#228 parser selection does not shield against module name clashes

repository
closed-fixed
nobody
None
5
2013-04-18
2013-03-05
Johannes Wienke
No

The method get_parser_class(parser_name) in docutils/parsers/__init__.py does not shield against python modules which might mirror names of internal docutils parsers. E.g. if you installa a top-level module called "rst" in your pythonpath, get_parser_class picks up that module instead of docutils.parsers.rst. In my case this breaks building sphinx documentation. An absolute path for the parser module should be used, e.g. "docutils.parsers.rst" instead of just "rst" as the string for resolving the module import.

Here is the output of that method from the shpinx context where I patched in a print statement to print the the imported module.

<module 'rst' from '/home/languitar/lib/python2.7/site-packages/rst-0.0.0-py2.7.egg/rst/__init__.pyc'>

Exception occurred:
File "/usr/lib/python2.7/site-packages/docutils/parsers/__init__.py", line 55, in get_parser_class
return module.Parser
AttributeError: 'module' object has no attribute 'Parser'

Discussion

  • Dale
    Dale
    2013-03-13

    I'm hitting this same problem, also via Sphinx, except with the reader instead of the parser: Sphinx tries to use the string "standalone" as the reader_name to Publisher.set_components, and I happen to have a top-level module called "standalone", so docutils.readers.get_reader_class finds my "standalone" module, and then throws an AttributeError when it can't find standalone.Reader.

    I'm not certain what to do about this, but I'll throw out a few ideas in case it helps.

    Ignoring backwards-compatibility, perhaps the ideal solution would be to require passing a fully-qualified module name to the get_*_class functions, and thus also to Publisher. Then you could simplify the get_*_class methods to something more like (demonstrating with get_reader_class):

    __import__(reader_name, level=0)
    return sys.modules[reader_name].Reader
    

    Of course, requiring a fully-qualified module name changes the API for Publisher, which may not be acceptable.

    Perhaps the best way to maintain compatibility with the current API would be to continue to accept the relative module names for the readers that are currently included with docutils, but require fully-qualified for (reader|parser|writer)_name going forward? Something like:

    if reader_name in ("doctree", "pep", "standalone"):
        reader_name = "docutils.readers." + reader_name
    __import__(reader_name, level=0)
    return sys.modules[reader_name].Reader
    

    For maximum backwards compatibility, if you think someone has been passing in "fully.qualified.name" to the current code in order to specify fully.Reader (which seems crazy, but would work with the current code as far as I can see), I guess:

    if reader_name in ("doctree", "pep", "standalone"):
        reader_name = "docutils.readers." + reader_name
    module = __import__(reader_name, level=0)
    try:
        return module.Reader
    except AttributeError:
        return sys.modules[reader_name].Reader
    

    If someone likes one of these solutions I could submit a patch.

     
    Last edit: Dale 2013-03-14
  • Günter Milde
    Günter Milde
    2013-03-22

    The current behaviour is the result of a fix of
    https://sourceforge.net/p/docutils/bugs/

    Users rely on beeing able to import custom modules for readers, parsers, and writers.

    How about changing the search order:

    try:
        module = __import__(parser_name, globals(), locals(), level=1)
    except ImportError:
        module = __import__(parser_name, globals(), locals(), level=0)
    

    so that with equal names the local module has preference but users that know what to do can still substitute readers/parsers/writers with custom ones in the Pythonpath - but only if they use a different name.

     
    • Dale
      Dale
      2013-03-23

      How about changing the search order:

      Sounds good to me, much simpler than my ideas. I tested it here and this solves my problem. Thanks!

       
    • I think this is definitely an improvement to the current behavior but it will not suffice in any situation. E.g. replacing an existing parser with a patched version using the same name will probably not work out this way. But to also get this working a more serious refactoring is probably necessary so that there is a declared way of specifying parsers instead of relying on import.

       
  • Günter Milde
    Günter Milde
    2013-04-08

    get_parser_class() is used in set_parser() which is used in set_components()
    which is used in the publish_*() convenience functions. For special cases that cannot be solved in a convenient way, the Publisher class can be used directly. Instantiationg a Publisher class, you can hand it reader, writer, and parser objects.
    Also the Reader component accepts a parser object instead of a parser_name, so the API for fine-grained customization and edge cases is there.

     
    • Ok, then I am perfectly fine with this change.

       
  • Günter Milde
    Günter Milde
    2013-04-09

    • status: open --> closed-fixed
    • milestone: --> repository
     
  • Where should we look for this fix? It doesn't seem to be in SVN trunk.

     
  • Günter Milde
    Günter Milde
    2013-04-17

    Sorry, I forgot to commit. It is in now.
    Make sure to use the new repository URI
    (see https://sourceforge.net/p/docutils/code/HEAD/tree/)

     
    • Dale
      Dale
      2013-04-17

      Thank you Günter. Unfortunately it looks like r7646 only changed only docutils.parsers.get_parser_class. Would you mind making an equivalent change in docutils.readers.get_reader_class, and probably docutils.writers.get_writer_class too? I'm actually having a problem with readers, but I imagine someone might have an equivalent problem with writers some day, and anyway I imagine you want to keep all three of these functions parallel with each other.

      Thanks again.

       
  • Günter Milde
    Günter Milde
    2013-04-18

    r7647 should change the import search order for the remaining parts (and also language modules). Thanks for pointing this out.