A directory named like an unsupported language confuses Docutils when it is used as a library.
For example, a directory named tr confuses docutils.languages.get_language(), which then returns an invalid language module.
root@31801945b741:/docs# python
Python 3.5.9 (default, Nov 23 2019, 07:17:24)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import docutils.languages
>>> docutils.languages.get_language('tr')
<module 'docutils.languages.en' from '/usr/local/lib/python3.5/site-packages/docutils/languages/en.py'>
>>> # => returns "en" language
...
>>> docutils.languages._languages.clear() # clear language-cache forcedly
>>> os.mkdir('tr')
>>> docutils.languages.get_language('tr')
<module 'tr' (namespace)>
>>> # => returns invalid language module if directory named same as language found
...
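The `<module 'tr' (namespace)>` result is ordinary Python behaviour: since Python 3.3 (PEP 420), a directory without an `__init__.py` that sits on the import path is importable as a namespace package. Below is a minimal sketch that reproduces this without Docutils. The helper `import_bare_directory` and the directory name `xx_demo_lang` are made up for the illustration.

```python
import importlib
import os
import sys
import tempfile


def import_bare_directory(name="xx_demo_lang"):
    """Create an empty directory, import it by name, and return the module.

    Hypothetical helper for illustration; it is not part of Docutils.
    """
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, name))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # No ImportError here: the empty directory becomes a namespace package.
            return importlib.import_module(name)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)


module = import_bare_directory()
# A namespace package has no source file and none of the language data.
print(getattr(module, "__file__", None))
print(hasattr(module, "bibliographic_fields"))
```

So a successful import alone says nothing about whether the result is a language module.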
In the real-world case, a Sphinx user reported that their project crashes while building when the document has bibliographic fields and the working directory contains a tr directory. IMO, get_language() should confirm that the loaded module is a language module.
https://github.com/sphinx-doc/sphinx/issues/6931
I made a patch for this case. What do you think about this?
Thinking ...
Is
if hasattr(module, '__docformat__'):
the correct test, or in your case
if hasattr(module, 'bibliographic_fields'):
or ?
The problem is due to the undocumented (but tested) loading of 3rd-party/local language modules from the working directory if Docutils misses a language module.
Yes, there should be a test that the local module is a Docutils-compatible language module.
This is required for both languages.get_language() and parsers.rst.languages.get_language().
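A compatibility test of this kind could look roughly like the sketch below. It assumes that a Docutils language module is recognisable by the three objects that docutils/languages/en.py defines (`labels`, `bibliographic_fields`, `author_separators`). The function name `is_language_module` and the table `REQUIRED_OBJECTS` are hypothetical and are not taken from the patch.

```python
import types

# Assumption: these names and types mirror what docutils/languages/en.py provides.
REQUIRED_OBJECTS = {
    "labels": dict,
    "bibliographic_fields": dict,
    "author_separators": list,
}


def is_language_module(module):
    """Return True if `module` has all required objects with the expected types."""
    return all(
        isinstance(getattr(module, name, None), expected_type)
        for name, expected_type in REQUIRED_OBJECTS.items()
    )


# A stand-in for the namespace module created by a stray "tr" directory.
stray = types.ModuleType("tr")

# A stand-in for a real language module.
valid = types.ModuleType("en")
valid.labels = {"author": "Author"}
valid.bibliographic_fields = {"author": "author"}
valid.author_separators = [";", ","]

print(is_language_module(stray))  # False
print(is_language_module(valid))  # True
```

Checking types rather than mere attribute presence also rejects unrelated modules that happen to define a name such as `labels`.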
The attached patch tests for the type of the required objects and also replaces the low-level __import__() function with importlib.import_module() (new in 2.7).
As this has the potential to break installations that rely on details of the current implementation for local language modules, it is stuff for 0.17b.
Patch (for Git).
-1 on the patch as-is, because it uses "assert" statements. This is a problem as "assert" statements can be disabled via the "python -O" optimization option. This would remove the checks and defeat the purpose of the change.
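To illustrate the concern, here is a small sketch that compiles the same source with and without optimization. It relies on the `optimize` argument of the built-in `compile()`, where level 1 corresponds to `python -O`. The two check functions are made up for the demonstration and are not the code from the patch.

```python
SOURCE = '''
def check_with_assert(value):
    assert isinstance(value, dict), "not a dict"
    return "accepted"

def check_with_raise(value):
    if not isinstance(value, dict):
        raise ImportError("not a dict")
    return "accepted"
'''


def load_checks(optimize):
    """Compile SOURCE at the given optimization level and return its namespace."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    return namespace


optimized = load_checks(optimize=1)  # same effect on asserts as "python -O"

# The assert statement was compiled away, so the bad value passes.
print(optimized["check_with_assert"]("oops"))  # accepted

# The explicit check survives optimization.
try:
    optimized["check_with_raise"]("oops")
except ImportError as error:
    print("rejected:", error)  # rejected: not a dict
```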
Thank you for the feedback.
How about the new version without "assert" statements?
The "check_content" methods look good.
The __call__ method has a small problem: no need to check the cache here (smells like a premature optimization), as it's done in "import_from_packages". If there's no cache hit, it will be checked twice.
Perhaps move the docstrings from after the get_language = ... assignments into the LanguageImporter & RstLanguageImporter classes? Then the docstrings may be useful in running code.
Otherwise, looks good, thanks!
Attached is the 3rd iteration:
* Cache is only used from the outer function (__call__).
* Report missing/incomplete modules and language-variant substitutions (info).
* Move documentation from instance to class definition.
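The structure described in this list might be sketched as follows. This is an illustration under assumptions, not the attached patch: the class name `SketchLanguageImporter`, the injectable `loader` argument, and the simplified content check (only `labels`) are hypothetical, and reporting is left out.

```python
import importlib
import types


class SketchLanguageImporter:
    """Return a language module for a language code, falling back to English.

    Hypothetical sketch: the documentation lives on the class, so it is
    available at run time via help() on the callable instance.
    """

    def __init__(self, loader=importlib.import_module, fallback="en"):
        self.loader = loader
        self.fallback = fallback
        self.cache = {}

    def check_content(self, module):
        """Raise ImportError unless `module` looks like a language module."""
        if not isinstance(getattr(module, "labels", None), dict):
            raise ImportError("not a language module")

    def import_from_packages(self, name):
        """Load and validate `name`; the cache is not touched here."""
        try:
            module = self.loader(name)
            self.check_content(module)
        except ImportError:
            return None
        return module

    def __call__(self, language_code):
        # The only place where the cache is read or written.
        if language_code in self.cache:
            return self.cache[language_code]
        module = self.import_from_packages(language_code)
        if module is None:
            module = self.import_from_packages(self.fallback)
        self.cache[language_code] = module
        return module


def fake_loader(name):
    """Stand-in for import_module: "tr" is a stray directory, "en" is valid."""
    if name == "en":
        module = types.ModuleType("en")
        module.labels = {"note": "Note"}
        return module
    if name == "tr":
        return types.ModuleType("tr")
    raise ImportError(name)


get_language = SketchLanguageImporter(loader=fake_loader)
print(get_language("tr").__name__)  # en
```

Injecting the loader keeps the sketch runnable without Docutils installed and without creating directories.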
When adding tests (see revision 8452), I realized that parser INFO messages (about using the English fallback) are part of the "pseudoXML" output when the test is run stand-alone but not when running the test suite. (To reproduce, run e.g. python test_admonitions_dummy_lang.py in /usr/local/src/docutils-git-svn/docutils/test/test_parsers/test_rst/test_directives/).
Sorry for not responding. LGTM!
Fixed in 0.17b.dev (the repository version).
Fixed in Docutils 0.17.
Thanks again for your contribution.