#155 handling of non-ascii filenames is broken again

Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2010-12-20
Created: 2010-12-18
Private: No

Hello,

After the recent commit "Decode command line arguments with the locales preferred encoding." [1], handling of non-ASCII filenames produces a traceback:

$ echo $LANG
ru_RU.utf8
$ echo Hello > привет.txt
$ rst2xetex --traceback привет.txt привет.tex # same for rst2html
Traceback (most recent call last):
  File "/home/kirr/src/tools/txt/docutils/docutils/tools/rst2xetex", line 27, in <module>
    publish_cmdline(writer_name='xetex', description=description)
  File "/home/kirr/src/tools/txt/docutils/docutils/docutils/core.py", line 337, in publish_cmdline
    config_section=config_section, enable_exit_status=enable_exit_status)
  File "/home/kirr/src/tools/txt/docutils/docutils/docutils/core.py", line 209, in publish
    self.settings)
  File "/home/kirr/src/tools/txt/docutils/docutils/docutils/readers/__init__.py", line 69, in read
    self.parse()
  File "/home/kirr/src/tools/txt/docutils/docutils/docutils/readers/__init__.py", line 74, in parse
    self.document = document = self.new_document()
  File "/home/kirr/src/tools/txt/docutils/docutils/docutils/readers/__init__.py", line 80, in new_document
    document = utils.new_document(self.source.source_path, self.settings)
  File "/home/kirr/src/tools/txt/docutils/docutils/docutils/utils.py", line 449, in new_document
    source_path = decode_path(source_path)
  File "/home/kirr/src/tools/txt/docutils/docutils/docutils/utils.py", line 356, in decode_path
    path = path.decode(sys.getfilesystemencoding(), 'strict')
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

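As I see it, the problem is that decode_path() now receives the path as a unicode string. Calling .decode() on a unicode object in Python 2 first encodes it implicitly with the 'ascii' codec, and that step raises the UnicodeEncodeError above for non-ASCII names.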
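
To make the failure mode concrete, here is a minimal sketch that emulates it. It is written for Python 3, where the implicit step has to be spelled out; `py2_style_decode` and `try_decode` are hypothetical helpers for illustration, not docutils functions.

```python
def py2_style_decode(text, encoding):
    """Hypothetical helper mimicking Python 2's unicode.decode()."""
    # Python 2 does this step implicitly; it fails for non-ASCII text.
    raw = text.encode('ascii', 'strict')
    return raw.decode(encoding, 'strict')


def try_decode(text, encoding='utf-8'):
    """Return the decoded text, or a description of the encode error."""
    try:
        return py2_style_decode(text, encoding)
    except UnicodeEncodeError as err:
        return 'UnicodeEncodeError: %s' % err


ascii_result = try_decode('hello.txt')
cyrillic_result = try_decode('привет.txt')
print(ascii_result)
print(cyrillic_result)
```

The ASCII-only name survives the round trip, while the Cyrillic name fails in the encode step before any decoding with the filesystem encoding takes place.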

The following tweak works for me, but I doubt it is the right fix: if we already convert command-line arguments to unicode, decode_path() seems unnecessary.

--- a/docutils/docutils/utils.py
+++ b/docutils/docutils/utils.py
@@ -351,6 +351,9 @@ def decode_path(path):
     Convert to Unicode without the UnicodeDecode error of the
     implicit 'ascii:strict' decoding.
     """
+    if isinstance(path, unicode):
+        return path
+
     # see also http://article.gmane.org/gmane.text.docutils.user/2905
     try:
         path = path.decode(sys.getfilesystemencoding(), 'strict')
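
For reference, the same guard can be sketched as a standalone function. This is only an illustration under stated assumptions: it targets Python 3, where `str` stands in for Python 2's `unicode` and `bytes` for Python 2's `str`, and the fallback branch is my assumption rather than part of the patch or of the actual docutils code.

```python
import sys


def decode_path(path):
    """Return `path` as text, decoding only when it is given as bytes."""
    if isinstance(path, str):
        # Already decoded (for example by the command-line handling):
        # pass it through unchanged instead of decoding a second time.
        return path
    try:
        return path.decode(sys.getfilesystemencoding(), 'strict')
    except UnicodeDecodeError:
        # Assumed fallback, not taken from the patch: degrade to
        # replacement characters rather than raising.
        return path.decode('ascii', 'replace')


already_text = decode_path('привет.txt')
from_bytes = decode_path(b'hello.txt')
print(already_text)
print(from_bytes)
```

With the guard in place, a filename that arrives as text is returned as is, and only byte strings go through the filesystem-encoding decode.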

Thanks,
Kirill

[1] http://repo.or.cz/w/docutils.git/commitdiff/3bb3b62e4256fae6efb9cc58edbda5d17677422c

Discussion

    Günter Milde - 2010-12-20
    • status: open --> closed-fixed
     
