#181 docutils 0.8.1 throws unicode error on non-ascii cwd

Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2012-02-03
Created: 2012-01-31
Creator: Toshio Kuratomi
Private: No

I received the following bug report against our docutils package: https://bugzilla.redhat.com/show_bug.cgi?id=785622

Reproduced by doing the following:

mkdir /var/tmp/café
cd /var/tmp/café
touch test.rst default.css
rst2html --stylesheet-path=default.css test.rst

The problem is that on Python 2 we're combining a unicode string (from command-line parsing) with a byte str (from the current working directory). I'll attach a patch that addresses this in a minimally invasive manner. However, I think there are still latent bugs in the code: on Unix systems, filenames are byte strings in which only a few byte values are illegal. That means a user could have a current working directory where part of the path is encoded in Latin-1 and part in UTF-8, for instance. Converting such a path to unicode may throw an exception, or it may return a unicode string that doesn't actually represent the bytes on disk (and so fail to find the file when it attempts to read it). Fixing that would require re-architecting the file handling in all of docutils, though.
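
The two failure modes described above can be sketched as follows. This is only an illustration written for Python 3, not docutils code: the explicit `decode("ascii")` stands in for the implicit coercion Python 2 performs when a byte str is combined with a unicode string, and `join_like_python2` is a hypothetical helper name.

```python
# Sketch only: Python 3 code that mimics Python 2's implicit ASCII coercion.
cwd_bytes = "/var/tmp/café".encode("utf-8")  # byte str, as os.getcwd() returns on Python 2
stylesheet = "default.css"                   # text, as produced by option parsing


def join_like_python2(base, name):
    """Hypothetical helper: join a byte-string directory with a text file name.

    Python 2 silently decodes the byte string with the ASCII codec before
    concatenating; the explicit decode below stands in for that step.
    """
    return base.decode("ascii") + "/" + name


try:
    join_like_python2(cwd_bytes, stylesheet)
    failed = False
except UnicodeDecodeError:
    failed = True  # the non-ASCII bytes of "é" cannot be decoded as ASCII

# Mixed encodings: one directory name in Latin-1, the next one in UTF-8.
mixed = b"/var/tmp/" + "café".encode("latin-1") + b"/" + "café".encode("utf-8")

try:
    mixed.decode("utf-8")
    mixed_decodes = True
except UnicodeDecodeError:
    mixed_decodes = False  # the Latin-1 byte 0xE9 is not valid UTF-8 here

# Lenient decoding succeeds, but the result no longer maps back to the
# bytes on disk, so a later file lookup would miss.
lossy = mixed.decode("utf-8", "replace")
round_trips = lossy.encode("utf-8") == mixed
```

Here the strict decode raises for both paths, and the lenient decode yields text that cannot be turned back into the original path bytes.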

Discussion

  • Patch to fix unicode error when cwd is non-ascii

     
  • Günter Milde
    2012-02-01

    another minimal patch for non-ascii cwd

     
  • Günter Milde
    2012-02-01

    The patch download link returns an error page for me instead of a file, so I am pasting my version of the patch here.

    The change in frontend.py fixes the reported case by use of unicode strings for the cwd.
    The change in utils/__init__.py lets utils.relative_path() work for the case source=None, target=u'unicode'.

    Users with mixed encodings should try Python 3 (>= 3.1), which introduces the "surrogateescape" error handler to deal with undecodable bytes in paths.
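
    A minimal sketch of what "surrogateescape" buys, assuming Python 3 and a made-up path that mixes Latin-1 and UTF-8 directory names: undecodable bytes are mapped to lone surrogate code points on decoding and mapped back on encoding, so the original bytes survive the round trip.

```python
# Sketch only: a hypothetical path whose components use different encodings.
mixed = b"/var/tmp/" + "café".encode("latin-1") + b"/" + "café".encode("utf-8")

# Decoding never fails: the stray Latin-1 byte 0xE9 becomes U+DCE9.
text = mixed.decode("utf-8", "surrogateescape")
has_escape = "\udce9" in text

# Encoding with the same handler restores the exact bytes on disk.
restored = text.encode("utf-8", "surrogateescape")
same_bytes = restored == mixed
```

    On POSIX systems, `os.fsdecode()` and `os.fsencode()` (Python 3.2 and later) apply this handler together with the filesystem encoding.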

    Index: utils/__init__.py
    ===================================================================
    --- utils/__init__.py (revision 7326)
    +++ utils/__init__.py (working copy)
    @@ -457,7 +457,8 @@

         If there is no common prefix, return the absolute path to `target`.
         """
    -    source_parts = os.path.abspath(source or 'dummy_file').split(os.sep)
    +    source_parts = os.path.abspath(source or type(target)('dummy_file')
    +                                  ).split(os.sep)
         target_parts = os.path.abspath(target).split(os.sep)
         # Check first 2 parts because '/dir'.split('/') == ['', 'dir']:
         if source_parts[:2] != target_parts[:2]:
    Index: frontend.py
    ===================================================================
    --- frontend.py (revision 7326)
    +++ frontend.py (working copy)
    @@ -184,7 +184,7 @@
         `OptionParser.relative_path_settings`.
         """
         if base_path is None:
    -        base_path = os.getcwd()
    +        base_path = os.getcwdu()
         for key in keys:
             if key in pathdict:
                 value = pathdict[key]
    @@ -619,7 +619,7 @@
             """Store positional arguments as runtime settings."""
             values._source, values._destination = self.check_args(args)
             make_paths_absolute(values.__dict__, self.relative_path_settings,
    -                            os.getcwd())
    +                            os.getcwdu())
             values._config_files = self.config_files
             return values

     
  • Günter Milde
    2012-02-03

    • status: open --> closed-fixed