#153 Image directive URI handled incorrectly in ODT output

open
Dave Kuhlman
ODT Writer (10)
5
2010-12-03
2010-12-03
GCompton
No

The ODT writer has two problems with the way it handles the URIs provided in image (and figure) directives. First, it does not unescape them before passing them to the local file system. Second, it resolves them relative to the current working directory, not relative to the directory containing the reST file.

The first case causes problems with images whose file names contain characters that can't be used in URIs without escaping. I ran into it because my files have spaces in their names. In order to get the HTML writer to work, I had to refer to the files using "%20" in place of each space. Not ideal, but acceptable. However, the ODT writer takes the image node's uri member and passes it directly to os.path.exists (in writers/odf_odt/__init__.py, in the function check_file_exists) as though it were a file system path rather than a URI. Of course, on the file system, the file name has spaces, not %20s, so it couldn't find it. This part of the problem could probably be solved by passing the uri through the standard urllib.url2pathname function before calling any file system functions.
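A minimal sketch of that suggestion, written in Python 3 syntax, where the function lives in urllib.request (in Python 2.7 it is urllib.url2pathname). The helper name image_uri_to_path is hypothetical and is not part of docutils.

```python
import os
from urllib.request import url2pathname  # Python 2.7: from urllib import url2pathname


def image_uri_to_path(uri):
    """Convert the URI from an image directive into a local file system path.

    Hypothetical helper for illustration; not part of docutils.
    """
    # Undo percent-escapes (e.g. "%20" becomes " ") and apply the
    # platform's path conventions.
    return url2pathname(uri)


path = image_uri_to_path("blue%20square.png")
print(path)
# The existence check now sees the real file name, spaces included.
print(os.path.exists(path))
```

With this conversion in place, the check would look for "blue square.png" rather than "blue%20square.png".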

The second problem showed up when I tried to perform the generation with a current directory other than the one containing the reST file. In general, relative URIs are resolved against the directory containing the referring document (I couldn't find anything in the reST documentation declaring an exception, so I expect the common behavior). But, again, because the ODT writer doesn't modify the URI at all before using it as a path, Python's file system functions assume it is relative to the working directory. The solution here is probably to prepend the path from the working directory to the reST file's directory.
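A rough sketch of that resolution step, assuming the writer knows the path of the source document. The helper name resolve_image_path and its parameters are hypothetical, not docutils API. It joins against the absolute directory of the reST file, which should be equivalent to prepending the path from the working directory.

```python
import os


def resolve_image_path(image_path, source_path):
    """Resolve image_path against the directory that contains source_path.

    Hypothetical helper for illustration; not part of docutils.
    Absolute image paths are returned unchanged.
    """
    if os.path.isabs(image_path):
        return image_path
    # Directory of the referring reST document, made absolute so the
    # result no longer depends on the current working directory.
    source_dir = os.path.dirname(os.path.abspath(source_path))
    return os.path.normpath(os.path.join(source_dir, image_path))


# Example: the document lives in a "docs" subdirectory of the working directory.
print(resolve_image_path("blue square.png", os.path.join("docs", "blue square.rst")))
```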

These problems may be more general than what I've seen so far. The other writers may be affected. Also, other directives that cause external files to be read may have similar problems.

I'm using docutils 0.7 with Python 2.7 32-bit running on Windows 7 Professional 64-bit.

Discussion

  • GCompton
    2010-12-03

    Image file

     
    Attachments
  • GCompton
    2010-12-03

    Minimal example

     
    Attachments
  • GCompton
    2010-12-03

    I submitted a little too quickly--before I finished uploading my minimal example.

    Running: rst2odt.py "blue square.rst" "blue square.odt"

    Outputs "blue square.rst:: (WARNING/2) Cannot find image file blue%20square.png." and generates an empty document. The expected behavior would be to find "blue square.png" and embed that in the document.

     
  • Dave Kuhlman
    2010-12-10

    I've committed a fix to the SVN repository.

    I believe this fixes both problems:

    (1) It treats the "file name" for the .. image:: directive as a URI and unquotes it as the original poster suggested. Thanks to Guenter Milde for explaining this.

    (2) If the URI does not begin with os.sep (i.e. it is not an absolute path), then we prepend the path to the current document to the URI/path.

    But, now that we've decided that the image directive takes a URI, I suppose it should be possible to specify the URL of an image on the Web, for example::

        .. image:: http://www.xxx.com/myimage.png

    I'll try to address that one next week.

     
  • GCompton
    2010-12-10

    Sorry if I wasn't clear on the URI vs local file path distinction. I learned more than I ever wanted to know about the URI standards by fixing similar problems in my own software, so I tend to forget it's pretty obscure.

    Anyway, looks like you figured it out. I confirm that both problems appear to be fixed. I checked out http://svn.berlios.de/svnroot/repos/docutils/trunk/docutils rev 6500. I used our original document, rather than the minimal example. The resulting ODT file contained the images even when they had URI-escaped characters and when the document was not in the working directory. As a caveat, I didn't review the code changes (and probably won't, unless you need me to).

    If you decide to add generalized URL retrieval, you may want to look at the standard "urllib" module, especially the urlopen function. I haven't used it myself, but it sounds like it may do all the work for you. All our documents reference only local file system images, so we won't use this capability even if supported, but it might be of value to others.
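
A rough sketch of how generalized retrieval might look, in Python 3 syntax (urlopen lives in urllib.request there). The names REMOTE_SCHEMES, is_remote_uri and read_image_bytes are hypothetical, the list of schemes is an assumption, and none of this reflects how docutils actually implements it.

```python
import os
from urllib.parse import urlsplit
from urllib.request import url2pathname, urlopen

# Assumed set of schemes that should be fetched over the network.
REMOTE_SCHEMES = ("http", "https", "ftp")


def is_remote_uri(uri):
    """Return True if the URI names a network resource rather than a local file."""
    return urlsplit(uri).scheme in REMOTE_SCHEMES


def read_image_bytes(uri, source_dir):
    """Return the image data for uri, fetching it if it is remote.

    Local URIs are unescaped and resolved against source_dir, the
    directory of the referring document.
    """
    if is_remote_uri(uri):
        with urlopen(uri) as response:
            return response.read()
    path = url2pathname(uri)
    if not os.path.isabs(path):
        path = os.path.join(source_dir, path)
    with open(path, "rb") as image_file:
        return image_file.read()


print(is_remote_uri("http://www.xxx.com/myimage.png"))
print(is_remote_uri("blue%20square.png"))
```

Checking the scheme against an explicit list keeps Windows paths such as C:\images\a.png from being mistaken for URIs with a one-letter scheme.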