#113 writers/odf_odt: Use only ASCII filenames in ODF packages

Milestone: None
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2015-02-11
Created: 2013-08-05
Creator: Michael Schutte
Private: No

The odf_odt writer embeds images in its output files and uses the original filenames as part of the embedded filenames. Since the OpenDocument standard does not specify the filename charset, this patch recodes the filenames to ASCII (dropping non-representable characters) to be on the safe side.
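A minimal sketch of the recoding described above, written for Python 3. The helper names `ascii_only` and `embedded_name` are hypothetical and not part of docutils; only the `'Pictures/1%08x_%s'` pattern and the `encode("ascii", "ignore")` call are taken from the patch.

```python
def ascii_only(filename):
    """Drop every character that has no ASCII representation."""
    # errors="ignore" silently skips characters the codec cannot encode.
    return filename.encode("ascii", "ignore").decode("ascii")


def embedded_name(image_count, filename):
    """Build the name under Pictures/ (hypothetical helper)."""
    # The hexadecimal counter keeps names distinct even when two
    # different originals collapse to the same ASCII remainder.
    return "Pictures/1%08x_%s" % (image_count, ascii_only(filename))


print(embedded_name(1, "Gr\u00fc\u00dfe.png"))        # original name: Grüße.png
print(embedded_name(2, "\u65e5\u672c\u8a9e.png"))     # only ".png" survives
```

A name made up entirely of non-ASCII characters is reduced to its extension, so the counter prefix is what keeps the embedded names unique.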

The actual reason for this patch is an invalid assumption about character sets in docutils.writers.odf_odt.Writer.store_embedded_files(). This has been reported as Debian bug http://bugs.debian.org/714317.

1 Attachment

Discussion

  • The patch does two things. First, it removes

    decode('latin-1').encode('utf-8')

    from the filename stored in the zipfile.

    Seems good to me, as the referenced filename should not be
    changed, and encoding/decoding should have happened in docutils.io anyway.

    APPLIED in revision 7786

     
  • Second::

    @@ -2076,7 +2075,8 @@ def visit_image(self, node):
             else:
                 self.image_count += 1
                 filename = os.path.split(source)[1]
    -            destination = 'Pictures/1%08x%s' % (self.image_count, filename, )
    +            destination = 'Pictures/1%08x_%s' % (self.image_count,
    +                filename.encode("ascii", "ignore"))
                 if source.startswith('http:'):
                     try:

    I do not see why the first part removes an encode call while the second adds one.

    NOT APPLIED
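The two encode calls discussed above do different things, which the following Python 3 sketch tries to show. It assumes a hypothetical filename and assumes that the removed call was applied to bytes that were already UTF-8; the ticket does not state what the bytes actually were. Under Python 2, which was current when the patch was written, `encode` on a `str` returns another `str`, so the last part behaves differently there.

```python
name = "Gr\u00fc\u00dfe.png"  # hypothetical filename: Grüße.png

# Removed call: interpret the bytes as Latin-1, then re-encode as UTF-8.
# If the bytes were already UTF-8 (assumption), each non-ASCII
# character ends up encoded twice.
utf8_bytes = name.encode("utf-8")
double_encoded = utf8_bytes.decode("latin-1").encode("utf-8")

# Proposed call: drop everything outside ASCII instead of re-encoding.
ascii_bytes = name.encode("ascii", "ignore")

# On Python 3, "%s" formats a bytes object via its repr, so the
# result has to be decoded back to str before interpolation.
as_repr = "Pictures/1%08x_%s" % (1, ascii_bytes)
as_text = "Pictures/1%08x_%s" % (1, ascii_bytes.decode("ascii"))

print(len(utf8_bytes), len(double_encoded))
print(as_repr)
print(as_text)
```

The first call rewrites the name under a charset assumption, while the second only discards characters, so removing one and adding the other is not contradictory in itself. Whether the ASCII restriction is wanted is a separate question that the ticket leaves open.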