
#43 Broken filenames when unzip archive with non-latin character

Status: open-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2024-06-08
Created: 2013-04-29
Creator: anton
Private: No

When unzipping an archive that was created on Windows and contains
non-Latin filenames, the unpacked filenames are broken.

Reproducible: Always

Steps to Reproduce:
1. Create a zip archive on Windows with non-Latin filenames inside.
2. Bring this archive to Linux (openSUSE).
3. Unzip archive with unzip.

Actual Results:
The unpacked filenames look like an unreadable mess.

Expected Results:
Unpacked files should have valid filenames, as originally packed.

Could not find any way to manually provide encoding with options in unzip
command line or in gui tools like ark.
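The symptom described above can be reproduced without any archive at all. The sketch below is only an illustration: it assumes the archiver stored the name in CP866 (the Russian OEM code page mentioned later in this thread) and that the extractor interprets those bytes as CP437. The fallback a particular unzip build actually uses may differ. The sample name is taken from the listing further down.

```python
# Sample name taken from the listing later in this thread.
original = "Новый текстовый документ.txt"

# Bytes as a Windows tool using the Russian OEM code page would store them.
stored = original.encode("cp866")

# Assumption: the extractor interprets those bytes as CP437.
mangled = stored.decode("cp437")

# Reversing the wrong decoding recovers the name, because both code pages
# assign a character to every byte value.
restored = mangled.encode("cp437").decode("cp866")

print(mangled)   # unreadable box-drawing and accented characters
print(restored)  # the original Cyrillic name
```

Only the ASCII part of the name (here the ".txt" suffix) survives the wrong decoding, which matches the "unreadable mess" in the report.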

Related

Bugs: #43

Discussion

  • Steven Schweda

    Steven Schweda - 2013-04-29

    > 3. Unzip archive with unzip.

    As usual, knowing which version of UnZip might be helpful. A full
    "unzip -v" report would be best.

    > [...] or in gui tools like ark.

    Not our program.

     
  • anton

    anton - 2013-05-01

    > zip -v
    Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
    This is Zip 3.0 (July 5th 2008), by Info-ZIP.
    Currently maintained by E. Gordon. Please send bug reports to
    the authors using the web page at www.info-zip.org; see README for details.

    Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip,
    as of above date; see http://www.info-zip.org/ for other sites.

    Compiled with gcc 4.7.2 20130108 [gcc-4_7-branch revision 195012] for Unix (Linux ELF).

    Zip special compilation options:
    USE_EF_UT_TIME (store Universal Time)
    SYMLINK_SUPPORT (symbolic links supported)
    LARGE_FILE_SUPPORT (can read and write large files on file system)
    ZIP64_SUPPORT (use Zip64 to store large files in archives)
    UNICODE_SUPPORT (store and read UTF-8 Unicode paths)
    STORE_UNIX_UIDs_GIDs (store UID/GID sizes/values using new extra field)
    UIDGID_NOT_16BIT (old Unix 16-bit UID/GID extra field not used)
    [encryption, version 2.91 of 05 Jan 2007] (modified for Zip 3)

    Encryption notice:
    The encryption code of this program is not copyrighted and is
    put in the public domain. It was originally written in Europe
    and, to the best of our knowledge, can be freely distributed
    in both source and object forms from any country, including
    the USA under License Exception TSU of the U.S. Export
    Administration Regulations (section 740.13(e)) of 6 June 2002.

    Zip environment options:
    ZIP: [none]
    ZIPOPT: [none]

    from opensuse 12.3 distro

     
  • Steven Schweda

    Steven Schweda - 2013-05-01

    > > 3. Unzip archive with unzip.
    >
    > As usual, knowing which version of UnZip might be helpful. A full
    > "unzip -v" report would be best.

    > > zip -v
    > [...]

    What's wrong with this picture?

     
  • anton

    anton - 2013-05-01

    > unzip -v
    UnZip 6.00 of 20 April 2009, by Info-ZIP. Maintained by C. Spieler. Send
    bug reports using http://www.info-zip.org/zip-bug.html; see README for details.

    Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
    see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.

    Compiled with gcc 4.7.2 20130108 [gcc-4_7-branch revision 195012] for Unix (Linux ELF).

    UnZip special compilation options:
    COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
    SET_DIR_ATTRIB
    SYMLINKS (symbolic links supported, if RTL and file system permit)
    TIMESTAMP
    UNIXBACKUP
    USE_EF_UT_TIME
    USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
    USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
    UNICODE_SUPPORT [wide-chars, char coding: UTF-8] (handle UTF-8 paths)
    LARGE_FILE_SUPPORT (large files over 2 GiB supported)
    ZIP64_SUPPORT (archives using Zip64 for large files supported)
    VMS_TEXT_CONV
    WILD_STOP_AT_DIR
    [decryption, version 2.11 of 05 Jan 2007]

    UnZip and ZipInfo environment options:
    UNZIP: [none]
    UNZIPOPT: [none]
    ZIPINFO: [none]
    ZIPINFOOPT: [none]

     
  • Steven Schweda

    Steven Schweda - 2013-05-01

    Ok. This sounds like a problem which was reported here earlier:

    https://sourceforge.net/tracker/?func=detail&aid=3584238&group_id=118012&atid=679786

    We still have not released UnZip 6.10c (beta), but there's a newer
    experimental (internal-only) source kit which may solve this problem:

    http://antinode.info/ftp/info-zip/unzip610c08a_l_sM.zip

    As explained in that other thread, add "ICONV=1" to the usual "make"
    command to enable the character-set conversion features (options: -I,
    -O). (Look for "ICONV_MAPPING" in the "unzip -v" report.)

    Please let us know if this works or fails, or if you have any
    problems building it. And thanks for the report.

     
  • anton

    anton - 2013-05-01

    I have built source code:

    make -f unix/Makefile ICONV=1 generic

    Then unpacked my archive:

    /path/to/new/unzip/unzip -O cp866 HackDay25.zip

    (cp866 is for windows with Russian locale)

    And it worked great - I can see the unpacked filenames with the correct locale and do not have to recode them manually. I can't believe this problem is now fixed after many years :) Thanks

     
  • Steven Schweda

    Steven Schweda - 2013-05-01
    • status: open --> open-fixed
     
  • Steven Schweda

    Steven Schweda - 2013-05-01

    Thanks for the report. Glad (and relieved) to hear that the
    experimental code worked for you. We hope that the next UnZip beta kit
    (6.10c) will appear later this year, and it should work as well. Until
    then, please let us know if you have any problems with the experimental
    code.

     
  • Thomas Wolff

    Thomas Wolff - 2015-03-13

    (Affected characters are actually non-ASCII, including Latin characters, so e.g. ä which
    is contained in ISO-8859 Latin character sets; but I assume that's what the reporter meant).
    The issue is not really resolved, however;
    I reported this to http://www.info-zip.org/zip-bug.html:

    unzip 6.00 does not properly unpack non-ASCII characters (in filenames)
    from zip archives produced by various other tools, e.g. as provided by
    a dropbox zip download. It works fine with WinZip, pkzip, p7zip.
    I tried with unzip 6.10 as mentioned in
    https://sourceforge.net/p/infozip/bugs/43/
    on cygwin. Unpacking proper filename now works in a UTF-8 terminal
    (and locale), however not in another locale where again filenames
    are mangled. With p7zip, I can unpack the same archive in either locale
    environment and it works in any case.
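A UTF-8 locale can represent any filename, while a single-byte locale cannot, which is one way a name can come out differently depending on the locale. The sketch below only illustrates that general point. Whether it explains the behaviour reported here is not established in this thread, and the name "Bär.txt" is a hypothetical example.

```python
def representable(name: str, charset: str) -> bool:
    """Return True if every character of `name` exists in `charset`."""
    try:
        name.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# "Bär.txt" is a made-up name; the Cyrillic one appears later in the thread.
for name in ("Bär.txt", "Новая папка"):
    for charset in ("utf-8", "iso-8859-1"):
        print(name, charset, representable(name, charset))

# Characters missing from the destination charset have to become something
# else; Python's "replace" error handler substitutes a question mark.
lossy = "Новая папка".encode("iso-8859-1", errors="replace")
print(lossy)
```

A tool that converts names to the current locale's character set has to make some such substitution when the locale lacks the character.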

     
  • Ed Gordon

    Ed Gordon - 2015-03-13

    To recreate this, it would be useful to know the locale that the archive was made on,
    the tool used to create the archive (if zip, then "zip -v"), the locale the archive is
    being extracted on and the version of unzip ("unzip -v").  Note that any characters
    in the source locale not existing in the destination locale will get converted to
    something else, depending on the situation.  The actual source paths (including
    the characters being mangled) and the resulting mangled path would also be
    helpful.  (Or the character codes if you can't reproduce them.)

    There's multiple things going on and the below description is not
    sufficient to determine which is at play here.  I think I know what's going on,
    but I can't be sure without more detail.  If UTF-8 is involved, it should work.

    The UnZip guy, who provided the SF replies, may have further thoughts.

    Thanks,
    Ed Gordon

     

  • Steven Schweda

    Steven Schweda - 2015-03-13

    Thanks for the report. But...

    > [...] zip archives produced by various other tools, [...]

    An actual example would be more helpful than a vague description.

    > I tried with unzip 6.10 as mentioned in
    > https://sourceforge.net/p/infozip/bugs/43/

    UnZip 6.10 does not yet exist. Exactly what did you try? There are
    newer unreleased kits than unzip610c08a_l_sM.zip. I would not expect
    them to help much in this case, but there's only one way to find out.
    Let me know if you're interested.

    > [...] on cygwin. Unpacking proper filename now works in a UTF-8
    > terminal (and locale), however not in another locale where again
    > filenames are mangled.

    I do very little on Windows, and even less in Cygwin, and still less
    in any locale other than "C", so if you expect me to investigate a
    problem like this, then I'll need an actual (small) test archive, and a
    complete set of directions explaining how to reproduce the problem.
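A small test archive of the kind requested above can be produced without a Windows machine. The following is a minimal sketch, not a tool from Info-ZIP: the function name `build_legacy_zip`, the file content and the timestamp are all made up for illustration. It assumes that a legacy archive simply stores the name as raw code-page bytes with general-purpose flag bit 11 (the UTF-8 marker) left clear, and writes a single uncompressed entry by hand.

```python
import io
import struct
import zipfile
import zlib


def build_legacy_zip(name: str, data: bytes, encoding: str = "cp866") -> bytes:
    """Return a one-entry, uncompressed ZIP whose name is raw `encoding` bytes."""
    raw_name = name.encode(encoding)
    crc = zlib.crc32(data) & 0xFFFFFFFF
    dos_time = (18 << 11) | (40 << 5)                 # 18:40:00 (arbitrary)
    dos_date = ((2016 - 1980) << 9) | (9 << 5) | 28   # 2016-09-28 (arbitrary)
    flags = 0  # bit 11 (0x800, "name is UTF-8") is deliberately left clear

    # Local file header (30 bytes), followed by the name and the stored data.
    local = struct.pack("<IHHHHHIIIHH", 0x04034B50, 20, flags, 0,
                        dos_time, dos_date, crc, len(data), len(data),
                        len(raw_name), 0) + raw_name + data
    # Central directory header (46 bytes), followed by the name.
    central = struct.pack("<IHHHHHHIIIHHHHHII", 0x02014B50, 20, 20, flags, 0,
                          dos_time, dos_date, crc, len(data), len(data),
                          len(raw_name), 0, 0, 0, 0, 0, 0) + raw_name
    # End of central directory record (22 bytes); the directory starts
    # right after the single local entry.
    end = struct.pack("<IHHHHIIH", 0x06054B50, 0, 0, 1, 1,
                      len(central), len(local), 0)
    return local + central + end


blob = build_legacy_zip("Новый текстовый документ.txt", b"test")
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    # Python's zipfile treats unflagged names as CP437, so this prints mojibake.
    print(zf.namelist())
```

Writing `blob` to a file gives an archive that can be passed to unzip in different locales to compare the results.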

     
  • unxed

    unxed - 2024-05-22

    The built-in .zip archiver in older versions of Windows used the DOS (OEM) or Windows (ANSI) code page corresponding to the current regional settings for new archives. Many such archives still exist.

    The correct behavior is to determine the relevant OEM or ANSI code page from the system locale and use it. See this PR for a reference implementation:

    https://github.com/p7zip-project/p7zip/pull/232

    Sample archive showing this bug attached.

     

    Last edit: unxed 2024-05-22
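The locale-based choice described above might look roughly like the sketch below. It is not taken from the linked p7zip PR: the function name `guess_oem_codepage` and the table are illustrative assumptions, the table covers only a handful of languages, and it keys on the language alone even though the real Windows defaults can also depend on the territory.

```python
import re

# Illustrative subset of Windows OEM code pages, keyed by language code.
OEM_BY_LANGUAGE = {
    "ru": "cp866", "uk": "cp866",
    "de": "cp850", "fr": "cp850",
    "pl": "cp852", "cs": "cp852", "hu": "cp852",
    "el": "cp737",
    "tr": "cp857",
}
DEFAULT_OEM = "cp437"  # fallback for unknown languages and the "C" locale


def guess_oem_codepage(locale_name: str) -> str:
    """Map a POSIX locale name such as 'ru_RU.UTF-8' to a likely OEM code page."""
    # Keep only the language part: drop territory, codeset and modifier.
    language = re.split(r"[_.@]", locale_name)[0].lower()
    return OEM_BY_LANGUAGE.get(language, DEFAULT_OEM)


print(guess_oem_codepage("ru_RU.UTF-8"))
print(guess_oem_codepage("C"))
```

A guess like this is only a default. An explicit option for naming the code page remains useful when the archive comes from a machine with different regional settings.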
    • Paul Marquess

      Paul Marquess - 2024-05-22

      What encoding is used for the filenames in Desktop.zip?

       
    • Paul Marquess

      Paul Marquess - 2024-05-22

      It looks like codepage 866, i.e. Cyrillic

       
  • Paul Marquess

    Paul Marquess - 2024-05-22

    There is a patched version of unzip that some of the Linux distros ship that allows the codepage/encoding to be specified. That deals with this use-case.

    In this case, specifying codepage 866 with the -O option does the trick:

    $ unzip -l  -O cp866 Desktop.zip 
    Archive:  Desktop.zip
      Length      Date    Time    Name
    ---------  ---------- -----   ----
            0  09-28-2016 18:41   Новая папка/
            4  09-28-2016 18:40   Новый текстовый документ.txt
    ---------                     -------
            4                     2 files
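For comparison, the same recoding can be sketched with Python's standard zipfile module. This is not how the patched unzip is implemented. It relies on zipfile decoding names as CP437 when flag bit 11 is clear and no other encoding is requested, and the helper names `recode_name` and `list_names` are made up for this example.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def recode_name(filename: str, flag_bits: int, encoding: str) -> str:
    """Undo zipfile's CP437 decoding and reinterpret the bytes as `encoding`."""
    if flag_bits & UTF8_FLAG:
        return filename  # already decoded correctly as UTF-8
    try:
        return filename.encode("cp437").decode(encoding)
    except UnicodeEncodeError:
        # The name did not come from a CP437 decoding; leave it unchanged.
        return filename


def list_names(archive, encoding: str) -> list:
    """Return the entry names of `archive`, recoded where necessary."""
    with zipfile.ZipFile(archive) as zf:
        return [recode_name(i.filename, i.flag_bits, encoding)
                for i in zf.infolist()]
```

With the sample archive from this thread, `list_names("Desktop.zip", "cp866")` should give the same two names as the listing above. On Python 3.11 and later, `zipfile.ZipFile("Desktop.zip", metadata_encoding="cp866")` performs the decoding directly.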
    
     
  • unxed

    unxed - 2024-05-22

    Debian's 7zip just merged a fix for the same problem:

    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=779207#51

     
