When unzipping an archive with non-Latin filenames that was created on
Windows, the unpacked filenames are broken.
Reproducible: Always
Steps to Reproduce:
1. Create a zip archive on Windows with non-Latin filenames inside.
2. Bring this archive to Linux (openSUSE).
3. Unzip archive with unzip.
Actual Results:
The unpacked filenames look like an unreadable mess.
Expected Results:
Unpacked files should have valid filenames, as originally packed.
I could not find any way to specify the encoding manually with options on the
unzip command line or in gui tools like ark.
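A minimal sketch of how such a mess can arise, assuming the archive stores names as raw bytes in a legacy code page and the extracting side guesses a different one. The file name, cp866 as the creator's code page, and the list of wrong guesses are all illustrative assumptions, not details taken from the report.

```python
# Illustration only: the name and every code page below are assumptions.
original = "Привет.txt"

# Bytes as a legacy archiver might store them: no UTF-8 flag, and the
# code page itself is not recorded anywhere in the archive.
stored = original.encode("cp866")

# Decoding those bytes with a different single-byte code page yields
# text of the same length but with the wrong characters.
wrong_guesses = ["cp437", "latin-1", "cp1251"]
garbled = {cp: stored.decode(cp) for cp in wrong_guesses}

for cp, text in garbled.items():
    print(f"{cp:8} -> {text!r}")

# Only the code page the creator actually used recovers the name.
recovered = stored.decode("cp866")
print(recovered)
```

Because the bytes are valid in every one of these code pages, no error is raised; the extractor cannot tell from the bytes alone that it guessed wrong.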
> 3. Unzip archive with unzip.
As usual, knowing which version of UnZip might be helpful. A full
"unzip -v" report would be best.
> [...] or in gui tools like ark.
Not our program.
> zip -v
Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
This is Zip 3.0 (July 5th 2008), by Info-ZIP.
Currently maintained by E. Gordon. Please send bug reports to
the authors using the web page at www.info-zip.org; see README for details.
Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip,
as of above date; see http://www.info-zip.org/ for other sites.
Compiled with gcc 4.7.2 20130108 [gcc-4_7-branch revision 195012] for Unix (Linux ELF).
Zip special compilation options:
USE_EF_UT_TIME (store Universal Time)
SYMLINK_SUPPORT (symbolic links supported)
LARGE_FILE_SUPPORT (can read and write large files on file system)
ZIP64_SUPPORT (use Zip64 to store large files in archives)
UNICODE_SUPPORT (store and read UTF-8 Unicode paths)
STORE_UNIX_UIDs_GIDs (store UID/GID sizes/values using new extra field)
UIDGID_NOT_16BIT (old Unix 16-bit UID/GID extra field not used)
[encryption, version 2.91 of 05 Jan 2007] (modified for Zip 3)
Encryption notice:
The encryption code of this program is not copyrighted and is
put in the public domain. It was originally written in Europe
and, to the best of our knowledge, can be freely distributed
in both source and object forms from any country, including
the USA under License Exception TSU of the U.S. Export
Administration Regulations (section 740.13(e)) of 6 June 2002.
Zip environment options:
ZIP: [none]
ZIPOPT: [none]
This is from the openSUSE 12.3 distro.
> > 3. Unzip archive with unzip.
>
> As usual, knowing which version of UnZip might be helpful. A full
> "unzip -v" report would be best.
> > zip -v
> [...]
What's wrong with this picture?
> unzip -v
UnZip 6.00 of 20 April 2009, by Info-ZIP. Maintained by C. Spieler. Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.
Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.
Compiled with gcc 4.7.2 20130108 [gcc-4_7-branch revision 195012] for Unix (Linux ELF).
UnZip special compilation options:
COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
SET_DIR_ATTRIB
SYMLINKS (symbolic links supported, if RTL and file system permit)
TIMESTAMP
UNIXBACKUP
USE_EF_UT_TIME
USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
UNICODE_SUPPORT [wide-chars, char coding: UTF-8] (handle UTF-8 paths)
LARGE_FILE_SUPPORT (large files over 2 GiB supported)
ZIP64_SUPPORT (archives using Zip64 for large files supported)
VMS_TEXT_CONV
WILD_STOP_AT_DIR
[decryption, version 2.11 of 05 Jan 2007]
UnZip and ZipInfo environment options:
UNZIP: [none]
UNZIPOPT: [none]
ZIPINFO: [none]
ZIPINFOOPT: [none]
Ok. This sounds like a problem which was reported here earlier:
https://sourceforge.net/tracker/?func=detail&aid=3584238&group_id=118012&atid=679786
We still have not released UnZip 6.10c (beta), but there's a newer
experimental (internal-only) source kit which may solve this problem:
http://antinode.info/ftp/info-zip/unzip610c08a_l_sM.zip
As explained in that other thread, add "ICONV=1" to the usual "make"
command to enable the character-set conversion features (options: -I,
-O). (Look for "ICONV_MAPPING" in the "unzip -v" report.)
Please let us know if this works or fails, or if you have any
problems building it. And thanks for the report.
I have built source code:
make -f unix/Makefile ICONV=1 generic
Then unpacked my archive:
/path/to/new/unzip/unzip -O cp866 HackDay25.zip
(cp866 is for windows with Russian locale)
And it worked great - I can see the unpacked filenames in the correct encoding and do not have to recode them manually. I can't believe this problem is now fixed after many years :) Thanks!
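For readers without the patched build, the same idea can be sketched in Python. This is not how the -O option is implemented. It assumes that Python's zipfile module decodes names lacking the UTF-8 flag (general purpose bit 11) as cp437, which is its default, and that the caller knows the real code page. The helper names and the test archive are hypothetical.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: name is stored as UTF-8


def build_legacy_archive(name: str, codepage: str) -> bytes:
    """Build an in-memory zip whose entry name is raw `codepage` bytes
    with no UTF-8 flag, to simulate an old Windows-made archive."""
    raw = name.encode(codepage)
    # An ASCII placeholder of equal length keeps the UTF-8 flag unset;
    # the placeholder bytes are then swapped for the legacy bytes in both
    # the local header and the central directory.
    placeholder = "X" * len(raw)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(placeholder, b"data")
    return buf.getvalue().replace(placeholder.encode("ascii"), raw)


def real_names(archive: bytes, codepage: str) -> list[str]:
    """Return entry names, re-decoding unflagged names with `codepage`."""
    names = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.flag_bits & UTF8_FLAG:
                names.append(info.filename)  # already correct
            else:
                # Undo the cp437 guess, then decode with the real code page.
                names.append(info.filename.encode("cp437").decode(codepage))
    return names


archive = build_legacy_archive("Привет.txt", "cp866")
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    print(zf.namelist())            # garbled: decoded as cp437
print(real_names(archive, "cp866"))  # recovered with the right code page
```

The round trip through cp437 is lossless because cp437 assigns a character to every byte value, so the original bytes can always be recovered before decoding them properly.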
Thanks for the report. Glad (and relieved) to hear that the
experimental code worked for you. We hope that the next UnZip beta kit
(6.10c) will appear later this year, and it should work as well. Until
then, please let us know if you have any problems with the experimental
code.
(The affected characters are actually all non-ASCII characters, including Latin ones such as ä,
which is contained in the ISO-8859 Latin character sets; but I assume that's what the reporter meant.)
The issue is not really resolved, however.
I reported this to http://www.info-zip.org/zip-bug.html:
unzip 6.00 does not properly unpack non-ASCII characters (in filenames)
from zip archives produced by various other tools, e.g. the zip
downloads provided by Dropbox. It works fine with WinZip, pkzip, p7zip.
I tried with unzip 6.10 as mentioned in
https://sourceforge.net/p/infozip/bugs/43/
on Cygwin. Filenames are now unpacked properly in a UTF-8 terminal
(and locale), but not in another locale, where filenames are again
mangled. With p7zip, I can unpack the same archive in either locale
environment and it works in both cases.
the tool used to create the archive (if zip, then "zip -v"), the locale the archive is
being extracted on and the version of unzip ("unzip -v"). Note that any characters
in the source locale not existing in the destination locale will get converted to
something else, depending on the situation. The actual source paths (including
the characters being mangled) and the resulting mangled path would also be
helpful. (Or the character codes if you can't reproduce them.)
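The point about characters that do not exist in the destination locale can be illustrated with a small sketch. It assumes the extractor converts each name to the destination charset and substitutes a placeholder for anything it cannot represent; the substitution UnZip actually performs may differ. The function name and the sample names are hypothetical.

```python
def to_destination(name: str, charset: str) -> bytes:
    """Convert a file name to the destination charset, replacing
    characters that charset cannot represent (here with '?')."""
    return name.encode(charset, errors="replace")


# "ä" exists in ISO-8859-1, so it survives the conversion.
print(to_destination("ä.txt", "iso-8859-1"))

# Cyrillic letters do not exist in ISO-8859-1, so they are replaced.
print(to_destination("Привет.txt", "iso-8859-1"))

# A UTF-8 destination can represent every character, so nothing is lost.
print(to_destination("Привет.txt", "utf-8").decode("utf-8"))
```

This is why the same archive can extract cleanly in a UTF-8 locale and still produce mangled names in a single-byte locale.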
There are multiple things going on, and the description below is not
sufficient to determine which is at play here. I think I know what's going on,
but I can't be sure without more detail. If UTF-8 is involved, it should work.
The UnZip guy, who provided the SF replies, may have further thoughts.
Thanks,
Ed Gordon
Thanks for the report. But...
An actual example would be more helpful than a vague description.
UnZip 6.10 does not yet exist. Exactly what did you try? There are
newer unreleased kits than unzip610c08a_l_sM.zip. I would not expect
them to help much in this case, but there's only one way to find out.
Let me know if you're interested.
I do very little on Windows, and even less in Cygwin, and still less
in any locale other than "C", so if you expect me to investigate a
problem like this, then I'll need an actual (small) test archive, and a
complete set of directions explaining how to reproduce the problem.
The built-in .zip archiver in older versions of Windows used the DOS (OEM) or Windows (ANSI) code page corresponding to the current regional settings for new archives. Lots of such archives still exist.
The correct behavior is to determine the relevant OEM or ANSI code page based on the system locale and use it. You can look at this PR for a reference implementation:
https://github.com/p7zip-project/p7zip/pull/232
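A rough sketch of the locale-based lookup described above. It is not the code from the linked PR: the table is a small illustrative subset, the function names are hypothetical, and the choice between the OEM and ANSI code page is left to the caller because the heuristic for it is not described here.

```python
# Illustrative subset of language -> (OEM code page, ANSI code page).
# A real implementation would need a much fuller table; "en" assumes
# US regional settings.
CODEPAGES = {
    "en": ("cp437", "cp1252"),
    "de": ("cp850", "cp1252"),
    "pl": ("cp852", "cp1250"),
    "ru": ("cp866", "cp1251"),
    "el": ("cp737", "cp1253"),
    "ja": ("cp932", "cp932"),
}
DEFAULT = ("cp437", "cp1252")  # assumed fallback for unknown locales


def guess_codepages(locale_name: str) -> tuple[str, str]:
    """Map a locale name such as 'ru_RU.UTF-8' to (OEM, ANSI) code pages."""
    language = locale_name.split("_")[0].split(".")[0].lower()
    return CODEPAGES.get(language, DEFAULT)


def decode_legacy_name(raw: bytes, locale_name: str, use_oem: bool = True) -> str:
    """Decode a raw, unflagged archive name using the guessed code page."""
    oem, ansi = guess_codepages(locale_name)
    return raw.decode(oem if use_oem else ansi, errors="replace")


print(guess_codepages("ru_RU.UTF-8"))
print(decode_legacy_name("Привет.txt".encode("cp866"), "ru_RU.UTF-8"))
```

In practice the locale name would come from the environment (for example the LC_ALL, LC_CTYPE, or LANG variables), and the guess only helps when the archive was created under the same regional settings as the extracting system.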
Sample archive showing this bug attached.
Last edit: unxed 2024-05-22
What encoding is used for the filenames in Desktop.zip? It looks like codepage 866, i.e. Cyrillic.
There is a patched version of unzip that some of the Linux distros ship that allows the codepage/encoding to be specified. That deals with this use case. In this case, specifying codepage 866 with the -O option does the trick.
Debian's 7zip just merged a fix for the same problem:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=779207#51
Similar fix suggested for Ubuntu's unzip:
https://code.launchpad.net/~mitya57/ubuntu/+source/unzip/+git/unzip/+merge/466860