This was a mistake: the version I built was also 16.02.
By the way, the Windows version of 7za 16.03 handles my case2.zip correctly under Wine,
and so does 16.02 under Wine.
So the problem only occurs in the Linux build.
Last edit: unxed 2016-09-29
Here is sample output from
unzip -l case2.zip
and
7za l case2.zip
for the same sample archive attached above.
As you can see, the filenames in the archive differ: unzip's output is correct, 7za's is wrong.
It uses OEM (DOS) encoding for filenames.
p7zip doesn't support it.
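To see why the encoding matters, here is a minimal Python sketch (not p7zip code) that decodes one stored name under different assumptions. It assumes a Russian-locale Windows system, where the OEM code page is 866; the file name is made up for illustration.

```python
# A filename as a Russian-locale Windows packer would store it:
# OEM code page 866, not UTF-8 (assumed locale, made-up name).
name = "Привет.txt"
raw = name.encode("cp866")

# The same bytes decoded under three different assumptions.
as_oem = raw.decode("cp866")                     # correct: the writer's OEM page
as_cp437 = raw.decode("cp437")                   # literal "IBM Code Page 437" reading
as_utf8 = raw.decode("utf-8", errors="replace")  # what a UTF-8-only tool sees

for label, text in [("cp866", as_oem), ("cp437", as_cp437), ("utf-8", as_utf8)]:
    print(f"{label:>6}: {text}")
```

Only the first line comes out right; the other two produce the kind of mangled names shown in the 7za listing.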
Will it ever be supported? Most Windows-created .zip files use OEM-encoded filenames.
By the way, 7z.exe 16.03 processes that archive correctly under Wine, so why can't p7zip behave the same way?
Note that Windows has different OEM and WIN (ANSI) encodings.
And there are many such encodings (for different regions). Wine probably knows how to work with them.
Perhaps the p7zip developer doesn't think this feature is very important, or it may be difficult to implement.
As for the WIN (ANSI) encoding, according to the sources of Far Manager for Linux, it is only used by a few legacy packer versions (ZipHeader.PackVer>20 && ZipHeader.PackVer<25). See:
https://github.com/elfmz/far2l/blob/master/multiarc/src/formats/zip/zip.cpp#L314
All other Windows packers use OEM, as defined in the specification:
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
The specification actually says "IBM Code Page 437", but most packers interpret that as "the currently selected OEM code page".
Linux packers mostly use UTF-8.
So the best behaviour, AFAIK, is to decode names in all archives created on Windows with the OEM charset corresponding to the currently selected locale (assuming the archive was created on a system with the same locale selected). Far Manager for Linux and the unzip tool already do it that way.
Last edit: unxed 2016-10-05
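The rules above could be expressed roughly like this. This is a hypothetical Python helper, not far2l's or p7zip's code: made_on_windows is assumed to be derived from the host byte of the "version made by" field, pack_ver stands in for far2l's ZipHeader.PackVer, and treating every non-Windows archive as UTF-8 is a simplification of "Linux packers mostly use UTF-8". Bit 11 of the general purpose flag is the specification's UTF-8 (language encoding) flag.

```python
def pick_name_codec(utf8_flag: bool, made_on_windows: bool, pack_ver: int,
                    oem_codec: str, ansi_codec: str) -> str:
    """Pick a codec for a ZIP entry name (hypothetical helper, a sketch only)."""
    if utf8_flag:
        # General purpose bit 11 set: the name is stored as UTF-8.
        return "utf-8"
    if not made_on_windows:
        # Simplification: assume non-Windows packers wrote UTF-8.
        return "utf-8"
    if 20 < pack_ver < 25:
        # The few legacy packer versions that stored WIN (ANSI) names.
        return ansi_codec
    # Every other Windows archive: OEM code page of the current locale.
    return oem_codec
```

For a Russian locale, pick_name_codec(False, True, 20, "cp866", "cp1251") would select "cp866".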
A command-line switch for specifying the charset used to decode OEM-encoded filenames in archives created on Windows would also be a good option.
7-Zip supports it:
but p7zip probably ignores it.
Uploading a sample of the problem here, as https://2g0.ru/case2.zip is not working any more.
Here is a table of OEM charsets with corresponding posix locales (based on Wine sources).
https://github.com/unxed/oemcp/blob/master/oemcp.txt
So we just need to get the system locale, find the corresponding OEM charset and use iconv to get correct local filenames, as Windows does.
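A sketch of that lookup in Python, as an illustration only: the four-entry table below is a hand-picked subset of well-known Windows OEM code pages, not the contents of oemcp.txt; falling back to cp437 for unknown locales is an assumption; and Python's codecs stand in for iconv. The optional override parameter corresponds to the command-line switch suggested earlier; its name is hypothetical.

```python
from typing import Optional

# Illustrative subset of Windows OEM code pages keyed by language_TERRITORY.
# Not copied from oemcp.txt; the real table covers many more locales.
OEM_CODEPAGES = {
    "en_US": "cp437",
    "de_DE": "cp850",
    "el_GR": "cp737",
    "ru_RU": "cp866",
}

def oem_codec_for_locale(loc: str, override: Optional[str] = None) -> str:
    """Map a POSIX locale name to an OEM codec (hypothetical helper)."""
    if override:
        # An explicit user choice wins over locale detection.
        return override
    # Drop ".codeset" and "@modifier": "ru_RU.UTF-8" -> "ru_RU".
    base = loc.split(".", 1)[0].split("@", 1)[0]
    # Assumed fallback: the code page the specification names.
    return OEM_CODEPAGES.get(base, "cp437")

def decode_oem_name(raw: bytes, loc: str, override: Optional[str] = None) -> str:
    """Decode a stored name the way Windows would on that locale."""
    return raw.decode(oem_codec_for_locale(loc, override))
```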
@ipavlov, here is a patch for CPP/7zip/Archive/Zip/ZipItem.cpp that fixes this bug.
The system locale setting is used to select the corresponding OEM code page. File and folder names are converted via iconv.
Last edit: unxed 2020-07-16
The OEM code page table is generated from Wine sources; the script is here: https://github.com/unxed/oemcp
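The patch itself is C++ and uses iconv. For comparison, a rough Python equivalent for listing an archive might look like this (the function name is hypothetical). Python's zipfile module decodes names without the UTF-8 flag as cp437, and because cp437 maps all 256 byte values, encoding such a name back to cp437 recovers the original bytes, which can then be decoded with the locale's OEM code page.

```python
import zipfile
from typing import List

UTF8_NAME_FLAG = 0x800  # general purpose bit 11 (language encoding flag)

def list_names(archive, oem_codec: str) -> List[str]:
    """List entry names, re-decoding non-UTF-8 names with an OEM codec.

    archive may be a path or a binary file object (anything ZipFile accepts).
    """
    names = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.flag_bits & UTF8_NAME_FLAG:
                # Already decoded correctly as UTF-8 by zipfile.
                names.append(info.filename)
            else:
                # zipfile used cp437; undo that to get the stored bytes back.
                raw = info.filename.encode("cp437")
                names.append(raw.decode(oem_codec))
    return names
```

For example, list_names("case2.zip", "cp866") would list an archive created on a Russian-locale Windows system; in practice the codec would come from a locale lookup like the one described above.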
Thank you unxed, this is a very good solution as it allows uncompressing Windows .zip archives in Linux without having to specify any special parameters.
And in the rare case where one needs to uncompress a .zip from a different locale, one could use, e.g.,
LC_ALL=el_GR.UTF-8 7z x win10test.zip. This issue affects many, many users; they just report the problem against front-ends like MATE's engrampa instead of here. Please apply the patch, thank you very much!
Updated the patch to a new version; two bugs were fixed:
1) Forgot to zero-terminate the converted UTF-8 string
2) Now using LC_CTYPE instead of LC_ALL to fix locale detection in some cases
Last edit: unxed 2020-06-23
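LC_CTYPE is the locale category that governs character encoding. The Python sketch below is a hypothetical helper that reads environment variables directly instead of calling setlocale, so it is not necessarily what the patch does; it only shows the POSIX precedence, where LC_ALL overrides LC_CTYPE, which overrides LANG. It also illustrates why looking only at LC_ALL can fail: LC_ALL is usually unset, and the locale then comes from LC_CTYPE or LANG.

```python
import os
from typing import Mapping

def ctype_locale(env: Mapping[str, str] = os.environ) -> str:
    """Return the locale name that governs LC_CTYPE (hypothetical helper).

    POSIX precedence: LC_ALL, then LC_CTYPE, then LANG. Unset or empty
    variables are skipped; "C" is the POSIX default.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"
```

With LANG=en_US.UTF-8 and LC_CTYPE=ru_RU.UTF-8 this returns "ru_RU.UTF-8", while a check of LC_ALL alone would find nothing.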
I have verified that the patch works fine on Ubuntu 20.04. I will upload patched p7zip debs to the Greek schools PPA in a couple of days.
Thanks again unxed!
I uploaded the packages to my PPA:
https://launchpad.net/~alkisg/+archive/ubuntu/ppa/+packages
I'll wait for feedback from the p7zip developers before trying to push the official fixes to Debian and Ubuntu...
Updated the patch file to the current version. It now supports archives created on Mac OS X. Further development will be done here:
https://github.com/unxed/oemcp/blob/master/p7zip_oemcp_ZipItem.cpp.patch
libnatspec can be used as an alternative solution.
https://github.com/Etersoft/libnatspec
Have you tried 7-Zip? It has a Linux version now (v23, for that matter), so it may be a better option than p7zip here. p7zip 17.xx is somewhat maintained here.
There are still encoding problems in some cases:
https://sourceforge.net/p/sevenzip/bugs/2473/