
#187 7za lists files (of extract files) using incorrect encoding in some cases

Milestone: v1.0 (example)
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2024-05-18
Created: 2016-09-29
Creator: unxed
Private: No

7za lists (or extracts) files using an incorrect filename encoding in some cases.
Example: 2g0.ru/case2.zip
By the way, unzip 6.00 handles this archive correctly.
Tested with 7-Zip (a) [64] 16.02 (from Debian) and 7-Zip (a) [64] 16.03 built by myself.

Discussion

  • unxed

    unxed - 2016-09-29

    "and 7-Zip (a) [64] 16.03 built by myself"

    That was a mistake: the version I built myself was also 16.02.

     
  • unxed

    unxed - 2016-09-29

    By the way, the Windows version of 7za 16.03 handles my case2.zip correctly under Wine,
    and so does 16.02 under Wine.
    So the problem appears only in the Linux build.

     

    Last edit: unxed 2016-09-29
  • unxed

    unxed - 2016-09-29

    Here is sample output from

    unzip -l case2.zip
    and
    7za l case2.zip
    for the same sample archive attached above.

    As you can see, the listed file name differs: unzip's output is correct, 7za's is wrong.

     
  • Igor Pavlov

    Igor Pavlov - 2016-10-01

    It uses OEM (DOS) encoding.
    p7zip doesn't support it.

     
  • unxed

    unxed - 2016-10-04

    Will it ever be supported? Most Windows-created .zip files use OEM-encoded filenames.

     
  • unxed

    unxed - 2016-10-05

    By the way, 7z.exe 16.03 processes that archive correctly under Wine. Why can't p7zip behave the same way?

     
  • Igor Pavlov

    Igor Pavlov - 2016-10-05

    Note that Windows has different OEM and Windows (ANSI) encodings.
    And there are many such encodings (for different regions). Wine probably knows how to work with them.
    The p7zip developer probably doesn't consider this feature important enough, or it may be difficult to implement.

     
  • unxed

    unxed - 2016-10-05

    As for WIN (ANSI) encoding, according to the sources of Far Manager for Linux, it is only used with a few legacy packer versions (ZipHeader.PackVer>20 && ZipHeader.PackVer<25). See:
    https://github.com/elfmz/far2l/blob/master/multiarc/src/formats/zip/zip.cpp#L314

    All other Windows packers use OEM, as defined in the specification:
    https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
    In practice, "IBM Code Page 437" is interpreted as "the currently selected OEM code page" by most packers.

    Linux packers mostly use UTF-8.

    So the best behaviour, AFAIK, is to use the OEM charset corresponding to the currently selected locale (assuming the archive was created on a system with the same locale) for all archives created on Windows. I see Far Manager for Linux and the unzip tool already doing it that way.
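The rule described above can be sketched roughly as follows. This is only an illustration of the heuristic, not the actual far2l or p7zip code. The function name and parameters are hypothetical, `pack_ver` is assumed to correspond to the `ZipHeader.PackVer` field mentioned above, and the host values 0 (MS-DOS/FAT) and 10 (Windows NTFS) and the UTF-8 flag (general-purpose bit 11) are taken from APPNOTE.

```python
# Sketch of the encoding choice described above; not the actual far2l or p7zip code.
UTF8_FLAG = 0x0800              # general-purpose bit 11 in APPNOTE: name is UTF-8
HOST_MSDOS, HOST_NTFS = 0, 10   # "version made by" host values for FAT and NTFS


def pick_name_encoding(flags, host_os, pack_ver, oem_cp, ansi_cp):
    """Return the codec name to use for a ZIP entry's raw file name."""
    if flags & UTF8_FLAG:
        return "utf-8"
    if host_os in (HOST_MSDOS, HOST_NTFS):
        # Assumption from the post: packer versions 21..24 wrote ANSI names.
        if 20 < pack_ver < 25:
            return ansi_cp
        return oem_cp
    # Assumption: archives from Unix-like hosts carry UTF-8 names.
    return "utf-8"


# A name as a Russian-locale Windows packer might store it (OEM code page 866).
raw = "привет.txt".encode("cp866")
codec = pick_name_encoding(0, HOST_MSDOS, 20, oem_cp="cp866", ansi_cp="cp1251")
print(raw.decode(codec))     # decoded with the matching OEM code page
print(raw.decode("cp437"))   # decoded as plain CP437, which yields mojibake
```

The caller still has to supply the OEM and ANSI code pages, which is exactly the part that depends on the locale of the machine that created the archive.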

     

    Last edit: unxed 2016-10-05
  • unxed

    unxed - 2016-10-05

    A command-line switch for specifying the charset used to decode OEM-encoded filenames in archives created on Windows would also be a good option.

     
  • Igor Pavlov

    Igor Pavlov - 2016-10-05

    7-Zip supports it:

    -mcp=1252
    

    but p7zip probably ignores it.

     
  • unxed

    unxed - 2019-07-01

    Uploading a sample of the problem, as https://2g0.ru/case2.zip no longer works.

     
  • unxed

    unxed - 2020-06-21

    Here is a table of OEM charsets with the corresponding POSIX locales (based on the Wine sources).
    https://github.com/unxed/oemcp/blob/master/oemcp.txt
    So we just need to get the system locale, find the corresponding OEM charset, and use iconv to get correct local file names, as Windows does.
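A minimal sketch of that lookup, using Python's built-in codecs in place of iconv. The table below is a small illustrative subset written by hand (the Windows default OEM code pages for four locales), not a copy of oemcp.txt, and the function names are hypothetical.

```python
# Illustrative subset only; the full table lives in oemcp.txt.
OEM_CODEC_BY_LOCALE = {
    "en_US": "cp437",
    "de_DE": "cp850",
    "el_GR": "cp737",
    "ru_RU": "cp866",
}


def oem_codec_for_locale(locale_name, default="cp437"):
    """Map a POSIX locale such as 'ru_RU.UTF-8' to an OEM codec name."""
    # Drop the codeset (".UTF-8") and any modifier ("@euro") from the name.
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    return OEM_CODEC_BY_LOCALE.get(base, default)


def decode_oem_name(raw_name, locale_name):
    """Decode raw ZIP name bytes using the OEM code page for the locale."""
    return raw_name.decode(oem_codec_for_locale(locale_name), errors="replace")


stored = "ΔΟΚΙΜΗ.txt".encode("cp737")   # bytes as a Greek-locale packer might store them
print(decode_oem_name(stored, "el_GR.UTF-8"))
```

Falling back to CP437 for unknown locales is a choice made for this sketch; it follows the code page named in the ZIP specification.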

     
  • unxed

    unxed - 2020-06-22

    @ipavlov here is a patch for CPP/7zip/Archive/Zip/ZipItem.cpp that fixes this bug.

    The system locale setting is used to select the corresponding OEM code page. File and folder name conversion is done via iconv.

     

    Last edit: unxed 2020-07-16
  • unxed

    unxed - 2020-06-22

    The OEM code page table is generated from the Wine sources; the script is here: https://github.com/unxed/oemcp

     
  • Alkis Georgopoulos

    Thank you unxed, this is a very good solution, as it allows uncompressing Windows .zip archives on Linux without having to specify any special parameters.

    And in the rare case where one needs to uncompress a .zip from a different locale, they could use e.g. LC_ALL=el_GR.UTF-8 7z x win10test.zip.

    This issue affects a great many users; they just report the problem against front-ends like MATE's engrampa instead of here. Please apply the patch, thank you very much!

     
  • unxed

    unxed - 2020-06-23

    Updated the patch with a new version; two bugs were fixed:
    1) Forgot to zero-terminate the converted UTF-8 string
    2) Now using LC_CTYPE instead of LC_ALL to fix locale detection in some cases
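The patch itself is not reproduced in this thread, so the following is only a sketch of the standard POSIX lookup order for the character-type locale (LC_ALL first, then LC_CTYPE, then LANG), not of what the patch does internally. The function name is hypothetical.

```python
import os


def effective_ctype_locale(env=None):
    """Return the locale name POSIX would use for LC_CTYPE, or 'C' if unset."""
    env = os.environ if env is None else env
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:            # empty strings count as unset
            return value
    return "C"


print(effective_ctype_locale({"LANG": "en_US.UTF-8", "LC_CTYPE": "ru_RU.UTF-8"}))
```

Because LC_ALL takes precedence over the other variables, setting it for a single command is enough to select a different locale, and therefore a different OEM code page, for one archive.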

     

    Last edit: unxed 2020-06-23
  • Alkis Georgopoulos

    I can confirm that the patch works fine on Ubuntu 20.04. I will upload patched p7zip debs to the Greek schools PPA in a couple of days.

    Thanks again unxed!

     
  • Alkis Georgopoulos

    I uploaded the packages to my PPA:
    https://launchpad.net/~alkisg/+archive/ubuntu/ppa/+packages

    I'll wait for feedback from the p7zip developers, so that I can then try to push the official fixes to Debian and Ubuntu...

     
  • unxed

    unxed - 2020-07-16

    Updated the patch file to the current version. It now supports archives created on Mac OS X. Further development will be done here:
    https://github.com/unxed/oemcp/blob/master/p7zip_oemcp_ZipItem.cpp.patch

     
  • unxed

    unxed - 2023-08-15

    libnatspec can be used as an alternative solution.

    https://github.com/Etersoft/libnatspec

     
    • Sam Tansy

      Sam Tansy - 2023-09-16

      Have you tried 7-Zip? It has a Linux version now (v23, for that matter), so it may be a better option than p7zip. p7zip 17.xx is somewhat maintained here.

       
      • unxed

        unxed - 2024-05-18

        There are still encoding problems in some cases:
        https://sourceforge.net/p/sevenzip/bugs/2473/

         
