
#2473 Filenames are disturbed while trying to list or decompress archive

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2024-06-07
Created: 2024-05-18
Creator: unxed
Private: No

File and folder names are garbled when listing the contents of the archive or extracting it.

~/test_folder$ 7zz l ./Desktop.zip

7-Zip (z) 21.07 (x64) : Copyright (c) 1999-2021 Igor Pavlov : 2021-12-26
 64-bit locale=ru_RU.UTF-8 Threads:4

Scanning the drive for archives:
1 file, 330 bytes (1 KiB)

Listing archive: ./Desktop.zip

--
Path = ./Desktop.zip
Type = zip
Physical Size = 330

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2016-09-28 17:41:00 D....            0            0   
2016-09-28 17:40:41 ....A            4            4   ⥪⮢ 㬥.txt
------------------- ----- ------------ ------------  ------------------------
2016-09-28 17:41:00                  4            4  1 files, 1 folders

Discussion

(Page 1 of 2)
  • unxed

    unxed - 2024-05-18

    Problematic archive sample

     
  • unxed

    unxed - 2024-05-18
    ~/test_folder$ unzip -l ./Desktop.zip
    Archive:  ./Desktop.zip
      Length      Date    Time    Name
    ---------  ---------- -----   ----
            0  2016-09-28 18:41   Новая папка/
            4  2016-09-28 18:40   Новый текстовый документ.txt
    ---------                     -------
            4                     2 files
    
     
  • Igor Pavlov

    Igor Pavlov - 2024-05-18

    That zip file uses the DOS OEM Russian (866) charset encoding.
    Maybe your system is set up for Russian 866 OEM encoding support.
    Try to extract that archive with both programs.
    Also use the latest 7-Zip, 24.05.

    On Linux we can expect UTF-8 to be used for file names,
    and a file name that is not UTF-8 is not portable.
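
To make the encoding point concrete, here is a minimal Python sketch (not 7-Zip code). It takes the file name shown by `unzip -l` earlier in this thread, encodes it as CP866 the way an old archiver would have stored it, and checks how those bytes behave under different interpretations. The helper name `is_valid_utf8` is made up for this illustration.

```python
# File name as reported by `unzip -l` earlier in this thread.
name = "Новый текстовый документ.txt"

# Bytes an archiver using the DOS OEM Russian code page (866) would store.
raw = name.encode("cp866")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data can be decoded as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Decoding with the code page the archive was written in restores the name.
restored = raw.decode("cp866")

# Decoding with a different single-byte code page succeeds but yields mojibake.
mojibake = raw.decode("cp437")

print("round trip ok:", restored == name)
print("mojibake differs:", mojibake != name)
print("valid UTF-8:", is_valid_utf8(raw))
```

The stored bytes are not valid UTF-8, so a tool that writes them to disk unchanged produces the kind of file names reported in this ticket.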

     

    Last edit: Igor Pavlov 2024-05-18
  • unxed

    unxed - 2024-05-18

    The built-in .zip archiver in older versions of Windows used the DOS (OEM) or Windows (ANSI) code page corresponding to the current regional settings when creating new archives. Lots of such archives still exist.

    The correct behavior is to determine the relevant OEM or ANSI code page from the system locale and use it. You can look at this PR for a reference implementation:

    https://github.com/p7zip-project/p7zip/pull/232
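
As a rough illustration of the "locale to code page" idea (this is not the code from the PR), the lookup could be sketched as below. The table `CODEPAGES`, the fallback `DEFAULT_CODEPAGES` and the function `deduce_codepages` are hypothetical names. The table holds only a few well-known Windows pairs, while a real table would cover every locale.

```python
# Hypothetical, deliberately tiny table: locale -> (OEM code page, ANSI code page).
# A complete table would have one row per Windows locale.
CODEPAGES = {
    "ru_RU": (866, 1251),
    "en_US": (437, 1252),
    "de_DE": (850, 1252),
    "ja_JP": (932, 932),
}

# Assumption for this sketch: fall back to the US English pair for unknown locales.
DEFAULT_CODEPAGES = (437, 1252)


def deduce_codepages(locale_name: str) -> tuple:
    """Map a POSIX locale name such as 'ru_RU.UTF-8' to (OEM, ANSI) code pages."""
    # Strip the encoding ('.UTF-8') and any modifier ('@euro') suffixes.
    base = locale_name.split(".")[0].split("@")[0]
    return CODEPAGES.get(base, DEFAULT_CODEPAGES)


oem, ansi = deduce_codepages("ru_RU.UTF-8")
print(oem, ansi)
```

With the `ru_RU.UTF-8` locale from the report, this yields 866 as the OEM code page, which matches the encoding of the sample archive.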

     

    Last edit: unxed 2024-05-22
  • unxed

    unxed - 2024-05-18

    Try to extract that archive with both programs.

    unzip extracts correctly. 7zz extracts with incorrect names (which are not valid UTF-8 sequences).

     
  • unxed

    unxed - 2024-05-19

    https://github.com/p7zip-project/p7zip/pull/232

    The "locale -> code page" translation table used in this PR is generated from the Wine sources (dlls/kernel32/nls) using this tool.

     

    Last edit: unxed 2024-05-22
  • unxed

    unxed - 2024-05-20

    Important addition: if the PackOS field of the zip header is 11 and the PackVer field is 20 or greater, the corresponding ANSI code page should be used instead of the OEM code page.

    An example of how to determine both the OEM and the ANSI code page for the current locale can be found in the far2l file manager sources, in the function DeduceCodepages() in WinPort/src/APIStringCodepages.cpp. The conversion table itself is also generated from the Wine sources.

    See also: https://sourceforge.net/p/sevenzip/bugs/1060/#ed99/fa7d
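
The PackOS/PackVer rule above can be written as a small decision function. This is only a sketch of the rule as stated in this post, not the patch itself. `pick_codepage` and its parameters are hypothetical names, and entries explicitly flagged as UTF-8 are assumed to be handled before this rule is consulted.

```python
HOST_OS_NTFS = 11  # PackOS value named in the post above


def pick_codepage(pack_os: int, pack_ver: int, oem_cp: int, ansi_cp: int) -> int:
    """Choose the code page for a zip entry name that is not stored as UTF-8."""
    if pack_os == HOST_OS_NTFS and pack_ver >= 20:
        return ansi_cp
    return oem_cp


def decode_name(raw: bytes, pack_os: int, pack_ver: int,
                oem_cp: int, ansi_cp: int) -> str:
    """Decode raw name bytes with the code page selected by the rule."""
    codepage = pick_codepage(pack_os, pack_ver, oem_cp, ansi_cp)
    return raw.decode("cp%d" % codepage)


# Russian locale: OEM 866, ANSI 1251.
print(pick_codepage(11, 20, 866, 1251))  # ANSI branch
print(pick_codepage(0, 20, 866, 1251))   # OEM branch
```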

     
  • unxed

    unxed - 2024-05-20

    This patch, applied against 24.05, solves the problem.

     
    • Ninimu

      Ninimu - 2024-05-20

      When that patch is applied against 7-Zip, will it cause the issue mentioned in https://github.com/p7zip-project/p7zip/issues/112 ?

       
      • unxed

        unxed - 2024-05-20

        I have no Arch system to check on.

         
      • Sam Tansy

        Sam Tansy - 2024-05-20

        When that patch is applied against 7-Zip, will it cause the issue mentioned in https://github.com/p7zip-project/p7zip/issues/112 ?

        Yes, it will. Not only in Arch. Just tested it on mine.

        7-Zip (z) 24.05 (x86) : Copyright (c) 1999-2024 Igor Pavlov : 2024-05-14
        
        Scanning the drive for archives:
        1 file, 750 bytes (1 KiB)
        
        Extracting archive: ComicInfo.zip
        --
        Path = ComicInfo.zip
        Type = zip
        Physical Size = 750
        
        
        Would you like to replace the existing file:
          Path:     ./ComicInfo.xml
          Size:     0 bytes
          Modified: 2024-05-21 00:19:46
        with the file from archive:
          Path:     ComicInfo.xml
          Size:     590 bytes (1 KiB)
          Modified: 2001-01-01 09:01:00
        ? (Y)es / (N)o / (A)lways / (S)kip all / A(u)to rename all / (Q)uit? u
        
        $ find .
        .
        ./ComicInfo.zip
        ./ComicInfo_1.xml
        ./ComicInfo.xml
        ./ComicInfo.xml/usr
        ./ComicInfo.xml/usr/lib
        ./ComicInfo.xml/usr/lib/gconv
        
        $  ls -lR
        .:
        total 3
        drwxr-xr-x 3 user users 1024 May 21 00:19 ComicInfo.xml
        -rw-r--r-- 1 user users 750 Jan  1  2001 ComicInfo.zip
        -rw-r--r-- 1 user users 590 Jan  1  2001 ComicInfo_1.xml
        
        ./ComicInfo.xml:
        total 1
        drwxr-xr-x 3 user users 1024 May 21 00:19 usr
        
        ./ComicInfo.xml/usr:
        total 1
        drwxr-xr-x 3 user users 1024 May 21 00:19 lib
        
        ./ComicInfo.xml/usr/lib:
        total 1
        drwxr-xr-x 2 user users 1024 May 21 00:19 gconv
        
        ./ComicInfo.xml/usr/lib/gconv:
        total 0
        

        So there is some problem with this patch, or with that archive (ComicInfo.zip (bugs.archlinux.org), found here) that reproduces it.

         
    • Sam Tansy

      Sam Tansy - 2024-05-20
       

      Last edit: Sam Tansy 2024-05-20
      • unxed

        unxed - 2024-05-20

        Is that the same what these p7zip commits are?

        Partially

         
      • unxed

        unxed - 2024-05-20

        Made a new, improved PR; the code is the same as in ZipItem_v2.patch.
        https://github.com/p7zip-project/p7zip/pull/232

         
  • unxed

    unxed - 2024-05-20

    Please try this one

     
    • Sam Tansy

      Sam Tansy - 2024-05-21

      Same thing. During unpacking (`7z x ComicInfo.zip`, archive from bugs.archlinux.org) 7z creates a directory entry for what should be a file (here: `ComicInfo.xml`), then shows a prompt asking whether to replace that directory with the file.
      As it is present in both v17.03 and 24.xx, it seems to be universal. You can test it yourself.
      I don't know what triggers it, and I cannot test every version you send me here.
      Maybe @ipavlov can assess this archive (`ComicInfo.zip`) and find what is unusual in it, which would help to pinpoint the culprit.
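
One way to look for anything unusual in an archive's headers, independently of 7-Zip, is to dump the relevant fields with Python's standard `zipfile` module. The sketch below makes no claim about the root cause. `describe_entries` is a hypothetical helper, and the demo runs on a small in-memory archive rather than on the real ComicInfo.zip.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose flag bit 11: name is stored as UTF-8


def describe_entries(zip_source):
    """Return one dict per entry with the header fields discussed in this thread."""
    rows = []
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            rows.append({
                "name": info.filename,
                "is_dir": info.is_dir(),
                "host_os": info.create_system,            # "PackOS" in the thread
                "made_by_version": info.create_version,   # "PackVer" in the thread
                "utf8_flag": bool(info.flag_bits & UTF8_FLAG),
            })
    return rows


# Demo on an in-memory archive; pass a path such as "ComicInfo.zip" instead
# to inspect a real file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ComicInfo.xml", "<ComicInfo/>")
buf.seek(0)

rows = describe_entries(buf)
for row in rows:
    print(row)
```

If `is_dir` is false for `ComicInfo.xml` here but 7z still creates a directory, the problem is more likely in how the name is converted than in the archive itself.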

       
      • unxed

        unxed - 2024-05-21

        It does not reproduce for me. The xml file is extracted just fine with 24.05 plus the patch above on Mint 21.3. I also have no resources to check every possible environment.

        Are you sure this issue is related to my patch? https://github.com/p7zip-project/p7zip/issues/112 describes several issues; the issue with my patch was 'sometimes 7z creates a file named "IBM437.so"', and I am not sure the issue with ComicInfo.zip is related to it.

         
        • Sam Tansy

          Sam Tansy - 2024-05-21

          It does not reproduce for me.

          It does for me.
          If you give me some simple instructions I can try to debug, but as you probably know, debugging anything is a tedious and time-consuming process.

          That's also the reason for my question to ipavlov about the archive and what is so unusual in it that it can trigger such behaviour.

           
          • unxed

            unxed - 2024-05-21

            Fixed some errors. At least two people confirmed that this version unzips ComicInfo.zip as expected. Please retest if possible. Thanks!

             
            • unxed

              unxed - 2024-05-21

              Managed to reproduce the bug on my Mint 21.3 by following these instructions:
              https://github.com/p7zip-project/p7zip/issues/112#issuecomment-850490605

              The bug does not reproduce in the far2l terminal, but does reproduce in GNOME Terminal.

              As for ComicInfo.zip, whether the bug reproduces depends on the current folder name, and it happens only if the binary was built on Alpine Linux (it was built by spvkgn in an alpine:latest container, which is probably the current stable Alpine version). For example, it reproduces if the archive and the archiver are inside the ~/Проверка or ~/test test folders.

              It does not reproduce with ZipItem_v3.patch.

               

              Last edit: unxed 2024-05-21
            • Sam Tansy

              Sam Tansy - 2024-05-22

              Please retest if possible

              Still the same.

              Here they say it's a problem with `CItem::GetUnicodeString` and offer a fix, which I haven't tested yet.

               
              • unxed

                unxed - 2024-05-22

                Thank you! The fix looks reasonable. Applied. Check again please.

                 
                • Sam Tansy

                  Sam Tansy - 2024-05-22

                  The fix looks reasonable

                  And it works.
                  Finally, someone found the root cause of it.

                   
                  • unxed

                    unxed - 2024-05-22

                    Thanks a lot for suggestion and testing!

                     
                  • unxed

                    unxed - 2024-05-22

                    Fixed some warnings as required by Debian
                    (outdated patch removed)

                     

                    Last edit: unxed 2024-05-23