Filenames are garbled when listing or extracting an archive
File and folder names are garbled when listing the archive contents or extracting the archive.
~/test_folder$ 7zz l ./Desktop.zip
7-Zip (z) 21.07 (x64) : Copyright (c) 1999-2021 Igor Pavlov : 2021-12-26
64-bit locale=ru_RU.UTF-8 Threads:4
Scanning the drive for archives:
1 file, 330 bytes (1 KiB)
Listing archive: ./Desktop.zip
--
Path = ./Desktop.zip
Type = zip
Physical Size = 330
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------ ------------------------
2016-09-28 17:41:00 D....            0            0
2016-09-28 17:40:41 ....A            4            4  ⥪⮢ 㬥.txt
------------------- ----- ------------ ------------ ------------------------
2016-09-28 17:41:00                  4            4  1 files, 1 folders
Problematic archive sample
That zip file uses the DOS OEM Russian (866) charset encoding.
Maybe your system is set up to support the Russian 866 OEM encoding.
Try to extract that archive with both programs.
Also use the latest 7-Zip, 24.05.
On Linux we can expect UTF-8 to be used for file names, and if a name is not UTF-8, it is not a portable file name.
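To illustrate the encoding mismatch described above, here is a minimal Python sketch. The file name is a hypothetical one chosen only for illustration, and the helper names are made up for this example. It encodes the name in code page 866, as a legacy tool would store it, then reads those bytes back once as UTF-8 (dropping invalid sequences) and once as CP866. The UTF-8 misreading gives garbage similar to the name in the listing above.

```python
# Hypothetical file name, chosen only for illustration.
name = "текстовый документ.txt"

# Bytes as a legacy DOS/Windows tool would store them in the archive.
raw = name.encode("cp866")


def misread_as_utf8(data: bytes) -> str:
    """Decode as UTF-8, silently dropping bytes that are not valid UTF-8."""
    return data.decode("utf-8", errors="ignore")


def decode_oem_866(data: bytes) -> str:
    """Decode using the DOS OEM Russian code page (866)."""
    return data.decode("cp866")


garbled = misread_as_utf8(raw)
restored = decode_oem_866(raw)

print(garbled)   # a few unrelated symbols instead of the Cyrillic name
print(restored)  # the original name
```

Only a few byte triples happen to form valid UTF-8 sequences, which is why most of the name disappears instead of turning into readable text.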
Last edit: Igor Pavlov 2024-05-18
The built-in .zip archiver in older versions of Windows used DOS (OEM) or Windows (ANSI) code page corresponding to current regional settings for new archives. Lots of such archives still exist.
The correct behavior is to determine the relevant OEM or ANSI code page from the system locale and use it. You can look at this PR for a reference implementation:
https://github.com/p7zip-project/p7zip/pull/232
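The idea of deriving the code page from the system locale can be sketched roughly as follows. This is not the code from the PR: the table is a tiny hand-written subset for illustration, and the function names and the fallback value are assumptions of this sketch.

```python
# Illustrative subset only: locale -> (OEM code page, ANSI code page).
# A real implementation would need a much larger table.
CODEPAGES = {
    "ru_RU": (866, 1251),
    "en_US": (437, 1252),
    "de_DE": (850, 1252),
}

# Assumption of this sketch: fall back to the US code pages for unknown locales.
FALLBACK = (437, 1252)


def codepages_for_locale(locale_name: str) -> tuple:
    """Return (oem, ansi) for a locale string such as 'ru_RU.UTF-8'."""
    # Strip the charset suffix ('.UTF-8') and any modifier ('@euro').
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    return CODEPAGES.get(base, FALLBACK)


def decode_legacy_name(raw: bytes, locale_name: str) -> str:
    """Decode a non-UTF-8 file name using the OEM code page of the locale."""
    oem, _ansi = codepages_for_locale(locale_name)
    return raw.decode(f"cp{oem}")


print(codepages_for_locale("ru_RU.UTF-8"))
print(decode_legacy_name(b"\xe2\xa5\xaa", "ru_RU.UTF-8"))
```

With the locale from the report (`ru_RU.UTF-8`) this selects code page 866, which is the encoding the sample archive uses.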
Last edit: unxed 2024-05-22
unzip extracts the names correctly. 7zz extracts them with incorrect names (which are not valid UTF-8 sequences).
The "locale -> code page" translation table used in this PR is generated from Wine sources, dlls/kernel32/nls, using this tool.
Last edit: unxed 2024-05-22
Important addition: if the PackOS field of the zip header is 11 and the PackVer field is 20 or greater, the corresponding ANSI code page should be used instead of the OEM code page.
An example of how to determine both the OEM and the ANSI code page for the current locale can be found in the far2l file manager sources, file WinPort/src/APIStringCodepages.cpp, function DeduceCodepages(). The conversion table itself is also generated from Wine sources.
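The PackOS/PackVer rule above can be written down as a small selection function. This is a sketch and not the patch itself: the function and constant names are hypothetical, and the default code pages 866/1251 simply assume a Russian locale.

```python
# Values taken from the rule stated above.
PACK_OS_ANSI = 11      # PackOS value that switches to the ANSI code page
MIN_ANSI_VERSION = 20  # minimum PackVer for the rule to apply


def pick_codepage(pack_os: int, pack_ver: int, oem_cp: int, ansi_cp: int) -> int:
    """Choose between the OEM and ANSI code page for a non-UTF-8 name."""
    if pack_os == PACK_OS_ANSI and pack_ver >= MIN_ANSI_VERSION:
        return ansi_cp
    return oem_cp


def decode_name(raw: bytes, pack_os: int, pack_ver: int,
                oem_cp: int = 866, ansi_cp: int = 1251) -> str:
    """Decode a raw zip entry name; the defaults assume a Russian locale."""
    codepage = pick_codepage(pack_os, pack_ver, oem_cp, ansi_cp)
    return raw.decode(f"cp{codepage}")


# The same text stored by two different (hypothetical) archivers.
print(decode_name("тек".encode("cp866"), pack_os=0, pack_ver=20))
print(decode_name("тек".encode("cp1251"), pack_os=11, pack_ver=20))
```

Keeping the choice in one pure function makes it easy to test the rule separately from the locale lookup.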
See also: https://sourceforge.net/p/sevenzip/bugs/1060/#ed99/fa7d
This patch, applied against 24.05, solves the problem.
When that patch is applied against 7-Zip, will it cause the issue mentioned in https://github.com/p7zip-project/p7zip/issues/112 ?
I have no Arch system to check on.
Yes, it will. Not only in Arch. Just tested it on mine.
So there is some problem with this patch, or with that archive (ComicInfo.zip (bugs.archlinux.org), found here) that reproduces it.
Is that the same as these p7zip commits?
c104127e6a9364b8d6a1d79012e5249a129c3857
e56ea97d89eb0cd59603402496a8208238b3fda2
Last edit: Sam Tansy 2024-05-20
Partially
I made a new, improved PR; the code is the same as in ZipItem_v2.patch
https://github.com/p7zip-project/p7zip/pull/232
Please try this one
Same thing. During unpacking (`7z x ComicInfo.zip`) 7z creates a directory entry for what should be a file (here: `ComicInfo.xml`), then shows a warning prompt asking whether to replace that directory with a file.
Since it occurs in both v17.03 and 24.xx, it seems to be universal. You can test it yourself.
I don't know what triggers it, and I cannot test every version you send me here.
Maybe @ipavlov can assess this archive (`ComicInfo.zip`) and find what is unusual in it, which would help to pinpoint the culprit.
I cannot reproduce it. The XML file is extracted just fine with 24.05 plus the patch above on Mint 21.3. I also do not have the resources to check every possible environment.
Are you sure this issue is related to my patch? https://github.com/p7zip-project/p7zip/issues/112 describes several issues; the one involving my patch was 'sometimes 7z creates a file named "IBM437.so"'. I am not sure the issue with ComicInfo.zip is related to it.
It does reproduce for me.
If you give me some simple instructions, I can try to debug it, but as you probably know, debugging anything is a tedious and time-consuming process.
That is also the reason for my question to ipavlov about the archive and what is so unusual in it that it can trigger such behaviour.
Fixed some errors. At least two people confirmed that this version unzips ComicInfo.zip as expected. Please retest if possible. Thanks!
I managed to reproduce the bug on my Mint 21.3 by following these instructions:
https://github.com/p7zip-project/p7zip/issues/112#issuecomment-850490605
The bug does not reproduce in the far2l terminal, but it does reproduce in GNOME Terminal.
As for ComicInfo.zip, whether the bug reproduces depends on the current folder name, and it happens only if the binary was built on Alpine Linux (it was built by spvkgn in an alpine:latest container, which is probably the current stable Alpine version). For example, it reproduces if the archive and the archiver are inside the ~/Проверка or ~/test test folders.
It does not reproduce with ZipItem_v3.patch.
Last edit: unxed 2024-05-21
Still the same.
Here they say that the problem is in `CItem::GetUnicodeString` and offer a fix, which I have not tested yet.
Thank you! The fix looks reasonable. Applied. Check again please.
And it works.
Finally, someone found the root cause of it.
Thanks a lot for suggestion and testing!
Fixed some warnings as required by Debian
(outdated patch removed)
Last edit: unxed 2024-05-23