Cyrillic names inside zip
There are two zip archives: filenames in the first one are displayed normally, while the second (newer) one has problems with Cyrillic symbols.
The good one: https://github.com/Pr-Mex/vanessa-automation/releases/download/1.2.041.1/vanessa-automation.1.2.041.1.zip
The problematic one: https://github.com/Pr-Mex/vanessa-automation/releases/download/1.2.041.15/vanessa-automation.1.2.041.15.zip
WinRAR and Total Commander's built-in program display the filenames normally.
I tried on 3 different machines: one Windows 7 and two Windows 10.
Tested on 7-Zip 23.01 x64, 7-Zip 24.04 x64, WinRAR 7.00 x64
I can't tell whether this is 7-Zip's fault or something wrong with the PC (like missing fonts, etc.).
WinRAR example was not attached.
Maybe the archive itself was assembled somehow incorrectly?
archive-1:
"up" means UTF-8 paths. So 7-Zip uses UTF-8.
archive-2:
7-Zip expects that names are UTF-8 if "Host OS: Unix", because UTF-8 is the main encoding in Linux now. If you unpack such an archive in Linux, it will use UTF-8 encoding.
If you unpack such an archive in Windows with 7-Zip, 7-Zip tries to be compatible with Linux, so it also uses UTF-8 encoding. But the archive doesn't actually use UTF-8; it uses DOS encoding instead. So you see incorrect characters.
Other zip programs in Windows do not try to be so compatible with Linux archives, and they can always use DOS encoding for such zip archives. That is why other zip programs can show good names in Windows for that archive, but they will fail for some other zip archives created in Linux.
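To illustrate the mismatch described above, here is a minimal Python sketch. The function name `decode_two_ways` and the sample filename are made up for illustration; the only assumption about the archive is that its names are stored as code page 866 bytes, as in this thread.

```python
def decode_two_ways(raw_name: bytes) -> tuple[str, str]:
    """Decode the same stored filename bytes as UTF-8 and as DOS cp866."""
    # What a tool sees when it trusts "Host OS: Unix" and assumes UTF-8.
    # Invalid byte sequences are replaced with U+FFFD instead of raising.
    as_utf8 = raw_name.decode("utf-8", errors="replace")
    # What a tool sees when it assumes the Russian DOS/OEM code page.
    as_dos = raw_name.decode("cp866")
    return as_utf8, as_dos


if __name__ == "__main__":
    # Hypothetical filename, encoded the way the problematic archive stores names.
    stored = "Пример.txt".encode("cp866")
    wrong, right = decode_two_ways(stored)
    print("decoded as UTF-8:", wrong)
    print("decoded as cp866:", right)
```

The bytes are identical in both cases; only the assumed encoding differs, which is why one archiver shows good names and another shows garbage for the same file.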
A good solution is to ask the creators of that archive to change the software (or settings) that was used to create that zip file.
Last edit: Igor Pavlov 2024-04-24
Thanks!
same here.
In version 19.00 the archive looks normal; in 24.05 and 23.01 it looks bad.
We can't "ask" the creators to change settings for the zip file; it's some kind of automatic system for these operations (a government procurement website).
If it cannot be fixed, we will have to roll back to the old version :(
Any website has administrators and developers. And they can try to fix the problem.
I suppose the problem will be fixed at that website over time. You can try to help them to fix the problem sooner.
You can use 7zip with this patch:
https://sourceforge.net/p/sevenzip/bugs/2473/?page=1#96ae
to extract such archive:
7zz -mcp=866 x ./vanessa-automation.1.2.041.15.zip
Last edit: unxed 2024-05-27
Upstream issue:
https://github.com/Pr-Mex/vanessa-automation/issues/2128
So, I've done some investigation, and here's what I found out. The tar.exe that comes with Git for Windows is bsdtar, which uses libarchive. And libarchive, when creating archives, always sets the value of "the operating system on which the archive was created" to UNIX, even if the library is built and running on Windows. Consequently, on Windows systems we end up with an archive where the filename encoding is code page 866 (the default console code page on Russian-locale Windows), but the operating system value is UNIX. Therefore, many archivers do not expect encoding 866 in this context.
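For anyone who wants to check an archive for this combination, here is a rough sketch using Python's standard `zipfile` module. It relies on the documented behaviour that `zipfile` decodes names as cp437 when the UTF-8 flag (general purpose bit 11) is not set, so the original bytes can be recovered by encoding back to cp437. The function name `recover_name` and the cp866 fallback are my assumptions for this particular case, not something 7-Zip or libarchive does.

```python
import zipfile

UTF8_FLAG = 0x800   # general purpose bit 11: name is stored as UTF-8
HOST_UNIX = 3       # "version made by" host value for UNIX


def recover_name(info: zipfile.ZipInfo, fallback: str = "cp866") -> tuple[str, str]:
    """Return (filename, note) for one archive entry.

    Assumption: entries without the UTF-8 flag hold names in `fallback`.
    """
    if info.flag_bits & UTF8_FLAG:
        return info.filename, "UTF-8 flag set, name used as is"
    host = "Unix" if info.create_system == HOST_UNIX else "non-Unix"
    try:
        # Undo zipfile's default cp437 decoding to get the stored bytes back.
        raw = info.filename.encode("cp437")
    except UnicodeEncodeError:
        return info.filename, f"host={host}, name was not decoded as cp437"
    return raw.decode(fallback, errors="replace"), f"host={host}, decoded as {fallback}"


def list_names(path: str, fallback: str = "cp866") -> list[tuple[str, str]]:
    """Apply recover_name to every entry of the zip file at `path`."""
    with zipfile.ZipFile(path) as zf:
        return [recover_name(info, fallback) for info in zf.infolist()]
```

Calling `list_names("vanessa-automation.1.2.041.15.zip")` on a local copy should print a note for each entry showing whether the host is reported as Unix while the UTF-8 flag is missing, which is the combination described above.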
I made a PR to libarchive to fix this issue:
https://github.com/libarchive/libarchive/pull/2240