Japanese Folder and File Names not Rendered Properly in Linux with zip File Created in Windows

2021-09-05
2021-09-06
  • Xiaohong Yang

    Xiaohong Yang - 2021-09-05

    Hi All

    I created a zip file with the Explorer shell extension of 7-Zip 19.00 on a Windows 10 machine with "Language for non-Unicode programs" set to Japanese (Control Panel -> Clock and Region -> Region -> Administrative -> Change system locale…). The zip file has Japanese folder and file names like
    ニュースおはよう日本.zip\ニュースおはよう日本\スペシャル\クローズアップ現代+<ライブ.docx

    I took this zip file to a Linux machine (Ubuntu 18.04) with Japanese locale settings (Settings -> Region & Language: Language and Formats both set to Japanese). The locale values are as follows:

    ~$ locale
    LANG=ja_JP.UTF-8
    LANGUAGE=ja_JP:en
    LC_CTYPE="ja_JP.UTF-8"
    LC_NUMERIC=ja_JP.UTF-8
    LC_TIME=ja_JP.UTF-8
    LC_COLLATE="ja_JP.UTF-8"
    LC_MONETARY=ja_JP.UTF-8
    LC_MESSAGES="ja_JP.UTF-8"
    LC_PAPER=ja_JP.UTF-8
    LC_NAME=ja_JP.UTF-8
    LC_ADDRESS=ja_JP.UTF-8
    LC_TELEPHONE=ja_JP.UTF-8
    LC_MEASUREMENT=ja_JP.UTF-8
    LC_IDENTIFICATION=ja_JP.UTF-8
    LC_ALL=

    I used 7-Zip 21.03 to list and extract the contents of the zip file and found that the folder and file names are not rendered properly (see below).

    ~/winshare/Sample_Files/Archives/Zip/Unicode/Sample_Files$ 7zz l Created_with_7Zip_19.00.zip

    7-Zip (z) 21.03 beta (x64) : Copyright (c) 1999-2021 Igor Pavlov : 2021-07-20
    64-bit locale=en_US.UTF-8 Threads:3, ASM

    Scanning the drive for archives:
    1 file, 120026 bytes (118 KiB)

    Listing archive: Created_with_7Zip_19.00.zip

    --
    Path = Created_with_7Zip_19.00.zip
    Type = zip
    Physical Size = 120026

    Date Time Attr Size Compressed Name


    2021-09-04 09:03:09 D.... 0 0 �j���[�X���͂悤���{
    2021-09-04 09:03:39 D.... 0 0 �j���[�X���͂悤���{/�X�y�V����
    2021-08-26 10:42:47 ....A 94070 84991 �j���[�X���͂悤���{/�X�y�V����/�N���[�Y�A�b�v����{�����C�u.docx
    2021-08-26 10:43:32 ....A 13326 10612 �j���[�X���͂悤���{/�X�y�V����/�j���[�X���͂悤���{�����C�u.docx
    2021-08-26 10:45:05 ....A 14352 11553 �j���[�X���͂悤���{/�X�y�V����/�֗��ȏ�񂪂����ς�.docx
    2021-08-26 10:44:16 ....A 14422 11606 �j���[�X���͂悤���{/�X�y�V����/�ԑg�ύX.docx


    2021-09-04 09:03:39 136170 118762 4 files, 2 folders

    Note that when I created the zip file with the Explorer shell extension of 7-Zip 21.03 on Windows, the folder and file names are rendered properly on Linux with both 7-Zip 19.00 and 21.03 (even on a machine with a US locale), and the archive extracts with the correct folder and file names. According to the release history, the problem was fixed on the archive-creation side in version 21.02: "7-Zip now writes additional field for filename in UTF-8 encoding to zip archives. It allows to extract correct file name from zip archives on different systems."
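
To tell which archives are affected, one can check how each entry stores its name. The sketch below assumes the "additional field" quoted from the release history is the Info-ZIP Unicode Path extra field (header ID 0x7075); that is an assumption, not something stated in this thread. The function names `unicode_path` and `name_encoding` are made up for illustration.

```python
import struct
import zipfile

UTF8_FLAG = 0x800          # general-purpose bit 11: name is stored as UTF-8
UNICODE_PATH_ID = 0x7075   # Info-ZIP Unicode Path extra field (assumed here)


def unicode_path(extra):
    """Return the UTF-8 name from a 0x7075 extra field, or None if absent."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        body = extra[pos + 4:pos + 4 + size]
        if header_id == UNICODE_PATH_ID and len(body) >= 5:
            # body = version (1 byte) + CRC-32 of legacy name (4 bytes) + UTF-8 name
            return body[5:].decode("utf-8")
        pos += 4 + size
    return None


def name_encoding(info):
    """Classify how a zipfile.ZipInfo entry carries its file name."""
    if info.flag_bits & UTF8_FLAG:
        return "utf-8 flag"
    if unicode_path(info.extra) is not None:
        return "unicode path extra field"
    return "legacy code page"


def report(archive):
    """Print the name-storage kind of every entry in the archive."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            print(name_encoding(info), repr(info.filename))
```

Calling `report("Created_with_7Zip_19.00.zip")` should then show whether the entries carry only legacy code page names, which would be consistent with the garbled listing above.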

    However, zip files created with earlier versions still have the problem: the latest version (21.03) cannot render their names properly. In addition, extracting the sample file (Created_with_7Zip_19.00.zip) with 7-Zip 21.03 under a US locale produces errors (see below). I wonder if you can address these issues.

    ~/winshare/Sample_Files/Archives/Zip/Unicode/Sample_Files$ locale
    LANG=en_US.UTF-8
    LANGUAGE=
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=

    ~/winshare/Sample_Files/Archives/Zip/Unicode/Sample_Files$ 7zz x -o./7zz_out_19.00 ./Created_with_7Zip_19.00.zip

    7-Zip (z) 21.03 beta (x64) : Copyright (c) 1999-2021 Igor Pavlov : 2021-07-20
    64-bit locale=en_US.UTF-8 Threads:3, ASM

    Scanning the drive for archives:
    1 file, 120026 bytes (118 KiB)

    Extracting archive: ./Created_with_7Zip_19.00.zip

    Path = ./Created_with_7Zip_19.00.zip
    Type = zip
    Physical Size = 120026

    ERROR: Cannot open output file : errno=71 : Protocol error : ./7zz_out_19.00/�j���[�X���͂悤���{/�X�y�V����/�N���[�Y�A�b�v����{�����C�u.docx
    ERROR: Cannot open output file : errno=71 : Protocol error : ./7zz_out_19.00/�j���[�X���͂悤���{/�X�y�V����/�j���[�X���͂悤���{�����C�u.docx
    ERROR: Cannot open output file : errno=71 : Protocol error : ./7zz_out_19.00/�j���[�X���͂悤���{/�X�y�V����/�֗��ȏ�񂪂����ς�.docx
    ERROR: Cannot open output file : errno=71 : Protocol error : ./7zz_out_19.00/�j���[�X���͂悤���{/�X�y�V����/�ԑg�ύX.docx

    Sub items Errors: 4

    Archives with Errors: 1

    Sub items Errors: 4

    Attached are two zip files created in Windows: one created with the Explorer shell extension of 7-Zip 19.00 and the other with the Explorer shell extension of 7-Zip 21.03.
    Operating systems: Windows 10 and Ubuntu 18.04. 7-Zip versions: 19.00 and 21.03, 64-bit.

     
  • Igor Pavlov

    Igor Pavlov - 2021-09-05

    Yes, UTF-8 in the new 21.03 version resolves these problems.

     
  • Xiaohong Yang

    Xiaohong Yang - 2021-09-05

    Hi Igor,
    Can you tell me how to correctly list and extract zip files with Japanese folder and file names that were created in the past (like the sample file Created_with_7Zip_19.00.zip) on a Linux machine (such as Ubuntu 18.04)? We have some old files that need to be extracted with 7-Zip on Linux machines.
    Thank you very much.

     
    • Igor Pavlov

      Igor Pavlov - 2021-09-06

      Maybe the Windows version of 7-Zip running via Wine can do it.
      We need some code that knows the Windows charset encoding.
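
As a rough illustration of what "code that knows the Windows charset encoding" could look like outside 7-Zip, here is a minimal Python sketch. It assumes the old archives store names in the Japanese Windows code page (cp932) without the UTF-8 flag, and it relies on the standard `zipfile` module decoding such names as cp437. The helper names `legacy_name` and `extract_legacy` are hypothetical, and this is not how 7-Zip itself handles names.

```python
import shutil
import zipfile
from pathlib import Path

UTF8_FLAG = 0x800  # general-purpose bit 11: name is stored as UTF-8


def legacy_name(info, codepage="cp932"):
    """Return the entry name, re-decoded from the given Windows code page."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # already decoded as UTF-8
    try:
        # zipfile decoded the raw bytes as cp437; undo that to recover them
        raw = info.filename.encode("cp437")
    except UnicodeEncodeError:
        return info.filename  # name was not produced by a cp437 decode
    return raw.decode(codepage)


def extract_legacy(archive, dest, codepage="cp932"):
    """Extract every entry under dest using the re-decoded names.

    Sketch only: it trusts the archive and does not guard against
    names such as "../x" that point outside dest.
    """
    dest = Path(dest)
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            target = dest / legacy_name(info, codepage)
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
```

For example, `extract_legacy("Created_with_7Zip_19.00.zip", "out")` would write the files under `out/` with the re-decoded names, provided the cp932 assumption holds for that archive. On Python 3.11 or later, opening the archive with `zipfile.ZipFile(path, metadata_encoding="cp932")` is a built-in way to get the same decoding when reading.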

       
