#48 p7zip doesn't extract cp932 (shift-jis) files properly

Status: closed-wont-fix
Owner: nobody
Labels: None
Priority: 5
Updated: 2006-05-18
Created: 2006-05-18
Creator: utuhiro
Private: No

Hi,

p7zip doesn't extract zip files that were made on
Japanese Windows properly.

How to reproduce
1. Set your locale to en_US.UTF-8.

2. Get this file and extract it.
$ wget http://www.geocities.jp/ep3797/snapshot/tmp/test-japanese.tar.bz2

3. Install IPAMonaPGothic.ttf (a Japanese font).
$ cd test-japanese/
$ mkdir -p ~/.fonts
$ cp IPAMonaGothic.ttf ~/.fonts
$ cd ~/.fonts
$ fc-cache -f .
$ cd -

4. Extract ja-folder.zip with p7zip.
$ 7za x ja-folder.zip
=> You'll see a broken filename.
(Check the left image of result.png)

cf. Extract ja-folder.zip with 7z.exe (Windows version).
$ cd 7-Zip/
$ wine 7z.exe x ja-folder.zip
=> You'll see a proper filename.
(Check the right image of result.png)

Discussion

  • my space

    my space - 2006-05-18
    • status: open --> closed-wont-fix
     
  • my space

    my space - 2006-05-18

    > p7zip doesn't extract zip files that were made on
    > Japanese Windows properly.

    Have you tried "unzip" (the common Unix command for
    extracting zip archives)?

    If you do, you will also get broken filenames.

    > ja-folder.zip

    I have tried to extract the files from ja-folder.zip
    on a French Windows XP SP2
    with
    - the built-in unzip feature of Windows
    - WinRAR
    - the 7-Zip GUI

    As expected, all these programs create broken filenames.

    The zip format can only store filenames as arrays of bytes.

    This array of bytes is encoded in the current code page of
    your Windows
    (see
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_81rn.asp).

    As the zip format does not store the code page in the archive,
    no program can guess which code page should be used.
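
A minimal sketch of this effect, using only Python's built-in codecs rather than p7zip itself. The filename is hypothetical, and cp850 is picked only as an example of a Western European code page; the point is that the same stored bytes give different names depending on the code page the reader assumes.

```python
# Hypothetical filename, as it might be created on a Japanese Windows system.
original_name = "日本語.txt"

# What ends up in the archive: raw bytes, with no record of the code page.
stored_bytes = original_name.encode("cp932")

# A reader that assumes the Japanese code page recovers the name.
as_cp932 = stored_bytes.decode("cp932")

# A reader that assumes another code page (cp850 is an illustrative
# assumption) decodes the very same bytes into unrelated characters.
as_cp850 = stored_bytes.decode("cp850")

print(stored_bytes.hex(" "))
print(as_cp932)
print(as_cp850)
```

Nothing is lost in the bytes themselves: re-encoding the garbled name with the wrong code page gives back the stored bytes, so the name can be recovered once the right code page is known from some other source.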

    So, if you want to exchange files whose names are not
    plain ASCII (unaccented English letters),
    you should really use another format.

    You can use the RAR format or, better, the 7z format, which store
    filenames in Unicode (an encoding that can represent the
    characters of all the world's languages, Japanese included,
    without needing a code page).
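
As a rough illustration of why a Unicode encoding avoids the problem, the sketch below checks which codecs can represent a name that mixes Japanese and French characters. The helper `can_encode` and the filename are hypothetical, and the codec list is only an example; it does not reproduce how RAR or 7z lay out names internally.

```python
def can_encode(name: str, codec: str) -> bool:
    """Return True if every character of name exists in the given codec."""
    try:
        name.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical filename mixing Japanese and accented French letters.
name = "日本語_été.txt"

# A single legacy code page may reject part of the name;
# the Unicode encodings accept all of it.
for codec in ("cp1252", "cp932", "utf-8", "utf-16-le"):
    print(codec, can_encode(name, codec))

# A Unicode round trip needs no knowledge of the writer's code page.
round_tripped = name.encode("utf-8").decode("utf-8")
```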

    > $ wine 7z.exe x ja-folder.zip
    > => You'll see a proper filename.

    I think that works because you have configured wine to use the
    right code page.

    p7zip, like all other unzip programs, cannot help you here.

    So, please give up this very old format.

     
