Manually selecting character encoding for zip

2008-06-16
2014-07-13
  • Sin Jeonghun
    2008-06-16

    Many people on the Internet prefer the .zip format, but the old .zip standard doesn't allow Unicode file names. I read that the newer zip standard (2007) allows Unicode names, but none of Windows' built-in zip manager, WinRAR, WinZip or 7Zip supports Unicode zips yet.

    So when I receive .zip files from Korean or Japanese senders, it's a pain in the ... lower back. All file names that contain Korean or Japanese characters come out garbled. There is an unzip utility that lets me choose the character encoding (Korean, Japanese and Chinese only), but it is very inconvenient: I have to go through many steps for every unzip operation.

    It would be great if 7Zip also let users choose the character encoding for .zip files. Since many people around the world prefer the .zip format, we are likely to receive zips that contain foreign characters.
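To illustrate what choosing the encoding on extraction would involve: a ZIP entry name is stored as raw bytes, and only bit 11 (0x800) of the entry's general-purpose flag declares "this name is UTF-8". For unflagged names the reader has to guess a code page, and a wrong guess is what turns Korean or Japanese names into garbage. The following is a minimal Python sketch, not 7-Zip's code. It assumes Python's standard zipfile module, which decodes unflagged names as cp437; `decode_entry_names` is a hypothetical helper name, and the encoding argument ("cp949" for Korean, "shift_jis" for Japanese) stands in for the user's choice.

```python
import io
import zipfile
from typing import List

# Bit 11 of the general-purpose flag ("language encoding flag") marks UTF-8 names.
UTF8_NAME_FLAG = 0x800


def decode_entry_names(zip_bytes: bytes, encoding: str) -> List[str]:
    """Return entry names, decoding names without the UTF-8 flag using `encoding`."""
    names = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.flag_bits & UTF8_NAME_FLAG:
                # The creating tool declared the name as UTF-8, so keep it as is.
                names.append(info.filename)
            else:
                # zipfile decoded this name as cp437; cp437 maps every byte value,
                # so encoding back recovers the raw bytes stored in the archive.
                raw = info.filename.encode("cp437")
                # Decode with the user's chosen code page, e.g. "cp949" or "shift_jis".
                names.append(raw.decode(encoding, errors="replace"))
    return names
```

One caveat of this sketch: on Windows, zipfile rewrites any backslash byte (0x5C) in a name to "/", so a Shift_JIS name whose second byte is 0x5C may not round-trip exactly.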

     
    • Vacon
      2008-06-16

      Hello everyone,

      You did have a look at the history.txt that comes with the 7-Zip installation (or at http://www.7-zip.org/history.txt), didn't you...?

      <quote>
      4.58 beta      2008-05-05
      -------------------------
      - Some speed optimizations.
      - 7-Zip now can unpack .lzma archives.
      - Unicode (UTF-8) support for filenames in .ZIP archives. Now there are 3 modes:
          1) Default mode: 7-Zip uses UTF-8, if the local code page doesn't contain required symbols.
          2) -mcu switch:  7-Zip uses UTF-8, if there are non-ASCII symbols.
          3) -mcl switch:  7-Zip uses local code page.
      - Now it's possible to store file creation time in 7z and ZIP archives (-mtc switch).
      - 7-Zip now can unpack multivolume RAR archives created with
        "old style volume names" scheme and names *.001, *.002, ...
      - Now it's possible to use -mSW- and -mSW+ switches instead of -mSW=off and -mSW=on 
      - Some bugs were fixed.
      - New localizations: Punjabi (Indian), Pashto.
      </quote>
      IMHO this is what you asked for :-)

      Best regards!
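The three modes in that changelog entry can be modelled roughly as follows. This is a hedged Python sketch based only on the changelog wording, not on 7-Zip's source: the function name `encode_entry_name`, the `local_codepage` parameter and the '?' substitution in -mcl mode are assumptions made for illustration.

```python
from typing import Tuple

# Bit 11 of the ZIP general-purpose flag, set when a name is stored as UTF-8.
UTF8_NAME_FLAG = 0x800


def encode_entry_name(name: str, mode: str = "default",
                      local_codepage: str = "cp437") -> Tuple[bytes, int]:
    """Return (stored name bytes, flag bits) under one of the three modes."""
    if mode == "mcl":
        # -mcl: always the local code page; unmappable characters become '?'
        # (an assumption of this sketch).
        return name.encode(local_codepage, errors="replace"), 0
    if mode == "mcu":
        # -mcu: UTF-8 whenever the name contains any non-ASCII symbol.
        if name.isascii():
            return name.encode("ascii"), 0
        return name.encode("utf-8"), UTF8_NAME_FLAG
    if mode == "default":
        # Default: local code page if it can represent every symbol, else UTF-8.
        try:
            return name.encode(local_codepage), 0
        except UnicodeEncodeError:
            return name.encode("utf-8"), UTF8_NAME_FLAG
    raise ValueError(f"unknown mode: {mode!r}")
```

All three modes only decide how names are written. A .zip created by another tool in, say, the Korean code page carries no UTF-8 flag, so whoever extracts it still has to guess the encoding.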

       
    • Sin Jeonghun
      2008-06-16

      Hello.

      Sorry. I'm using the 4.57 stable version and haven't checked the 4.58 beta yet. However, whether 7Zip can create Unicode zips doesn't change anything for me, because when I make an archive I use the .7z or .rar format. As I said, the problem is when I RECEIVE .zip files from other people on the Internet. I can't force everyone in the world to use 7Zip 4.58.

      Also, it doesn't look like the -mcl or -mcu switch lets me specify a Korean or Japanese character encoding when DECOMPRESSING (NOT COMPRESSING). Again, I don't use the .zip format when I make archives in the first place.

       
    • Lisbon, 16 June 2008.

      Have a look at this URL: "https://sourceforge.net/forum/message.php?msg_id=4981955".

      It should work as a workaround until 7-Zip supports manually selecting the character encoding for .zip files.

      It lets you extract any file inside a .zip (or another archive type) under a different file name that you type in.

      (JMLM).

       
    • It would be good to have filename decoding support not only for zip but also for the other formats 7Zip supports. For example, tar files transferred from any *nix system to Windows may occasionally contain files with non-ASCII names, and those names get translated unpredictably on Windows.
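For tar the situation is similar: classic tar headers store names as raw bytes with no encoding marker (only POSIX pax headers define UTF-8), so the reader has to assume an encoding. As an illustration rather than anything 7-Zip does, Python's standard tarfile module already exposes such a choice through the `encoding` argument of `tarfile.open`; `list_tar_names` is a hypothetical helper name.

```python
import io
import tarfile
from typing import List


def list_tar_names(tar_bytes: bytes, encoding: str) -> List[str]:
    """List member names, decoding raw (non-pax) header names with `encoding`."""
    # errors="surrogateescape" keeps undecodable bytes instead of raising.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r",
                      encoding=encoding, errors="surrogateescape") as tf:
        return tf.getnames()
```

For example, a GNU-format tar whose names were written in cp949 lists correctly with `list_tar_names(data, "cp949")`, while decoding the same bytes as UTF-8 yields escaped garbage.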

       
  • ZiNgA BuRgA
    2013-01-26

    Bumping this old thread just to show my support.

    It would be great if there was an option in the GUI to select the assumed character encoding for archives which don't standardise the character encoding used.
    For example, a menu option which lets the user force 7Zip to interpret filenames in a particular character set; 7Zip then transparently converts these to the system character set so that they display and extract correctly.

    Having it in the command line executable would be handy too :P

     
    Last edit: ZiNgA BuRgA 2013-01-26
  • Yumeyao
    2013-01-26

    You should use applocale or, better, a modified version called papplocale, which is bug-free and doesn't bother you with a confirmation dialog when run from the command line (original page & descriptions).

    Good luck with this.

     
    Last edit: Yumeyao 2013-01-26
  • ZiNgA BuRgA
    2013-01-27

    Thanks a lot for the fast reply and tip!
    Unfortunately, it doesn't seem to have any effect here. papplocale doesn't offer a list of encodings, only a list of languages. I've tried running 7zFM.exe with various languages, but the filenames in a .zip file still look corrupt; none of the languages I've selected seems to affect the corrupted display at all.

     
  • loliturma
    2014-07-13

    Unfortunately, running 7zip via applocale or its derivatives fails to correct some filename encoding problems in zip files. Some can only be corrected by changing the entire operating system's language for non-Unicode programs to match the zip file's original language, rebooting, and then extracting with 7zip (or another unarchiver). Very clumsy. This problem has existed for a decade, and I haven't found a good solution.

    The option to specify character encoding when decompressing files would be an excellent addition to 7zip.