
Manually selecting character encoding for zip

YourGod
2008-06-16
2023-03-22
  • YourGod

    YourGod - 2008-06-16

    Many people on the Internet prefer the .zip format, but the old .zip standard doesn't allow Unicode file names. I read that the newer zip standard (2007) allows Unicode names, but still none of the built-in zip manager in Windows, WinRAR, WinZip, or 7Zip supports Unicode zip.

    So when I receive .zip files from Koreans or Japanese, it is a pain in the ... lower back. All the file names that contain Korean or Japanese characters are broken. There is an unzip utility that lets me choose the character encoding (among Korean, Japanese, and Chinese only), but it is very inconvenient (I have to go through many steps for each unzip operation).

    If 7Zip could also let users choose the character encoding for .zip files, that would be great. Since many people around the world prefer the .zip format, we are likely to receive zips that contain foreign characters.

     
    • Vacon

      Vacon - 2008-06-16

      Hello everyone,

      did you have a look at the history.txt that comes with 7-Zip's installation (alternatively, http://www.7-zip.org/history.txt)?

      <quote>
      4.58 beta      2008-05-05
      -------------------------
      - Some speed optimizations.
      - 7-Zip now can unpack .lzma archives.
      - Unicode (UTF-8) support for filenames in .ZIP archives. Now there are 3 modes:
          1) Default mode: 7-Zip uses UTF-8, if the local code page doesn't contain required symbols.
          2) -mcu switch:  7-Zip uses UTF-8, if there are non-ASCII symbols.
          3) -mcl switch:  7-Zip uses local code page.
      - Now it's possible to store file creation time in 7z and ZIP archives (-mtc switch).
      - 7-Zip now can unpack multivolume RAR archives created with
        "old style volume names" scheme and names *.001, *.002, ...
      - Now it's possible to use -mSW- and -mSW+ switches instead of -mSW=off and -mSW=on 
      - Some bugs were fixed.
      - New localizations: Punjabi (Indian), Pashto.
      </quote>
      IMHO this is what you asked for :-)

      Best regards!
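For readers who want to check whether a given .zip already marks its names as UTF-8, here is a minimal Python sketch. It relies on the ZIP general-purpose flag bit 11 (0x800), which signals a UTF-8 file name. The helper name `utf8_flags` and the sample entry names are made up for illustration. Note that this shows how Python's `zipfile` module behaves when writing (it sets the flag only for non-ASCII names, which is similar in spirit to the -mcu mode quoted above); it is not 7-Zip's own code.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the entry name is stored as UTF-8


def utf8_flags(raw_zip: bytes) -> dict:
    """Map each entry name to True if its UTF-8 name flag is set."""
    with zipfile.ZipFile(io.BytesIO(raw_zip)) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG)
                for info in zf.infolist()}


# Build a small archive in memory. Python's zipfile sets the flag only
# for names that cannot be written as plain ASCII.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", b"demo")
    zf.writestr("測試報告.txt", b"demo")

flags = utf8_flags(buf.getvalue())
for name, is_utf8 in flags.items():
    print(name, "UTF-8 flag set" if is_utf8 else "no UTF-8 flag")
```

Entries without the flag are the ones whose encoding has to be guessed or chosen by the user, which is the case this thread is about.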

       
    • YourGod

      YourGod - 2008-06-16

      Hello.

      Sorry. I'm using the 4.57 stable version. I haven't checked 4.58 beta yet. However, it doesn't change anything whether 7Zip supports making Unicode zips or not, because when I'm making an archive I'd use the .7z or .rar format. As I said, the problem is when I am RECEIVING .zip files from other people on the Internet. I can't force all the people in the world to use 7Zip 4.58.

      And it doesn't look like I can specify a Korean or Japanese character encoding when I am DECOMPRESSING (NOT COMPRESSING) with the -mcl or -mcu switch. Again, I don't use the .zip format when I make archive files in the first place.

       
    • João Miguel Lopes Moreira

      Lisbon, 16 of June of 2008.

      Have a look at URL: "https://sourceforge.net/forum/message.php?msg_id=4981955".

      It would work as a workaround for as long as 7-Zip does not support manually selecting the character encoding for .zip.

      It gives you the choice to extract any file name inside .zip (or other archive type), to another file name, inputted by you.

      (JMLM).

       
    • Konstantin Pelepelin

      Not only for zip, but also for other formats supported by 7Zip, it would be good to have support for decoding names. Say, tar files transferred from any *nix to Windows may occasionally contain files with non-ASCII names, and these will be translated unpredictably on Windows.
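To illustrate the tar case, here is a small Python sketch using the standard `tarfile` module, whose `encoding` parameter controls how raw name bytes in old-style (non-pax) headers are interpreted. The helper names `make_legacy_tar` and `list_tar_names`, the sample name, and the choice of KOI8-R are assumptions made for the example; this is not how 7-Zip handles tar internally.

```python
import io
import tarfile


def make_legacy_tar(name: str, encoding: str) -> bytes:
    """Create an in-memory GNU-format tar whose entry name uses `encoding`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.GNU_FORMAT, encoding=encoding) as tar:
        data = b"demo"
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_tar_names(raw: bytes, encoding: str) -> list:
    """List entry names, decoding the stored name bytes with `encoding`."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r",
                      encoding=encoding) as tar:
        return tar.getnames()


raw_tar = make_legacy_tar("тест.txt", "koi8-r")
print(list_tar_names(raw_tar, "koi8-r"))  # decoded with the right encoding
print(list_tar_names(raw_tar, "cp437"))   # decoded with the wrong one
```

The archive itself carries no hint about which encoding was used, so the reader has to supply it.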

       
  • ZiNgA BuRgA

    ZiNgA BuRgA - 2013-01-26

    Bumping this old thread just to show my support.

    It would be great if there was an option in the GUI to select the assumed character encoding for archives which don't standardise the character encoding used.
    For example, a menu option which lets the user force 7Zip to interpret filenames in a particular character set; 7Zip then transparently converts these to the system character set so that they display and extract correctly.

    Having it in the command line executable would be handy too :P

     

    Last edit: ZiNgA BuRgA 2013-01-26
  • Yumeyao

    Yumeyao - 2013-01-26

    You should use applocale or, preferably, a modified version, papplocale, which is bug-free and doesn't bother you with a confirmation dialog when run from the command line. Original page & descriptions.

    Good luck with this.

     

    Last edit: Yumeyao 2013-01-26
  • ZiNgA BuRgA

    ZiNgA BuRgA - 2013-01-27

    Thanks a lot for the fast reply and tip!
    Unfortunately, it doesn't seem to have any effect here. After running papplocale, it doesn't give a list of encodings, only a list of languages. I've tried running 7zFM.exe with various languages, but filenames in a .zip file still look corrupt. The languages I've selected do not seem to affect the corrupt display at all.

     
  • loliturma

    loliturma - 2014-07-13

    Unfortunately, running 7zip via applocale or its derivatives fails to correct some filename encoding problems within zip files. Some of them can only be corrected by changing your entire operating system's language for non-Unicode programs to match the zip file's original language, rebooting, and then extracting with 7zip (or another unarchiver). Very clumsy. This problem has existed for a decade, and I haven't found a good solution.

    The option to specify character encoding when decompressing files would be an excellent addition to 7zip.
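As a rough sketch of what "specify the character encoding when decompressing" could mean, the Python snippet below lists the names in a zip with a user-chosen encoding. It leans on a property of Python's `zipfile` module: names without the UTF-8 flag (0x800) are decoded as CP437, so re-encoding to CP437 recovers the original bytes, which can then be decoded with the encoding the user picks. The function name `list_zip_names` is hypothetical, and this is only an illustration, not 7-Zip's implementation.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the entry name is stored as UTF-8


def list_zip_names(raw_zip: bytes, encoding: str) -> list:
    """List entry names, decoding unflagged names with `encoding`."""
    names = []
    with zipfile.ZipFile(io.BytesIO(raw_zip)) as zf:
        for info in zf.infolist():
            if info.flag_bits & UTF8_FLAG:
                # The archive says the name is UTF-8; trust it.
                names.append(info.filename)
            else:
                # zipfile decoded the raw bytes as CP437; undo that and
                # decode again with the encoding chosen by the user.
                raw_name = info.filename.encode("cp437")
                names.append(raw_name.decode(encoding))
    return names
```

For an archive created on a system using Big5, calling `list_zip_names(data, "big5")` on the archive bytes would be the equivalent of choosing "Chinese (Traditional)" from an encoding menu.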

     
  • Vladimir Surguchev

    Up:
    Is it possible to introduce an extract option to override the current system OEM (and maybe ANSI) code page?
    For example, something like -sco866 (-sca1251). At least in the API only...
    Then it would be possible to correctly open old archives whose filenames are encoded in an OEM code page that differs from the current system default. It would probably be useful not only for zip files...

     

    Last edit: Vladimir Surguchev 2016-01-13
  • Igor Pavlov

    Igor Pavlov - 2016-01-14

    You can set codepage for TAR and ZIP archives:

    -mcp=866
    
     

    Last edit: Igor Pavlov 2016-01-14
  • Vladimir Surguchev

    Is that option applicable to Extract operation?
    If so, it is not evident from 7-Zip.chm ('-m' is described as "set compression method", and it is missing from the list of switches for the Extract command).

     

    Last edit: Vladimir Surguchev 2016-01-14
  • Igor Pavlov

    Igor Pavlov - 2016-01-14

    1) Yes, it must work for extracting.
    2) I'll fix the help file.
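For anyone scripting this, here is a minimal Python sketch that builds an extract command using the -mcp switch mentioned above. It assumes a `7z` executable is on the PATH; the function names `build_extract_command` and `extract` and the archive name in the usage note are hypothetical.

```python
import subprocess


def build_extract_command(archive: str, code_page: int,
                          seven_zip: str = "7z") -> list:
    """Return the argument list for extracting with an explicit code page."""
    # "x" extracts with full paths; -mcp=N sets the code page for names.
    return [seven_zip, "x", f"-mcp={code_page}", archive]


def extract(archive: str, code_page: int, seven_zip: str = "7z") -> None:
    """Run the extraction and raise if 7-Zip reports an error."""
    subprocess.run(build_extract_command(archive, code_page, seven_zip),
                   check=True)
```

For example, `extract("old_archive.zip", 866)` would run `7z x -mcp=866 old_archive.zip`.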

     
  • Vladimir Surguchev

    It works for zip/tar extraction, although filenames in the progress callback look unreadable.
    Is it possible to extend this option to other formats with OEM-based names?
    For example, I've found some old arj archives with Russian names.

     

    Last edit: Vladimir Surguchev 2016-01-22
  • Igor Pavlov

    Igor Pavlov - 2016-01-22

    7-Zip uses OEM encoding for console output.
    So console strings will be OK, if OEM encoding supports required characters.
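Whether a name can be shown in a console that uses a given OEM code page can be checked by trying to encode it. The Python sketch below does just that; the function name `fits_code_page` and the sample names are made up for illustration, and Python's codec names (`cp866`, `cp437`) stand in for the Windows OEM code pages.

```python
def fits_code_page(name: str, code_page: str) -> bool:
    """Return True if every character of `name` exists in `code_page`."""
    try:
        name.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


# A Russian name fits the Cyrillic OEM code page 866 but not code page 437.
print(fits_code_page("тест.arj", "cp866"))
print(fits_code_page("тест.arj", "cp437"))
```

So on a system whose OEM code page is 437, Russian names would extract correctly with -mcp=866 but still could not be printed readably in the console.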

     
  • endolith

    endolith - 2023-03-22

    Is there a way to do this yet? ZIP files sent from a Chinese factory always have garbled filenames.

     
    • Igor Pavlov

      Igor Pavlov - 2023-03-22

      Why?
      What program do they use to create such a zip?
      Show an example.

       
      • endolith

        endolith - 2023-03-22

        Why?

        Because the file names are encoded in Chinese and I want them to display as Chinese characters so I can translate them into English?

        What program do they use to create such a zip?

        I don't know, and have no control over what they use. Some zip program on a computer with system encoding set to Chinese.

        Show an example.

        Filename is Product A和B版本.jpg in folder Product 測試報告, but is encoded using Big5, so 7-zip extracts it as filename Product A⌐MB¬⌐Ñ╗.jpg in folder Product ┤·╕╒│°ºi.

        If I run "C:\Program Files\7-Zip\7z.exe" -mcp=950 x "Product 測試報告.zip", it extracts with the correct filenames, which I can then translate to English to understand the factory's intent.

        Or as an older example, I have filenames

        • ╢∞╜ª½÷┴Σ.STEP
        • ╢∞╜ªñW╗.STEP
        • ╢∞╜ª⌐│«y.STEP
        • ╣qª└╗.STEP

        After some effort, I figured out that these filenames meant "button", "front panel", "rear panel", "battery door".
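Names that have already been extracted in garbled form can sometimes be repaired after the fact. The Python sketch below assumes the names were Big5 bytes that got decoded as code page 437, which is consistent with the garbled examples above. The function name `repair_name` and its default arguments are assumptions for illustration, and the round trip only works if no bytes were lost along the way.

```python
def repair_name(garbled: str, wrong: str = "cp437",
                actual: str = "big5") -> str:
    """Undo a wrong decode: recover the raw bytes, then decode them properly."""
    return garbled.encode(wrong).decode(actual)


# Simulate the problem: Big5 bytes shown through code page 437.
original = "Product A和B版本.jpg"
garbled = original.encode("big5").decode("cp437")

restored = repair_name(garbled)
print(garbled)
print(restored)
```

The same function can be applied to the garbled .STEP names listed above to recover their Chinese originals.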

         

        Last edit: endolith 2023-03-22
