
#1546 Broken Cyrillic filenames in TAR archives

Status: open
Owner: nobody
Labels: TAR (2)
Priority: 5
Updated: 2016-12-02
Created: 2015-08-22
Creator: Artem
Private: No

Cyrillic characters in filenames inside TAR archives are displayed incorrectly.

1 attachment

Discussion

  • Artem

    Artem - 2015-08-22

    7-Zip version 15.06 beta

     
  • givanis

    givanis - 2016-11-30

    Igor, please take a look at this long-standing problem affecting Russian-speaking users of 7-Zip.

     
  • Igor Pavlov

    Igor Pavlov - 2016-11-30

    The tar format doesn't store any information about filename encoding.
    7-Zip supports UTF-8 and OEM (DOS) encodings in tar.
    If another encoding is used, 7-Zip doesn't recognize it.
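To illustrate the point that tar carries no encoding information, here is a minimal sketch using Python's standard `tarfile` module (not 7-Zip's code). It writes a GNU-format tar whose single member name is stored as Windows-1251 bytes, then lists it with two different reader encodings. GNU format is chosen deliberately, since the PAX format can carry names as UTF-8 in extended headers, which would hide the problem. The name `Отчёт.txt` and the helpers `make_tar` / `list_names` are arbitrary examples.

```python
import io
import tarfile


def make_tar(name: str, encoding: str) -> bytes:
    """Build an in-memory GNU-format tar with one empty member.

    The member name is written as raw bytes in `encoding`; the GNU/ustar
    header has no field that records which encoding that was.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                      encoding=encoding) as tar:
        info = tarfile.TarInfo(name)
        info.size = 0  # empty file, so no data blocks are needed
        tar.addfile(info)
    return buf.getvalue()


def list_names(data: bytes, encoding: str) -> list:
    """Read member names back, decoding them with the reader's chosen encoding."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r",
                      encoding=encoding) as tar:
        return tar.getnames()


data = make_tar("Отчёт.txt", "cp1251")  # Windows (ANSI) Cyrillic code page
for reader_encoding in ("cp1251", "cp866"):
    # Same bytes, two readers: only the one that guesses cp1251 gets it right.
    print(reader_encoding, list_names(data, reader_encoding))
```

The archive bytes are identical in both reads; only the reader's guess changes, and a wrong guess produces no error, just wrong characters.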

     
  • givanis

    givanis - 2016-11-30

    < 7-Zip supports UTF-8 and OEM(DOS) encodings in tar>

    As I understand it, in other archive formats 7-Zip supports more than just UTF-8 and OEM (DOS) encodings. Am I right? If so, maybe it would be better to make TAR handling consistent with them?

     

    Last edit: givanis 2016-11-30
  • Igor Pavlov

    Igor Pavlov - 2016-11-30

    Most formats have some native encoding.
    With tar, I don't know any good way to detect whether it's Windows encoding or OEM (DOS) encoding.
    And why do you think that this archive is a good example?
    What program was used to create that tar archive?
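For contrast with tar, ZIP is one format where the encoding can be recorded: the ZIP specification defines general-purpose flag bit 11 (0x800) to mark names stored as UTF-8. The sketch below uses Python's `zipfile` (again, not 7-Zip) to show that the flag is set for a Cyrillic name and not for an ASCII one; without the flag, readers fall back to a legacy code page (Python uses cp437 unless told otherwise). The helper `stored_name_info` is hypothetical.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general-purpose bit 11: "file name is UTF-8"


def stored_name_info(name: str) -> tuple:
    """Write a one-entry ZIP in memory and report (utf8_flag_set, decoded_name)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, b"")  # zipfile sets bit 11 itself for non-ASCII names
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
        info = zf.infolist()[0]
        return bool(info.flag_bits & UTF8_NAME_FLAG), info.filename


print(stored_name_info("Отчёт.txt"))
print(stored_name_info("report.txt"))
```

A tar header has no equivalent bit, which is why a tar reader has to guess.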

     

    Last edit: Igor Pavlov 2016-11-30
  • givanis

    givanis - 2016-11-30

    < What program was used to create that tar archive?>

    In my case it was Total Commander. And as you can see in the screenshot posted by the topic starter, WinRAR handles such files without any problem.

     

    Last edit: givanis 2016-11-30
  • Igor Pavlov

    Igor Pavlov - 2016-11-30

    So probably the Total Commander and WinRAR developers think that Windows encoding is the better choice for the tar format.
    But 7-Zip uses DOS (OEM) encoding for the TAR format.
    Note two things:
    1) It would be bad if I changed the default encoding for TAR from DOS to Windows.
    In that case, 7-Zip would not be able to work correctly with tar archives created by some previous versions of 7-Zip.
    2) I don't know of any confirmation that Windows encoding is more correct than DOS (OEM) encoding for TARs.

     
  • givanis

    givanis - 2016-11-30

    < So probably Total Commander and WinRAR developers think that Windows encoding is better way for tar format>

    Hmm. I was sure that TAR is a generally accepted archive format. And if your program is intended to run on Windows, it's logical to use the native encoding (not DOS!) by default. I understand your hesitation because of the extra work, but it's better to solve the problem if it exists. Total Commander successfully detects TARs created by 7-Zip as well as those created directly by TC.

     
  • Flasher

    Flasher - 2016-11-30

    Igor, instead of replacing the default, why not add ANSI alongside UTF-8 and OEM? That would avoid problems with archives created by previous versions.
    I see no reason to favor the DOS encoding just because it was the original choice, at the expense of universality.

     
  • Igor Pavlov

    Igor Pavlov - 2016-11-30

    1) Tar is not a native format for Windows, so we can't say that Windows encoding is better than OEM encoding for TAR archives.
    2) TAR is mostly used on POSIX / Linux, so I try to implement some compatibility with Linux. Modern Linux systems use UTF-8 in tars, so 7-Zip now also uses UTF-8.
    When 7-Zip opens a tar, it first tries to interpret all names as UTF-8.
    In most cases it's possible to detect that the encoding is not UTF-8 when it isn't. In such cases 7-Zip uses OEM encoding.
    But it's more difficult to distinguish between two similar encodings: OEM and Windows.

    Now I don't remember why I selected OEM (DOS) for TAR.
    Maybe it was copy-pasted from the ZIP code. Note that OEM encoding is also the default for the ZIP format.
    You can ask the Total Commander and WinRAR developers about their reasons for using Windows encoding.
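The detection order described in the post above (try UTF-8 first, fall back to OEM) can be sketched in Python as follows. This is a simplified illustration of the behaviour described in the thread, not 7-Zip's actual source; `decode_tar_name` and `decodes_cleanly` are hypothetical helpers, and cp866 / cp1251 are assumed as the Russian OEM and Windows code pages. It also shows why OEM vs. Windows is hard to tell apart: Windows-1251 bytes are rejected by strict UTF-8, but the two Cyrillic code pages decode each other's bytes without any error, so the fallback silently yields the wrong characters, which is the symptom reported in this ticket.

```python
def decode_tar_name(raw: bytes, oem_codepage: str = "cp866") -> str:
    """Hypothetical sketch: decode a tar member name as described in the thread."""
    try:
        # Legacy 8-bit text rarely forms valid UTF-8 by accident,
        # so a strict decode that succeeds is strong evidence of UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: assume the OEM (DOS) code page.
        return raw.decode(oem_codepage)


def decodes_cleanly(raw: bytes, encoding: str) -> bool:
    """True if `raw` decodes under `encoding` without raising an error."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


name = "Отчёт.txt"
for writer_encoding in ("utf-8", "cp866", "cp1251"):
    raw = name.encode(writer_encoding)
    checks = {enc: decodes_cleanly(raw, enc) for enc in ("utf-8", "cp866", "cp1251")}
    # The UTF-8 and cp866 rows round-trip; the cp1251 row falls back to cp866.
    print(writer_encoding, repr(decode_tar_name(raw)), checks)
```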

     
  • Artem

    Artem - 2016-11-30

    The original archive in the example was created with the gpg4win package.

     
  • Flasher

    Flasher - 2016-11-30

    Igor, I suppose ANSI support in those programs was implemented solely to please Windows users, for whom these programs were made. I believe 7-Zip was also made primarily for Windows. I could still understand it if 7-Zip itself created TARs with UTF-8, but it doesn't, while popular Windows programs can work with such TARs. So it would be logical to have at least one of the two options: ANSI support, or creating TARs (with UTF-8).

     
  • Igor Pavlov

    Igor Pavlov - 2016-12-01

    7-Zip uses UTF-8 when it creates TAR archives.

    So we have 3 possible encodings for tar:
    1) good: UTF-8 - it can be extracted on Linux and on Windows
    2) bad: OEM - it can be extracted by 7-Zip
    3) bad: WIN - it can be extracted by WinRAR / Total Commander

    So the best solution for everyone now is to use good UTF-8 encoding for new tar archives, as 7-Zip does.
    You can ask the developers of Total Commander and gpg4win to use UTF-8 for the TARs they create.
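Following that recommendation, a script that produces tars can request UTF-8 names explicitly rather than relying on platform defaults. A minimal sketch with Python's `tarfile` (an illustration only; `make_utf8_tar` is a hypothetical helper): it writes a GNU-format archive and checks that the name field at the start of the first header holds the UTF-8 bytes, which the first option in the list above says both Linux and Windows tools can read.

```python
import io
import tarfile


def make_utf8_tar(names: list) -> bytes:
    """Hypothetical helper: build an in-memory tar whose member names are UTF-8."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                      encoding="utf-8") as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = 0  # empty member, header only
            tar.addfile(info)
    return buf.getvalue()


data = make_utf8_tar(["Отчёт.txt"])
# The first 100 bytes of a tar header are the NUL-padded name field.
raw_name = data[:100].rstrip(b"\0")
print(raw_name, raw_name.decode("utf-8"))
```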

     
  • Artem

    Artem - 2016-12-01

    Could you add a setting for choosing the TAR encoding?

     
    • Igor Pavlov

      Igor Pavlov - 2016-12-01

      The command-line version supports it now.
      For the GUI:
      1) Archive creation: you can change it in the Parameters field, but that's not very useful; the default UTF-8 is fine in 99% of cases.
      2) Opening an archive in the GUI: currently there is no way to change the encoding. Maybe I'll think about it later.

       
  • givanis

    givanis - 2016-12-01

    < Maybe later I'll think about it>

    That would be great. Thank you, Igor, for your program and support.

     
  • Flasher

    Flasher - 2016-12-01

    < command line version supports it now>

    If you mean -sccWIN, then, for example, with the l (List) command it doesn't produce correct results for TAR archives with Cyrillic names created in TC.

     
  • Igor Pavlov

    Igor Pavlov - 2016-12-02

    -mcp=1251

    or

    -mcp=0

    Last edit: Igor Pavlov 2016-12-02
