tar doesn't contain name encoding information.
7-Zip supports UTF-8 and OEM(DOS) encodings in tar.
If there is another encoding, 7-zip doesn't recognize it.
< 7-Zip supports UTF-8 and OEM(DOS) encodings in tar>
As I understand it, for other archive formats 7-Zip supports more than just UTF-8 and OEM (DOS) encodings. Am I right? If so, maybe it would be better to bring TAR handling in line with them?
Last edit: givanis 2016-11-30
There is some native encoding in most formats.
With tar, I don't know any good way to detect whether the names use WIN encoding or OEM (DOS) encoding.
And why do you think that this archive is a good example?
What program was used to create that tar archive?
Last edit: Igor Pavlov 2016-11-30
So the Total Commander and WinRAR developers probably think that Windows encoding is the better choice for the tar format.
But 7-Zip uses DOS (OEM) encoding for the TAR format.
Note 2 things:
1) It would be bad if I changed the default encoding for TAR from DOS to Windows.
In that case, 7-Zip would not be able to work correctly with tar archives created by some previous versions of 7-Zip.
2) I don't know of any confirmation that Windows encoding is more correct than DOS (OEM) encoding for TARs.
< So probably Total Commander and WinRAR developers think that Windows encoding is better way for tar format.>
Hmm. I was sure that TAR is a generally accepted archive format. And if your program is intended to work on Windows, it's logical to use the native encoding (not DOS!) by default.
I understand your reluctance because of the extra work, but it's better to solve the problem if it exists. Total Commander successfully detects the encoding of TARs created by 7-Zip as well as those created directly by TC.
Igor, instead of replacing anything, why not add ANSI alongside UTF-8 and OEM, to avoid problems with archives created by earlier versions?
I see no reason to prefer the DOS encoding just because it came first, at the expense of universality.
1) Tar is not a native format for Windows, so we can't say that Windows encoding is better than OEM encoding for TAR archives.
2) TAR is mostly used on POSIX / Linux systems, so I try to keep some compatibility with Linux. New Linux systems use UTF-8 in tars, so 7-Zip now also uses UTF-8.
When 7-Zip opens a tar, it first tries to read all names as UTF-8.
In most cases it's possible to detect that names are not UTF-8 when they really aren't. In such cases 7-Zip uses OEM encoding.
But it's more difficult to choose between two similar encodings: OEM or Windows.
I don't remember now why I selected OEM (DOS) for TAR.
Maybe it was copy-paste from the zip code. Note that OEM encoding is also the default for the ZIP format.
You can ask Total Commander and WinRAR developers about their reasons to use Windows encoding.
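The UTF-8-first fallback described in this post can be illustrated with a small Python sketch. This is not 7-Zip's code, only a rough model of the idea. The function name `decode_tar_name` is hypothetical, and the code pages `cp866` (OEM) and `cp1251` (Windows) are an assumption for a Russian locale. It also shows why OEM and Windows are hard to tell apart: invalid UTF-8 is rejected by the decoder, but the same bytes decode without error in both single-byte code pages.

```python
from typing import Tuple


def decode_tar_name(raw: bytes, fallback: str = "cp866") -> Tuple[str, str]:
    """Try strict UTF-8 first; if the bytes are not valid UTF-8, use the fallback code page.

    Returns the decoded name and the encoding that was used.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Single-byte code pages accept almost any byte, so this cannot tell
        # us whether the fallback was actually the right one.
        return raw.decode(fallback, errors="replace"), fallback


name = "Привет.txt"

# A UTF-8 name is recognized as such.
print(decode_tar_name(name.encode("utf-8")))

# A name stored in the Windows code page is not valid UTF-8, so the OEM
# fallback is used and the result is mojibake.
win_bytes = name.encode("cp1251")
print(decode_tar_name(win_bytes))

# The same bytes decode without error in both code pages; nothing in the
# bytes themselves says which reading is correct.
print(win_bytes.decode("cp866"), "|", win_bytes.decode("cp1251"))
```

Telling the two single-byte readings apart would need some extra heuristic, such as letter frequencies, which is outside this sketch.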
Igor, I believe ANSI support in those programs was added simply to please Windows users, for whom the programs were written. I believe 7-Zip was also written primarily for Windows. I could understand it if 7-Zip itself created TARs with UTF-8, but it doesn't, while popular Windows programs are able to work with TAR. So it would be logical to have at least one of the two: ANSI support, or TAR creation.
So we have 3 possible encodings for tar:
1) Good-utf-8 - it can be extracted in Linux and in Windows
2) bad-OEM - it can be extracted by 7-Zip
3) bad-WIN - it can be extracted by WinRAR / Total Commander
So the best solution for everyone now is to use the good UTF-8 encoding for new tar archives, as 7-Zip does.
You can ask developers of Total Commander and gpg4win to use utf-8 for created TARs.
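To make the three cases above concrete, here is a minimal sketch using Python's standard `tarfile` module rather than 7-Zip. The helper names `make_tar` and `list_names` are hypothetical, and `cp1251` / `cp866` are assumed to be the Windows and OEM code pages (Russian locale). The archive is written in GNU format because that format stores the name bytes without any encoding marker, which is the situation discussed in this thread.

```python
import io
import tarfile


def make_tar(name: str, encoding: str) -> bytes:
    """Build an in-memory tar with one member whose name is stored in `encoding`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.GNU_FORMAT, encoding=encoding) as tar:
        data = b"hello"
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_names(archive: bytes, encoding: str) -> list:
    """List member names, interpreting the stored name bytes as `encoding`."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r",
                      encoding=encoding, errors="replace") as tar:
        return tar.getnames()


name = "Привет.txt"

# "bad-WIN": readable only if the reader assumes the Windows code page.
win_tar = make_tar(name, "cp1251")
print(list_names(win_tar, "cp1251"))  # correct name
print(list_names(win_tar, "cp866"))   # mojibake when read as OEM

# "Good-utf-8": readable by any tool that assumes UTF-8.
utf8_tar = make_tar(name, "utf-8")
print(list_names(utf8_tar, "utf-8"))
```

The reader has to be told the encoding from outside, because the archive itself does not record it.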
The command line version supports it now.
For GUI:
1) Archive creation - you can change it with the parameters field, but it's not very useful - the default UTF-8 is OK for 99% of cases.
2) Opening an archive in the GUI - there is currently no way to change the encoding. Maybe I'll think about it later.
< command line version supports it now.>
If this is about -sccWIN, then with the l (List) command, for example, it does not give a correct result for TAR archives with Cyrillic names created in TC.
7-zip version 1506b
Igor, please take a look at this long-standing problem for Russian-speaking 7-Zip users.
< What program was used to create that tar archive?>
In my case it was Total Commander. And as you can see in the screenshot posted by the topic starter, WinRAR handles such files without any problem.
Last edit: givanis 2016-11-30
The original archive in the example was produced by the gpg4win package.
7-Zip uses UTF-8 when it creates a TAR archive.
Would it be possible to add a setting for choosing the TAR encoding?
< Maybe later I'll think about it>
It would be great. Thank you, Igor, for your program and support.
Last edit: Igor Pavlov 2016-12-02