
#1485 Wrong comments handling by 7Z(FM)

Status: open
Owner: nobody
Labels: None
Priority: 5
Created: 2015-02-20
Updated: 2015-02-20
Creator: quanta
Private: No

The MAME Languages.ini 0.158 archive [1] contains an archive comment that cannot be accessed from 7-Zip File Manager. The bug does not occur in the command-line version of 7-Zip. 7-Zip File Manager should offer more extensive metadata handling.

The command-line version of 7-Zip does display the comment, but it incorrectly assumes that comments are encoded in UTF-8, so the copyright sign in the archive comment is shown as a question mark ('?'). No -scc switch was specified on the command line, so it should have behaved as if -sccDOS had been given. Wrong -scc behaviour aside, there is a deeper issue with .ZIP archive comment handling. Starting with version 6.3.0 of the .ZIP File Format Specification, section 'APPENDIX D - Language Encoding (EFS)' specifies the encoding of the file name and comment of individual files, but not of the .ZIP file comment stored in the end of central directory record, which is where the comment of archive [1] is stored. The closest the specification comes is the opening of that section:

The ZIP format has historically supported only the original IBM PC character
encoding set, commonly referred to as IBM Code Page 437. This limits storing
file name characters to only those within the original MS-DOS range of values
and does not properly support file names in other character encodings, or
languages.
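A minimal Python sketch of the problem described above: it locates the end of central directory record (signature `PK\x05\x06`, 22 fixed bytes, then the comment whose length is the 16-bit field at offset 20) and decodes the same raw comment bytes under several code pages. The comment text and the choice of DOS code page 850 are hypothetical; the actual bytes of the archive in [1] are not reproduced here. `read_eocd_comment` and `build_sample_zip` are illustrative helpers built on the standard library, not 7-Zip code.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # end of central directory record signature
EOCD_FIXED_SIZE = 22            # record size excluding the variable-length comment
MAX_COMMENT = 0xFFFF            # comment length is a 16-bit field


def read_eocd_comment(data: bytes) -> bytes:
    """Return the raw (undecoded) archive comment from the end of central directory record."""
    start = max(0, len(data) - EOCD_FIXED_SIZE - MAX_COMMENT)
    pos = data.rfind(EOCD_SIGNATURE, start)
    while pos >= 0:
        if pos + EOCD_FIXED_SIZE <= len(data):
            # The comment length is the little-endian 16-bit field at offset 20.
            (comment_len,) = struct.unpack_from("<H", data, pos + 20)
            # Accept a candidate only if its comment ends exactly at end of file;
            # this skips signature bytes that happen to appear inside the comment.
            if pos + EOCD_FIXED_SIZE + comment_len == len(data):
                return data[pos + EOCD_FIXED_SIZE:]
        pos = data.rfind(EOCD_SIGNATURE, start, pos)
    raise ValueError("end of central directory record not found")


def build_sample_zip(comment: bytes) -> bytes:
    """Build a tiny in-memory archive with the given raw comment (hypothetical content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("Languages.ini", "; placeholder\n")
        zf.comment = comment  # stored verbatim; the format records no encoding for it
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical comment: the copyright sign encoded in DOS code page 850 (byte 0xB8).
    comment = "Copyright \u00a9 2015".encode("cp850")
    data = build_sample_zip(comment)
    raw = read_eocd_comment(data)
    assert raw == zipfile.ZipFile(io.BytesIO(data)).comment  # agrees with the stdlib reader
    for codec in ("cp850", "cp437", "cp1252", "utf-8"):
        print(f"{codec:>6}: {raw.decode(codec, errors='replace')!r}")
```

Nothing in the record says which of the printed lines is correct. A strict UTF-8 decode fails outright on byte 0xB8, and a lenient one replaces it with a substitute character, much like the '?' reported above.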

However, this historical note does nothing to address the alternate encodings that various archivers (including PKZIP) used in .ZIP archives before Appendix D was introduced, and such an encoding is also used in the attachment. In those cases archivers fall back on unspecified behaviour, which mostly means storing the raw bytes of the file names as they appear in the host file system.

Some archivers, such as Info-ZIP, store encoding information in extra fields, but that is unofficial behaviour and can bloat the headers compared with an official solution. As of version 6.3.3 of the .ZIP File Format Specification, the contents of the 0x0008 extra field are not defined. The field may be used by current PKWARE products, but its behaviour is subject to change without notice. Even so, the specification still neither defines nor suggests an encoding for the comment stored in the end of central directory record.

Therefore 7-Zip needs the ability to manually specify the encodings of file names and comments. Although 7-Zip has the -scc and -scs switches, they only affect console input/output and list files respectively, and they accept far too few parameters to be practical for real-world use. Furthermore, these command-line switches do not help when navigating in the 7-Zip File Manager GUI. 7ZFM cannot count on AppLocale to do the job properly, because the actual encoding of the files stored in an archive is not known in advance, and AppLocale can only apply one encoding per application session. The cl and cu parameters (-mcl and -mcu) were never designed to handle comments, and they are insufficient even for file names. They are also only used when creating .ZIP archives, not when extracting from them.
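A minimal Python sketch of the per-entry decision a reader could make if the user were allowed to choose an encoding. It assumes this order: general purpose bit 11 (the EFS flag from Appendix D) means UTF-8; otherwise an Info-ZIP Unicode Path extra field (header ID 0x7075: version byte 1, CRC-32 of the stored name, then the UTF-8 name) is used when its CRC matches; otherwise the name is decoded with a user-chosen `fallback` code page. All function names and the `fallback` parameter are hypothetical and do not correspond to any 7-Zip option. The archive comment has neither a flag nor an extra field, so for it only the user-chosen code page would apply.

```python
import struct
import zlib
from typing import Iterator, Tuple

EFS_FLAG = 0x0800         # general purpose bit 11 (Appendix D): name is UTF-8
UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path extra field header ID


def iter_extra_fields(extra: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (header_id, payload) pairs from a raw extra field block."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        yield header_id, extra[pos + 4 : pos + 4 + size]
        pos += 4 + size


def make_unicode_path_extra(raw_name: bytes, unicode_name: str) -> bytes:
    """Build a Unicode Path field: version 1, CRC-32 of the stored name, UTF-8 name."""
    payload = struct.pack("<BI", 1, zlib.crc32(raw_name)) + unicode_name.encode("utf-8")
    return struct.pack("<HH", UNICODE_PATH_ID, len(payload)) + payload


def decode_entry_name(raw_name: bytes, flag_bits: int, extra: bytes, fallback: str) -> str:
    """Decode a stored entry name; `fallback` stands in for a hypothetical user-chosen code page."""
    # 1. Official route: the EFS flag declares the name to be UTF-8.
    if flag_bits & EFS_FLAG:
        return raw_name.decode("utf-8")
    # 2. Unofficial route: trust a Unicode Path field only if its CRC matches the stored
    #    name, since a tool that renamed the entry may have left a stale field behind.
    for header_id, payload in iter_extra_fields(extra):
        if header_id == UNICODE_PATH_ID and len(payload) >= 5 and payload[0] == 1:
            (name_crc,) = struct.unpack_from("<I", payload, 1)
            if name_crc == zlib.crc32(raw_name):
                return payload[5:].decode("utf-8")
    # 3. No encoding information in the archive: the user has to supply it.
    return raw_name.decode(fallback)


if __name__ == "__main__":
    # Hypothetical entry: a name stored as raw Shift_JIS (cp932) bytes without the EFS flag.
    name = "\u65e5\u672c\u8a9e.ini"
    raw = name.encode("cp932")
    print(decode_entry_name(raw, 0, b"", fallback="cp437"))  # mojibake
    print(decode_entry_name(raw, 0, b"", fallback="cp932"))  # correct name
    extra = make_unicode_path_extra(raw, name)
    print(decode_entry_name(raw, 0, extra, fallback="cp437"))  # correct name via 0x7075
```

For comparison, Python 3.11 and later expose a single, archive-wide choice for member names through the `metadata_encoding` argument of `zipfile.ZipFile`, which is the kind of knob this ticket asks 7-Zip to provide.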

[1] https://web.archive.org/web/20150220050500/http://www.progettosnaps.net/languages/pS_Languages.zip
