From: Alexander M. <mo...@me...> - 2008-10-21 08:40:32
The only documentation I can find is PKWARE's Application Note on the .ZIP file format:
http://www.pkware.com/documents/casestudies/APPNOTE.TXT

Here is what I have found:

1. The ZIP format officially supports only ISO8859-1 file names and does not include any information about the encoding at all.
2. In fact, most Windows archiver programs use the OEM encoding.
3. But some (e.g. IZarc, Info-Zip, Wiz) leave file names without recoding.
4. I'm not sure, but it seems that Unix/Linux archivers use the current locale encoding.
5. UTF-8 file name storage appeared in version 6.3.2 of APPNOTE, about a year ago. I haven't succeeded with UTF-8 archives on Windows yet. Maybe somebody knows which Windows archivers currently support UTF-8?

For 2., the encoding must be OEM (e.g. CP866 for Russian).
For 3., the encoding must be $Conf{ClientCharset} (e.g. CP1251 for Russian).

Quote from APPNOTE:

    The upper byte indicates the compatibility of the file attribute
    information. If the external file attributes are compatible with
    MS-DOS and can be read by PKZIP for DOS version 2.04g then this
    value will be zero. If these attributes are not compatible, then
    this value will identify the host system on which the attributes
    are compatible. Software can use this information to determine
    the line record format for text files etc.

So if we use a "Windows" encoding for file names, we must also set MS-DOS compatibility.

I have done some experimenting. In the central directory structure of a zip archive (central file header signature 0x02014b50), I changed the upper byte of the "version made by" field from 3 (UNIX) to 0 (MS-DOS). With that change, and with the archive created using OEM encoding (CP866 for Russian), both type 2. and type 3. archivers display the file names perfectly well.

Craig, is it possible to set the "version made by" field with BackupPC_zipCreate?

Alexander

Craig Barratt wrote:
> I added the command-line argument for charset but didn't implement
> a CGI setting. The problem I have is I can't find any documentation
> for zip files and any standards around charset encoding for the
> file names in a zip file. Is utf8 the correct default, or does
> it depend on which platform is trying to unpack the zip file?
>
> Craig
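
For anyone who wants to repeat the experiment described above, here is a minimal sketch in Python (not the BackupPC code, and not how BackupPC_zipCreate works). It walks the central directory and sets the upper byte of "version made by" to 0 (MS-DOS) in every central file header. The function name set_host_system is made up for illustration. The sketch assumes a plain single-disk archive: no ZIP64 records, no data prepended before the archive, and no archive comment that happens to contain the end-of-central-directory signature.

```python
import struct

CENTRAL_SIG = b"PK\x01\x02"  # central file header signature 0x02014b50, little-endian
EOCD_SIG = b"PK\x05\x06"     # end of central directory record signature


def set_host_system(data: bytes, host: int = 0) -> bytes:
    """Return a copy of the archive with the upper byte of 'version made by'
    set to `host` (0 = MS-DOS, 3 = UNIX) in every central file header.

    Hypothetical helper; assumes no ZIP64 and no prepended data.
    """
    buf = bytearray(data)

    # The end-of-central-directory record sits at the end of the file.
    eocd = buf.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("end of central directory record not found")

    # Offset 10: total number of entries (2 bytes).
    # Offset 16: start of the central directory (4 bytes).
    (total,) = struct.unpack_from("<H", buf, eocd + 10)
    (pos,) = struct.unpack_from("<I", buf, eocd + 16)

    for _ in range(total):
        if buf[pos:pos + 4] != CENTRAL_SIG:
            raise ValueError("bad central file header at offset %d" % pos)

        # 'version made by' is 2 bytes at offset 4; its upper byte is offset 5.
        buf[pos + 5] = host

        # Variable-length fields follow the 46-byte fixed part of the header.
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", buf, pos + 28)
        pos += 46 + name_len + extra_len + comment_len

    return bytes(buf)
```

Usage would be something like reading the archive with open(path, "rb"), passing the bytes through set_host_system(), and writing the result to a new file. Only the host byte is touched; file names and file data are left exactly as they were, so the names must already be in the encoding the target archiver expects.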