From: Mikael Bourges-S. <mik...@gm...> - 2005-12-07 06:56:35
Dear All,

In ZipFile.java and ZipInputStream.java, there is a bug when reading file names. On WinXP, one must use codepage Cp850 (PC Latin-1 codepage). So you should replace:

    String name = new String(buffer);

by

    String name = new String(buffer, "Cp850");

However, I wonder whether the zip spec gives any indication of the codepage used to encode file names, so that the fix could be more generic than hard-coding Cp850.

How does it sound?

Kind regards,
Mike
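To illustrate why the charset argument matters, here is a minimal sketch that decodes the same raw name bytes with two different charsets. The class and method names are hypothetical and not part of jazzlib, and the sketch assumes the JRE ships the Cp850 charset (it checks before using it).

```java
import java.nio.charset.Charset;

final class ZipNameDecoding {
    /** Decodes raw entry-name bytes with an explicitly named charset. */
    static String decodeName(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "caf" + 0x82 + ".txt": the byte 0x82 is the letter e-acute in the
        // DOS code pages 850 and 437, but a control character in ISO-8859-1.
        byte[] raw = {'c', 'a', 'f', (byte) 0x82, '.', 't', 'x', 't'};

        // Assumption: Cp850 may be missing from a reduced JRE, so check first.
        if (Charset.isSupported("Cp850")) {
            System.out.println("Cp850:      " + decodeName(raw, "Cp850"));
        }
        // Same bytes, different charset: the accented letter is lost.
        System.out.println("ISO-8859-1: " + decodeName(raw, "ISO-8859-1"));
    }
}
```

Calling `new String(buffer)` without a charset uses the platform default, so the result depends on the machine that reads the archive rather than on the one that wrote it.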
From: Christian S. <chr...@sc...> - 2005-12-07 08:38:20
Hi Mr. Bourges-Sevenier,

in fact, CP437 should be used. However, there has been a long discussion about this, and the outcome is that jazzlib's objective is to provide as much compatibility with the J2SE API as possible. In this case that means UTF-8 must be used, because Sun once decided to do so. This is a well-known bug, by the way.

If you would like to try another library that addresses this and many other restrictions of the genuine java.util.zip package, please have a look at my TrueZIP project at http://truezip.dev.java.net . Its low-level API is backwards compatible with java.util.zip and provides many extensions, such as a selectable character encoding. The high-level API makes working with ZIP files directly unnecessary by providing drop-in replacements for the classes File, FileInputStream, FileOutputStream and others, which treat ZIP files like directories in the pathname (a virtual ZIP file system).

With best regards,
Christian Schlichtherle

---
Schlichtherle IT Services
Wittelsbacherstr. 10a
10707 Berlin
Tel: 030 / 34 35 29 29
Mobil: 0173 / 27 12 470
mailto:chr...@sc...
http://www.schlichtherle.de
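As a side note for readers on later Java versions: from Java 7 onwards, java.util.zip itself offers constructors that take a Charset, which postdates this thread. The following is only a sketch of a CP437 round trip under that assumption. The class name is hypothetical, and `Charset.forName("IBM437")` throws if the JRE does not ship that charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class Cp437RoundTrip {
    /** Writes a one-entry archive whose entry name is encoded with cs. */
    static byte[] writeArchive(String entryName, Charset cs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes, cs)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write("hello".getBytes(cs));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the name of the first entry, decoding it with cs. */
    static String firstEntryName(byte[] archive, Charset cs) {
        try (ZipInputStream in =
                 new ZipInputStream(new ByteArrayInputStream(archive), cs)) {
            ZipEntry entry = in.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: this JRE provides IBM437 (CP437).
        Charset cp437 = Charset.forName("IBM437");
        byte[] archive = writeArchive("caf\u00e9.txt", cp437);
        System.out.println(firstEntryName(archive, cp437));
    }
}
```

Writer and reader have to agree on the charset. The sketch sidesteps the real problem discussed here, which is guessing the charset of an archive produced by another tool.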
From: Mikael Bourges-S. <mik...@gm...> - 2005-12-07 16:54:40
Dear Christian,

in fact, CP437 should be used.

[Mikael Bourges-Sevenier] Great, thanks!

However, there has been a long discussion about this, and the outcome is that jazzlib's objective is to provide as much compatibility with the J2SE API as possible. In this case that means UTF-8 must be used, because Sun once decided to do so. This is a well-known bug, by the way.

[Mikael Bourges-Sevenier] Yes, before using this library I did some research on the web, and I was really surprised to see that this bug hasn't been resolved for so many years! Somehow, this doesn't surprise me too much, as many codecs/parsers in the core Java libraries are not optimized or are very limited.

If you would like to try another library that addresses this and many other restrictions of the genuine java.util.zip package, please have a look at my TrueZIP project at http://truezip.dev.java.net . Its low-level API is backwards compatible with java.util.zip and provides many extensions, such as a selectable character encoding. The high-level API makes working with ZIP files directly unnecessary by providing drop-in replacements for the classes File, FileInputStream, FileOutputStream and others, which treat ZIP files like directories in the pathname (a virtual ZIP file system).

[Mikael Bourges-Sevenier] I was unaware of your library. This is exactly what I needed, thanks. I'm going to try it right now! In fact, I don't care much about compliance with the java.util.zip package, but I do care about compliance with the zip format and with other zip programs such as WinZip.

In my applications, I need very fast access to files within an archive. In general, the archive is not decompressed as a whole; only random items within it are. I wonder whether NIO's file mapping could be used instead of streams, and whether it might be faster for such random accesses?

Last, but not least, I'm looking for a RAR reader. Is there a TrueRar out there?

Kind regards,
Mike
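For the NIO question, a minimal sketch of what file mapping could look like: it maps an archive read-only and reads the entry count from the end-of-central-directory record at an arbitrary offset, without any stream. The class and method names are hypothetical, ZIP64 and multi-disk archives are ignored, and nothing here measures whether mapping is actually faster than streams.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class MappedZipProbe {
    private static final int EOCD_SIGNATURE = 0x06054b50; // "PK\5\6" read little-endian
    private static final int EOCD_MIN_SIZE = 22;          // record size without comment
    private static final int MAX_COMMENT = 0xFFFF;        // longest possible archive comment

    /** Returns the total entry count from the EOCD record, or -1 if none is found. */
    static int countEntries(Path zip) {
        try (FileChannel channel = FileChannel.open(zip, StandardOpenOption.READ)) {
            // A single mapping is limited to 2 GB; larger files would need several.
            MappedByteBuffer map =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            map.order(ByteOrder.LITTLE_ENDIAN);
            int lowest = Math.max(0, map.limit() - EOCD_MIN_SIZE - MAX_COMMENT);
            // Scan backwards from the last position where the record could start.
            for (int pos = map.limit() - EOCD_MIN_SIZE; pos >= lowest; pos--) {
                if (map.getInt(pos) == EOCD_SIGNATURE) {
                    return map.getShort(pos + 10) & 0xFFFF; // total number of entries
                }
            }
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a small temporary archive so the sketch can be run on its own. */
    static Path writeSample(int entries) {
        try {
            Path zip = Files.createTempFile("probe", ".zip");
            zip.toFile().deleteOnExit();
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
                for (int i = 0; i < entries; i++) {
                    out.putNextEntry(new ZipEntry("entry" + i + ".txt"));
                    out.write(("payload " + i).getBytes(StandardCharsets.UTF_8));
                    out.closeEntry();
                }
            }
            return zip;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countEntries(writeSample(3)));
    }
}
```

Mapping only helps with locating archive structures such as the central directory. The data of a deflated entry still has to be inflated sequentially from its start.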
From: Christian S. <chr...@sc...> - 2005-12-07 17:21:02
Dear Mikael,

thanks for the kind words. :-)

I should add that TrueZIP uses CP437 by default for all files ending with a .ZIP suffix (this is configurable). However, for this to work, a user must have a *full* installation of the JRE. The standard installation unfortunately does not come with CP437, in which case TrueZIP automatically falls back to UTF-8, so far without giving any notice. I will document or enhance this somewhat later (probably falling back to CP850 instead, if this is a close match to CP437 and available in any standard JRE installation).

On the random access: Like the java.util.zip.ZipFile class, TrueZIP does provide random access to individual entries of a ZIP file without decompressing the entire archive. However, due to the ZIP deflation algorithm, it is impossible to randomly access/seek within the contents of a compressed ZIP file entry. Thus, contents must always be streamed for compression/decompression, and hence TrueZIP provides drop-in replacements for FileInputStream and FileOutputStream, but not for RandomAccessFile. Again, I will add this to an FAQ in a later version.

On RAR: Unfortunately no. I don't think it's in widespread use anymore. Please correct me if I'm wrong.

With best regards,
Christian
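The charset fallback described above can be sketched as follows. This is not TrueZIP's code: the class and method names are hypothetical, and the candidate order (CP437, then CP850, then UTF-8) is an assumption taken from the plan outlined in this message.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

final class ZipCharsetFallback {
    /** Returns the first candidate charset this JRE supports, else UTF-8. */
    static Charset firstSupported(String... candidates) {
        for (String name : candidates) {
            try {
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            } catch (IllegalCharsetNameException e) {
                // Syntactically invalid name: treat it as unsupported and move on.
            }
        }
        return StandardCharsets.UTF_8; // always available in any JRE
    }

    public static void main(String[] args) {
        Charset chosen = firstSupported("IBM437", "IBM850");
        if (StandardCharsets.UTF_8.equals(chosen)) {
            // Leave a trace instead of falling back silently.
            System.err.println("CP437 and CP850 unavailable, using UTF-8");
        }
        System.out.println("Entry names will be decoded as " + chosen.name());
    }
}
```

Reporting the fallback, as `main` does here, addresses the "without giving any notice" concern: a user with a reduced JRE at least learns why accented file names look wrong.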