Files with special characters get messed up
Brought to you by:
vblavet
When adding a file with special characters (the ones I
have problems with are æ, ø and å, uppercase Æ, Ø and
Å), the characters get changed into other, strange
characters.
The zip-file itself can be named with these characters
without any problems, but any files with those
characters in their filenames added to the zipfile are
affected.
When opening such a zipfile using WinZip it reports the
original filename correctly (when looking at the
zipfile properties), but the filename used is the
warped one.
Logged In: YES
user_id=1181036
Oh, in case it's not clear: the problem is with the
pclzip.lib.php class. I'm using the latest version (2.3).
Logged In: YES
user_id=1181036
Characters translated:
turns into
turns into
turns into
turns into
turns into +
turns into
I don't know any others, but those are all I use in my language.
Logged In: YES
user_id=1181036
I've done some more research on this, and it seems that
extracting the files using PclZip reverses the name change
(i.e. they're extracted with the filenames they originally
had), while extracting them using either WinZip (Windows) or
unzip (Linux) extracts them with the warped filenames.
The same happens in the other direction when extracting a
zipfile created in Windows with special characters in the
filenames: PclZip changes them into something else. I've
attached an HTML file that shows the conversions that happen
both ways.
Don't know if it helps much, but there's been no response so
far so I figure I might as well post whatever I find out :)
Table (html) showing the converted characters
Logged In: YES
user_id=1181036
I've found the solution:
Zip utilities like WinZip, PKZip and zip expect the filenames
inside an archive to be encoded with the CP437 (MS-DOS)
character encoding, which is compatible with neither UTF-8
nor ISO-8859-1 for characters outside ASCII.
By adding callback functions that convert the encoding
using iconv(), the files get added and extracted with the
correct filenames.
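To illustrate the conversion described above, here is a minimal sketch in Python rather than PHP (str.encode/bytes.decode stand in for iconv()). The helper names and the sample filename are hypothetical and not part of PclZip, and the sketch assumes the affected characters are the Scandinavian letters æ, ø and å, with source filenames in ISO-8859-1.

```python
# Hypothetical helpers; the names are illustrative and not part of PclZip.
ARCHIVE_CODEPAGE = "cp850"  # assumption: the DOS code page the unzip tool expects


def to_archive_name(name: str, codepage: str = ARCHIVE_CODEPAGE) -> bytes:
    """Encode a filename into the bytes that would be stored in the archive."""
    return name.encode(codepage)


def from_archive_name(raw: bytes, codepage: str = ARCHIVE_CODEPAGE) -> str:
    """Decode a stored filename back to text when extracting."""
    return raw.decode(codepage)


def misread(name: str, written_as: str = "latin-1",
            read_as: str = ARCHIVE_CODEPAGE) -> str:
    """Show what a tool displays when the stored bytes use another encoding."""
    return name.encode(written_as).decode(read_as)


if __name__ == "__main__":
    original = "blåbær.txt"
    # Without conversion: ISO-8859-1 bytes read as a DOS code page get warped.
    print(misread(original))
    # With conversion on both sides, the name survives the round trip.
    print(from_archive_name(to_archive_name(original)))
```

In PclZip the same idea would live in callbacks registered for adding and extracting; the exact hook names are not shown here.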
Allow me to suggest that this conversion is hardcoded in
PclZip instead :)
I was presented with the solution thanks to the helpful
people of comp.lang.php - here's the thread:
http://groups-beta.google.com/group/comp.lang.php/browse_thread/thread/88ff970c97622e7d/1c58f9c542331766#1c58f9c542331766
Logged In: YES
user_id=1181036
Sorry, I jumped the gun a bit.
Using CP437 won't work after all, but CP850 does,
at least for the characters I'm having problems with, and it
should work for all Western European languages :)
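A quick way to see why one code page works and the other does not is to check which characters each can encode. This is a small Python sketch, not PclZip code; the function names are hypothetical, and the character set tested (æ, ø, å and their uppercase forms) is an assumption about the letters involved.

```python
def can_encode(codepage: str, text: str) -> bool:
    """Return True if every character in text exists in the given code page."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def unsupported_chars(codepage: str, text: str) -> list:
    """List the characters in text that the code page cannot represent."""
    return [ch for ch in text if not can_encode(codepage, ch)]


if __name__ == "__main__":
    letters = "æøåÆØÅ"
    for codepage in ("cp437", "cp850"):
        print(codepage, unsupported_chars(codepage, letters))
```

CP437 has æ and å but no ø or Ø, which fits the observation that only CP850 handled all of the problem characters.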
Logged In: YES
user_id=313981
Hello !
Thanks a lot for all your troubleshooting work ... !
I will look at the special character problem as soon as I
can, but I have little time right now (and a PC crash to
recover from ...). With all the info and links you gave, we
should find the right solution.
Thanks
Vincent