
#13 Files with special characters get messed up

Other
open
5
2004-12-31
2004-12-31
No

When adding a file with special characters in its name (the ones I
have problems with are æ, ø and å, and uppercase Æ, Ø and
Å), the characters get changed into other, strange
characters.

The zip-file itself can be named with these characters
without any problems, but any files with those
characters in their filenames added to the zipfile are
affected.

When opening such a zipfile using WinZip it reports the
original filename correctly (when looking at the
zipfile properties), but the filename used is the
warped one.

Discussion

  • Roy W. Andersen

    Roy W. Andersen - 2004-12-31

    Logged In: YES
    user_id=1181036

    Oh, in case it's not clear, the problem is with the
    pclzip.lib.php class. I'm using the latest version (2.3).

     
  • Roy W. Andersen

    Roy W. Andersen - 2005-01-03

    Logged In: YES
    user_id=1181036

    Characters translated:

    turns into
    turns into
    turns into
    turns into
    turns into +
    turns into

    I don't know any others, but those are all I use in my language.

     
  • Roy W. Andersen

    Roy W. Andersen - 2005-01-04

    Logged In: YES
    user_id=1181036

    I've done some more research on this, and it seems that
    extracting the files using PclZip reverses the name change
    (i.e. they're extracted with the filenames they originally
    had), while extracting them using either WinZip (Windows) or
    unzip (Linux) will extract them with the warped filenames.

    The same happens when extracting a zipfile created in
    Windows with the special characters within the filenames:
    PclZip changes them into something else. I've attached an
    HTML file that shows the conversions that happen both ways.
    I don't know if it helps much, but there's been no response so
    far, so I figure I might as well post whatever I find out :)
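
The round trip described above is what one would expect if the library writes and reads the filename bytes unchanged while other tools interpret the same bytes in a different character set. The following is a minimal Python sketch of that reading, not PclZip's actual code. It assumes the names are stored as raw ISO-8859-1 bytes and that the other tools read them as an MS-DOS code page (CP437 is used here purely as an example); `view_as` is a hypothetical helper.

```python
# Assumption: names are written as raw ISO-8859-1 bytes, and other zip tools
# interpret those same bytes as an MS-DOS code page (CP437 in this sketch).

def view_as(name: str, written_as: str, read_as: str) -> str:
    """Encode a name with one encoding, then decode the same bytes with another."""
    return name.encode(written_as).decode(read_as)

original = "æøåÆØÅ.txt"

# What a reader assuming a DOS code page would display for the Latin-1 bytes.
warped = view_as(original, "latin-1", "cp437")

# Reading the bytes back with the encoding that wrote them restores the name,
# which matches the observation that extracting with the same library
# "reverses" the change.
restored = view_as(warped, "cp437", "latin-1")

print(repr(warped))
print(restored == original)
```

The bytes never change in this sketch; only the interpretation does, which is why the warped names appear only in tools that assume a different encoding.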

     
  • Roy W. Andersen

    Roy W. Andersen - 2005-01-04

    Table (html) showing the converted characters

     
  • Roy W. Andersen

    Roy W. Andersen - 2005-01-07

    Logged In: YES
    user_id=1181036

    I've found the solution:

    Zip utilities such as WinZip, PKZip and zip expect the filenames
    inside an archive to be encoded with the CP437 (MS-DOS)
    character encoding, which is compatible with neither UTF-8 nor
    ISO-8859-1.

    By adding callback functions that convert the encoding
    using iconv(), the files get added and extracted with the
    correct filenames.

    Allow me to suggest that this conversion is hardcoded in
    PclZip instead :)

    I was presented with the solution thanks to the helpful
    people of comp.lang.php. Here's the thread:
    http://groups-beta.google.com/group/comp.lang.php/browse_thread/thread/88ff970c97622e7d/1c58f9c542331766#1c58f9c542331766
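
The conversion idea can be sketched independently of PclZip. The Python example below mirrors what a pair of iconv()-based callbacks would do to a filename on the way into and out of an archive. It is an illustration under stated assumptions, not the callbacks used in the thread: the function names are hypothetical, the local encoding is assumed to be ISO-8859-1, and the archive code page is left as a parameter (CP850 is used in the example, since the thread later reports that it works better than CP437).

```python
# Hypothetical helpers; the encodings are assumptions, not taken from PclZip.
LOCAL_ENCODING = "iso-8859-1"  # assumed encoding of filenames on the local system


def to_archive_name(raw: bytes, archive_encoding: str) -> bytes:
    """Re-encode a local filename for storage inside the archive."""
    return raw.decode(LOCAL_ENCODING).encode(archive_encoding)


def from_archive_name(raw: bytes, archive_encoding: str) -> bytes:
    """Re-encode a stored filename back to the local encoding on extraction."""
    return raw.decode(archive_encoding).encode(LOCAL_ENCODING)


local_name = "blåbærsyltetøy.txt".encode(LOCAL_ENCODING)

stored = to_archive_name(local_name, "cp850")       # bytes a DOS-era tool expects
extracted = from_archive_name(stored, "cp850")      # back to the local bytes

print(stored != local_name)     # the stored bytes differ from the local ones
print(extracted == local_name)  # the round trip restores the original name
```

Both helpers raise an error if a character has no equivalent in the target encoding, which is a deliberate choice in this sketch: a failed conversion is easier to notice than a silently altered filename.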

     
  • Roy W. Andersen

    Roy W. Andersen - 2005-01-07

    Logged In: YES
    user_id=1181036

    Sorry, I jumped the gun a bit.

    Using CP437 doesn't work well after all, but CP850 works,
    at least for the characters I'm having problems with, and it
    should work for all Western European languages :)
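
One way to see why CP850 fares better than CP437 is to check which characters each code page can represent. The short Python sketch below does this with the standard codecs; `missing_chars` is a hypothetical helper, and the letter set is assumed to be the Norwegian letters æ, ø and å in both cases.

```python
def missing_chars(chars: str, encoding: str) -> list:
    """Return the characters that cannot be encoded in the given code page."""
    missing = []
    for ch in chars:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


letters = "æøåÆØÅ"  # assumed set of problem characters

for code_page in ("cp437", "cp850"):
    print(code_page, missing_chars(letters, code_page))
```

CP437 has no slot for ø or Ø, so any conversion to it fails for those letters, while CP850 covers all six.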

     
  • Vincent Blavet

    Vincent Blavet - 2005-01-07

    Logged In: YES
    user_id=313981

    Hello !

    Thanks a lot for all your troubleshooting work!
    I will look at the special character problem as soon as I can, but I
    have little time right now (and a PC crash to recover from ...). With all
    the info and links you gave, we should find the right solution.
    Thanks
    Vincent

     
