Re: Re[2]: [cgiirc-general] cgi-irc 0.5.4 xml bug
From: David L. <dg...@dg...> - 2004-02-02 16:35:56
[sending to the mailing list too..]

peter green wrote:
[snip how utf8 works]
> mirc treats all text received as extended ascii
> afaict x-chat tries to parse as utf-8 first and if it's not valid utf-8
> treats it as extended ascii

Yes, this is all pretty simple. It becomes complex because we do not know what charset people will be using on IRC. At the moment it just "works" because people generally have their browser set to correctly read their local character set (and presumably their IRC client too).

Now for most (US/Western Europe) people extended ascii is what they expect, and this works ok with XMLHTTP (the first 256 code points of Unicode, U+0000 to U+00FF, are the same as iso-8859-1). But anything else is encoded as a Unicode character in the format %uXXXX, where XXXX is the hexadecimal representation of the Unicode character, much like the %XX encoding used for other HTTP params. (btw, the thing I've been testing with on this system is the euro character).

So basically all my patch does is change this %uXXXX into utf-8, and provided CGI:IRC is set to use utf-8, all should appear fine to the CGI:IRC user. However, the problem now is that CGI:IRC will be sending utf-8 over IRC, and some people expect it to be something different (e.g. cyrillic users are probably using windows CP1251).

To solve this there is going to have to be something to convert in CGI:IRC, and probably a config setting for what to send to IRC; I think we should try to keep the HTTP side of things running in utf-8. Currently the Unicode::Map8 module looks like the best thing to use (and if it isn't installed we'll only work with iso-8859-1)...

Anyone else have thoughts?
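To make the idea above concrete, here is a rough sketch of the two steps: turning %uXXXX escapes into Unicode text, then encoding that text in whatever charset the IRC side expects. It is written in Python purely for illustration and is not the actual CGI:IRC patch. The names `decode_escapes`, `to_irc_bytes` and `irc_charset` are hypothetical, and Python's built-in codecs stand in for Unicode::Map8.

```python
import re

# Matches either a %uXXXX escape or an ordinary %XX escape.
_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def decode_escapes(raw: str) -> str:
    """Replace %uXXXX and %XX escapes with the characters they stand for.

    Assumption: a plain %XX escape is a single code point below 0x100,
    i.e. it is read as iso-8859-1.
    """
    def _sub(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _ESCAPE.sub(_sub, raw)


def to_irc_bytes(text: str, irc_charset: str = "utf-8") -> bytes:
    """Encode text for the IRC side in the configured charset.

    Characters the target charset cannot represent become '?'.
    """
    return text.encode(irc_charset, errors="replace")


# The euro sign arrives as %u20AC and is kept as utf-8 on the HTTP side.
euro = decode_escapes("%u20AC")
print(to_irc_bytes(euro))                # b'\xe2\x82\xac'

# The same character sent to a channel configured for CP1251 or iso-8859-1.
print(to_irc_bytes(euro, "cp1251"))      # b'\x88'
print(to_irc_bytes(euro, "iso-8859-1"))  # b'?'
```

The `irc_charset` argument plays the role of the config setting: the HTTP side stays in utf-8 and only the bytes sent to IRC change. This sketch does not join surrogate pairs, so characters outside the Basic Multilingual Plane sent as two %uXXXX escapes would come out as '?'.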