Re: Re[2]: [cgiirc-general] cgi-irc 0.5.4 xml bug

[sending to the mailing list too...]
peter green wrote:
[snip how utf8 works]
> mIRC treats all text received as extended ASCII.
> AFAICT, X-Chat tries to parse it as UTF-8 first and, if it's not valid UTF-8,
> treats it as extended ASCII.

Yes, the encodings themselves are pretty simple. It becomes complex because we
do not know what charset people will be using on IRC. At the moment it just
"works" because people generally have their browser set to read their local
character set correctly (and presumably their IRC client too).
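
The two client behaviours quoted above can be sketched roughly as follows. This is a minimal Python illustration, not code from mIRC, X-Chat or CGI:IRC; the function names are hypothetical, and "extended ASCII" is assumed here to mean ISO-8859-1 (a real client may use a Windows code page or a configurable fallback instead).

```python
def decode_like_mirc(raw: bytes) -> str:
    # Every byte is one character. "Extended ASCII" is ASSUMED to be ISO-8859-1.
    return raw.decode("iso-8859-1")


def decode_like_xchat(raw: bytes) -> str:
    # Try strict UTF-8 first; if the bytes are not valid UTF-8,
    # fall back to the (assumed) ISO-8859-1 interpretation.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


if __name__ == "__main__":
    utf8_euro = "\u20ac".encode("utf-8")  # the euro sign as UTF-8 bytes
    # ascii() keeps the output printable on any terminal.
    print(ascii(decode_like_mirc(utf8_euro)))   # three Latin-1 characters
    print(ascii(decode_like_xchat(utf8_euro)))  # one euro sign
```

Feeding the UTF-8 bytes of a euro sign to both shows the difference: the mIRC-style decoder yields three unrelated characters, while the X-Chat-style one recovers the euro sign. A Latin-1 byte string such as b"caf\xe9" is not valid UTF-8, so both end up with the same result.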

Now for most (US/Western European) people, extended ASCII is what they expect,
and this works OK with XMLHTTP, because the first 256 Unicode code points
(U+0000 to U+00FF) are the same as ISO-8859-1. Anything else is encoded as
%uXXXX, where XXXX is the hexadecimal representation of the Unicode character,
much like other HTTP parameters are encoded. (BTW, the character I've been
testing with on this system is the euro sign, U+20AC.)
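
The %uXXXX form matches what JavaScript's escape() function produces, so, assuming that is how the XMLHTTP parameters are built, the browser side can be sketched in Python like this. js_escape is a hypothetical name used only for illustration.

```python
import string

# Characters that JavaScript's escape() leaves untouched.
_SAFE = set(string.ascii_letters + string.digits + "@*_+-./")


def js_escape(text: str) -> str:
    """Rough Python equivalent of JavaScript's escape() (illustrative only)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in _SAFE:
            out.append(ch)
        elif cp < 0x100:
            # ISO-8859-1 range: a two-digit %XX escape.
            out.append("%%%02X" % cp)
        else:
            # JavaScript strings are UTF-16, so characters outside the BMP
            # become two %uXXXX escapes (a surrogate pair).
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("%%u%04X" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


if __name__ == "__main__":
    print(js_escape("5 \u20ac"))  # 5%20%u20AC
```

This shows why Western European text "just works": é comes out as %E9, the same byte ISO-8859-1 would use, while the euro sign has no ISO-8859-1 byte and comes out as %u20AC.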

So basically all my patch does is convert this %uXXXX into UTF-8, and provided
CGI:IRC is set to use UTF-8, everything should appear fine to the CGI:IRC user.
However, the problem now is that CGI:IRC will be sending UTF-8 over IRC, and
some people expect something different (e.g. Cyrillic users are probably
using Windows CP1251). To solve this, CGI:IRC is going to need something to
convert between charsets (and probably a config setting for which charset to
send to IRC; I think we should try to keep the HTTP side of things running
in UTF-8).
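
Here is a sketch of the decoding step described above, written in Python rather than the Perl that CGI:IRC uses. It is not the actual patch, and the function name unescape_to_utf8 is hypothetical. It assumes %XX escapes carry ISO-8859-1 characters and %uXXXX escapes carry UTF-16 code units, which is what escape() produces.

```python
import re

# Either %uXXXX (a UTF-16 code unit) or %XX (an ISO-8859-1 character).
_ESCAPE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def unescape_to_utf8(param: str) -> bytes:
    """Turn an escape()-style parameter into UTF-8 bytes (illustrative only)."""
    def repl(m):
        # Whichever group matched holds the hex value of the character.
        return chr(int(m.group(1) or m.group(2), 16))

    text = _ESCAPE.sub(repl, param)
    # Round-trip through UTF-16 so surrogate pairs from non-BMP characters
    # are recombined into single code points.
    text = text.encode("utf-16", "surrogatepass").decode("utf-16", "surrogatepass")
    # Any lone surrogate left over cannot be UTF-8 encoded and becomes "?".
    return text.encode("utf-8", "replace")


if __name__ == "__main__":
    print(unescape_to_utf8("5%20%u20AC"))  # b'5 \xe2\x82\xac'
```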

Currently the Unicode::Map8 module looks like the best thing to use (and
if it isn't installed, we'll only work with ISO-8859-1).
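
The IRC-side conversion could look something like the sketch below. Python's built-in codecs stand in for Unicode::Map8 here. The irc_charset argument represents the hypothetical config setting mentioned above. Falling back to ISO-8859-1 for an unknown charset mimics the "module not installed" case. None of this is existing CGI:IRC code.

```python
import codecs

DEFAULT_IRC_CHARSET = "iso-8859-1"


def resolve_charset(irc_charset: str) -> str:
    """Use the configured charset if Python knows it, else ISO-8859-1."""
    try:
        codecs.lookup(irc_charset)
        return irc_charset
    except LookupError:
        # Stand-in for "Unicode::Map8 is not installed".
        return DEFAULT_IRC_CHARSET


def to_irc(utf8_bytes: bytes, irc_charset: str) -> bytes:
    """HTTP side (UTF-8) -> IRC side (configured charset)."""
    text = utf8_bytes.decode("utf-8")
    # Characters the target charset cannot represent become "?".
    return text.encode(resolve_charset(irc_charset), "replace")


def from_irc(raw: bytes, irc_charset: str) -> bytes:
    """IRC side (configured charset) -> HTTP side (UTF-8)."""
    # Undecodable bytes become U+FFFD rather than raising an error.
    return raw.decode(resolve_charset(irc_charset), "replace").encode("utf-8")


if __name__ == "__main__":
    msg = "\u041f\u0440\u0438\u0432\u0435\u0442".encode("utf-8")  # Cyrillic "Privet"
    print(to_irc(msg, "cp1251"))  # b'\xcf\xf0\xe8\xe2\xe5\xf2'
```

Using "replace" means an unrepresentable character (say, a euro sign sent to an ISO-8859-1 channel) turns into "?" instead of dropping the whole message. That seems the least surprising behaviour for chat text, though it is a judgement call.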

Anyone else have thoughts?