From: Ethan B. <ebl...@cs...> - 2002-11-18 17:56:57
Robert Gomułka spake unto us the following wisdom:
> When talking to people using ICQ200x. I cannot send and receive polish
> characters properly.

This does not surprise me at all, I've had suspicions about ICQ and
custom character sets for some time.

> They appear on both sides as encoded in ISO8859-1.

Yes, it appears that ICQ200x sets both AIM_IMFLAGS_ISO_8859_1 *and*
AIM_IMFLAGS_CUSTOMCHARSET.  I have no idea if that is "correct" or not,
or if it *means* anything or not.

> In fact - they are encoded as CP1250 (Windows-EE) - pseudo
> Microsoft standard.

CP1250 is in fact a slight mangling of (the actual standard) ISO-8859-2
... I had hoped this would provide us with some information about the
encoding process, but came up with no relation to the (3, 65536) tuple
you're seeing.

> As suggested, I applied a small patch to oscar.c:
>   if (args->icbmflags & AIM_IMFLAGS_CUSTOMCHARSET) {
>       debug_printf ("Custom character set: %d %d\n", args->charset,
>                     args->charsubset);
> +     if (args->charset == 3){
> +         tmp = g_convert(args->msg, args->msglen, "UTF-8",
>                           "CP1250", NULL, &convlen, &err);
> +         if (err) {
> +             debug_printf("CP1250 IM conversion: %s\n",
>                            err->message);
> +             tmp = strdup(_("(There was an error receiving this
>                            message)"));
> +         }
> +     }
>   }
>
> Why 3? Because it appeared when executed gaim -d. charsubset was 65536.
> It did the thing. I receive messages with proper characters displayed.

This is more or less what I would do, too.  I would like to find more
correlation between charset and charsubset numbers and certain
encodings, but with the limited information we have now this is
reasonable.

> But ...
> What with sending messages?
> I see that they are sent always as UTF. Have no idea how to
> g_convert messages _only_ sent to people using windows icq200x
> client. I am afraid there is a need to follow whole conversation :(
> I don't know a way to _guess_ client version or client encoding.
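
For anyone who wants to see what the receive-side conversion amounts to
at the byte level, here is a rough, library-free sketch.  It is not the
g_convert() call from the quoted patch and not anything in oscar.c; the
table and the function name polish_cp1250_to_utf8() are made up for
illustration.  It only handles ASCII plus the 18 Polish letters, and
turns every other high byte into '?'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper table: CP1250 byte -> Unicode code point,
 * Polish letters only.  Six of them (A-ogonek, S-acute, Z-acute, in
 * both cases) sit at different byte values than in ISO-8859-2. */
struct cp1250_entry { unsigned char byte; unsigned short codepoint; };

static const struct cp1250_entry polish_cp1250[] = {
    {0xA5, 0x0104}, {0xB9, 0x0105},   /* A-ogonek  (upper, lower) */
    {0xC6, 0x0106}, {0xE6, 0x0107},   /* C-acute                  */
    {0xCA, 0x0118}, {0xEA, 0x0119},   /* E-ogonek                 */
    {0xA3, 0x0141}, {0xB3, 0x0142},   /* L-stroke                 */
    {0xD1, 0x0143}, {0xF1, 0x0144},   /* N-acute                  */
    {0xD3, 0x00D3}, {0xF3, 0x00F3},   /* O-acute                  */
    {0x8C, 0x015A}, {0x9C, 0x015B},   /* S-acute                  */
    {0x8F, 0x0179}, {0x9F, 0x017A},   /* Z-acute                  */
    {0xAF, 0x017B}, {0xBF, 0x017C},   /* Z-dot-above              */
};

/* Convert inlen bytes of (assumed) CP1250 text to NUL-terminated UTF-8.
 * Returns the number of bytes written, not counting the NUL, or -1 if
 * the output buffer is too small. */
static int polish_cp1250_to_utf8(const unsigned char *in, size_t inlen,
                                 char *out, size_t outsize)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < inlen; i++) {
        unsigned int cp = '?';            /* fallback for unmapped bytes */

        if (in[i] < 0x80) {
            cp = in[i];                   /* ASCII passes through */
        } else {
            size_t n = sizeof polish_cp1250 / sizeof polish_cp1250[0];
            for (size_t k = 0; k < n; k++) {
                if (polish_cp1250[k].byte == in[i]) {
                    cp = polish_cp1250[k].codepoint;
                    break;
                }
            }
        }

        if (cp < 0x80) {
            if (o + 1 >= outsize)
                return -1;
            out[o++] = (char)cp;
        } else {
            /* Every code point in the table is below U+0800, so a
             * two-byte UTF-8 sequence is always enough. */
            if (o + 2 >= outsize)
                return -1;
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    if (o >= outsize)
        return -1;
    out[o] = '\0';
    return (int)o;
}
```

Fed the six CP1250 bytes 'z' 'a' 0xBF 0xF3 0xB3 0xE6, this yields the
ten UTF-8 bytes of "zażółć".  A real client would want a full
conversion routine such as g_convert() rather than a hand-made table.
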

I suspect that somewhere along the line we are informed that the peer
wishes to use a non-UTF non-ISO-latin-1 non-ASCII encoding.  If nothing
else, the fact that the peer used a custom charset tells us something.
Perhaps this should be used to set some flags/store some information in
the connection structure.

> Talking to people using gaim (oscar plugin) works perfectly (almost
> perfectly - when I am offline and get message with polish
> characters, after going online, I receive empty or partial message
> - without polish chars).

The only incoming messages I currently handle are standard IMs ... this
will hopefully change in the near future, but I've been pressed for time
lately.

> Have you got any ideas? Maybe I should study other clients code
> (licq, ickle) to find a solution? Or libicq2000?

That may or may not be useful.  It has been my experience that virtually
*all* clients are busted and just send whatever charset they want to
whomever they want at all times.  (name that reference)

It is *possible* that simply unsetting the AIM_IMFLAGS_CUSTOMCHARSET on
sending a message would fix this problem ... We always set CUSTOMCHARSET
for *every* outgoing packet, and I suspect this is wrong.  I haven't had
time to verify that yet, though, so I haven't changed anything.  If it
*is* wrong, though, the remote client may even be capable of a UCS2
conversion and not trying it.  It's worth a shot to unset it and see if
the sender-side problem just magically goes away.

So that's my maybe-useful-maybe-not $0.02.  ;-)

Ethan

-- 
And if I claim to be a wise man / it surely means that I don't know.
    -- Kansas, "Carry on Wayward Son"
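
P.S. The idea of remembering what a peer sent us, so that the sending
side can pick an encoding later, could be sketched roughly as below.
Everything here is hypothetical: the struct, the enum and both function
names are invented for illustration and do not exist in gaim or
libfaim, and "charset 3 means CP1250" is only the single observation
reported in this thread, not a documented rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PEERS 64   /* arbitrary limits for the sketch */
#define SN_LEN    32

enum peer_encoding { ENC_UNKNOWN = 0, ENC_CP1250 };

/* Hypothetical per-connection cache of what each peer has sent us. */
struct peer_charset_cache {
    char names[MAX_PEERS][SN_LEN];
    enum peer_encoding enc[MAX_PEERS];
    size_t count;
};

/* Return the index of the screen name in the cache, or -1. */
static int find_peer(const struct peer_charset_cache *c, const char *sn)
{
    for (size_t i = 0; i < c->count; i++)
        if (strcmp(c->names[i], sn) == 0)
            return (int)i;
    return -1;
}

/* Call for each incoming IM that has the custom-charset flag set. */
static void note_incoming_charset(struct peer_charset_cache *c,
                                  const char *sn, int charset)
{
    assert(c != NULL && sn != NULL);
    if (charset != 3)            /* assumption: 3 was seen with CP1250 */
        return;
    if (strlen(sn) >= SN_LEN)    /* name does not fit, ignore it */
        return;

    int i = find_peer(c, sn);
    if (i < 0) {
        if (c->count == MAX_PEERS)
            return;              /* cache full, silently give up */
        i = (int)c->count++;
        strcpy(c->names[i], sn);
    }
    c->enc[i] = ENC_CP1250;
}

/* Ask before sending: ENC_UNKNOWN means "keep the current behaviour". */
static enum peer_encoding
outgoing_encoding_for(const struct peer_charset_cache *c, const char *sn)
{
    assert(c != NULL && sn != NULL);
    int i = find_peer(c, sn);
    return i < 0 ? ENC_UNKNOWN : c->enc[i];
}
```

The cache has to be zeroed before first use (memset() will do).  The
sender would then convert to CP1250 only for peers that come back as
ENC_CP1250 and leave everyone else alone.
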