From: Ethan B. <ebl...@cs...> - 2002-09-23 21:15:52
Sergey V. Udaltsov spake unto us the following wisdom:
> There is a little question about gaim architecture and non-ascii
> encoding. Currently, gaim is rather weak in this area. All the data is
> transferred "as is" (even if gaim is 8-bit clean). The only plugin which
> recodes the data for i18n purposes is well known rusconv. But this great
> plugin is good for the conversion between 2 russian encodings. And its
> usage is limited by ICQ protocol and other pure 8-bit protocols.
> Unicode-based protocols are not supported properly (for example, MSN).

Gaim 0.59.3 should work more or less properly for the UTF-8 protocols
(as you say, MSN) if you have your locale set correctly. I use it
regularly to converse in Japanese, which is much more difficult to
convert than, say, KOI8-R.

As far as ICQ and AIM go, we haven't quite made it that far yet. :-)
I haven't looked at rusconv, but I suspect it is either using an
application-specific encoding (in which case we're basically lucky it
works) or abusing the Unicode encoding as a pure 8-bit transport (in
which case we're even more lucky it works). The right answer is to
convert Oscar packets to UTF-16, and that is being worked on...

> So my major question is about internal gaim structures. Does gaim use
> unicode internally? If no - why? If yes - what is the way? Would it be
> possible to create some generic encoding handling approach with the
> following principles:

The plan is to move CVS over to using UTF-8 exclusively internally.
Basically, by the time a string leaves the protocol plugin, it should
be converted to UTF-8. For the UTF-8 protocols (MSN, Jabber, we think
Yahoo!), this is trivial. For Oscar it is less trivial due to some
design flaws, but it is not horribly difficult.

The "local client" encoding you mention is no longer necessary with
Gtk2, or will not be once everything is ported over to pango-using
widgets. The reason for this is that the "local encoding" as far as a
Gtk2 application is concerned is always UTF-8. Logging presents a
slight problem here: my gut instinct is that logging should honor
LC_CTYPE. However, as best I can tell, logging is currently done
through gtkimhtml (which will be going all UTF-8), so this may require
some work.

> 3. The "internal protocol" encoding is set on per-protocol and/or
> per-buddy basis (some per-protocol default would be nice here - and gaim
> config could allow to modify this parameter). So for MSN it would be
> unicode (UTF-8?), for ICQ it can be ISO8859-1 (or Windows CP1251 for
> Russian users).

The correct answer for ICQ is to just use the Oscar Unicode transport
settings. Like I said, it's being worked on. :-) (Where "worked on"
is a nebulous concept lately...)

> I really admire gaim and would love to see it perfect. I think with the
> library like iconv, my ideas are not very difficult to implement. I
> would be happy to help here with some code but I am afraid I don't have
> enough spare time:(

I've committed to helping make gaim perfect in the i18n department. :-)
Just as soon as I have some free time to kick around, hopefully we'll
start seeing real improvements in that area. As Luke said, the move to
Gtk2/Pango is certainly going to help.

Thanks for your comments, and stick around for the changeover... I'll
need plenty of bug-testers. :-)

Ethan

-- 
And if I claim to be a wise man / it surely means that I don't know.
  -- Kansas, "Carry on Wayward Son"
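
For readers who want to see the "everything is UTF-8 once it leaves
the protocol plugin" idea in concrete terms, here is a rough sketch in
Python. It is not gaim code and does not reflect gaim's real plugin
API: the table PROTOCOL_ENCODINGS, the protocol keys, the ISO-8859-1
fallback, and both function names are hypothetical, chosen only to
mirror the encodings mentioned in the thread.

```python
# Illustrative sketch only. The protocol names and their wire encodings
# below are assumptions for the example, not gaim's actual settings.
PROTOCOL_ENCODINGS = {
    "msn": "utf-8",           # UTF-8 protocol: conversion is a no-op
    "jabber": "utf-8",
    "icq-8bit": "cp1251",     # hypothetical legacy 8-bit setup
}

# Hypothetical fallback for protocols not listed above.
DEFAULT_ENCODING = "iso-8859-1"


def to_internal_utf8(protocol: str, raw: bytes) -> bytes:
    """Convert bytes as received on the wire into UTF-8 bytes.

    Undecodable bytes become U+FFFD rather than raising an error.
    """
    encoding = PROTOCOL_ENCODINGS.get(protocol, DEFAULT_ENCODING)
    return raw.decode(encoding, errors="replace").encode("utf-8")


def from_internal_utf8(protocol: str, utf8: bytes) -> bytes:
    """Convert internal UTF-8 bytes back to the protocol's wire encoding.

    Characters the wire encoding cannot represent become '?'.
    """
    encoding = PROTOCOL_ENCODINGS.get(protocol, DEFAULT_ENCODING)
    return utf8.decode("utf-8").encode(encoding, errors="replace")
```

With this split, only the two boundary functions know about wire
encodings; everything between them (UI, logging, plugins) would see
UTF-8 and nothing else.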