From: Sergey V. U. <ser...@cl...> - 2002-09-23 20:54:28

Hi all,

I have a question about gaim's architecture and non-ASCII encodings. Currently, gaim is rather weak in this area. All data is transferred "as is" (even if gaim is 8-bit clean). The only plugin that recodes data for i18n purposes is the well-known rusconv. But this great plugin only handles conversion between two Russian encodings, and its usage is limited to ICQ and other pure 8-bit protocols. Unicode-based protocols (for example, MSN) are not supported properly.

So my main question is about gaim's internal structures. Does gaim use Unicode internally? If not, why? If yes, how? Would it be possible to create a generic encoding-handling approach along the following lines:

1. There is a "local client" encoding and a "protocol" or "remote" encoding. All visible character data is transcoded between them.
2. Gaim's local client encoding is determined at run time from the LANG and LC_??? environment variables and/or from the gaim configuration.
3. The "internal protocol" encoding is set on a per-protocol and/or per-buddy basis (a per-protocol default would be nice here, and the gaim config could allow this parameter to be changed). So for MSN it would be Unicode (UTF-8?), and for ICQ it could be ISO8859-1 (or Windows CP1251 for Russian users).

I really admire gaim and would love to see it perfect. I think that with a library like iconv, my ideas are not very difficult to implement. I would be happy to help with some code, but I am afraid I don't have enough spare time :(

Regards,

Sergey
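
The three principles above could be sketched roughly as follows. This is only an illustration in Python, not gaim code: the table `PROTOCOL_ENCODINGS`, the function names, and the choice of CP1251 as the ICQ default are all assumptions made for the example. Python's built-in codecs stand in for iconv.

```python
# Hypothetical per-protocol defaults (principle 3). A real client would let
# the user override these per protocol or per buddy.
PROTOCOL_ENCODINGS = {
    "msn": "utf-8",
    "icq": "cp1251",
}


def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one charset to another, going through Unicode."""
    # Characters the target charset cannot represent become "?".
    return data.decode(source).encode(target, errors="replace")


def outgoing(data: bytes, protocol: str, local_encoding: str) -> bytes:
    """Local client encoding -> protocol encoding (principle 1)."""
    return transcode(data, local_encoding, PROTOCOL_ENCODINGS[protocol])


def incoming(data: bytes, protocol: str, local_encoding: str) -> bytes:
    """Protocol encoding -> local client encoding (principle 1)."""
    return transcode(data, PROTOCOL_ENCODINGS[protocol], local_encoding)


if __name__ == "__main__":
    typed = "привет".encode("koi8_r")        # what a KOI8-R user types
    wire = outgoing(typed, "icq", "koi8_r")  # CP1251 bytes on the wire
    print(wire == "привет".encode("cp1251"))
```

The local encoding is passed in as a parameter here so that it can come from the environment or from configuration (principle 2) without the transcoding code caring which.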
From: Luke S. <lsc...@re...> - 2002-09-23 21:03:48

Gaim is currently undergoing a ton of work in the i18n area, moving from a long-running attempt to translate things internally (happening in part because gtk1 doesn't offer very good support for i18n) to using pango and iconv to translate. Different parts of this move have been accomplished with varying amounts of success in the gtk1-stable tree (as reflected by the 0.59.x releases) and in the current CVS head, which will become 0.60. We do not expect that 0.59.x will ever have great i18n; gtk1 is just too painful in this area. But 0.6x will eventually be significantly more capable.

luke

--
-This email is made of 100% recycled electrons.
-If something can go wrong.... FIX IT! If it's Microsoft...delete it.
-There are three ways to get something done: (1) Do it yourself. (2) Hire someone to do it for you. (3) Forbid your kids to do it.
From: Ethan B. <ebl...@cs...> - 2002-09-23 21:15:52

Sergey V. Udaltsov spake unto us the following wisdom:
> There is a little question about gaim architecture and non-ascii
> encoding. Currently, gaim is rather weak in this area. All the data is
> transferred "as is" (even if gaim is 8-bit clean). The only plugin which
> recodes the data for i18n purposes is well known rusconv. But this great
> plugin is good for the conversion between 2 Russian encodings. And its
> usage is limited by ICQ protocol and other pure 8-bit protocols.
> Unicode-based protocols are not supported properly (for example, MSN).

Gaim 0.59.3 should work more or less properly for the UTF-8 protocols (as you say, MSN) if you have your locale set correctly. I use it regularly to converse in Japanese, which is much more difficult to convert than, say, KOI8-R. As for ICQ and AIM, we haven't quite made it that far yet. :-)

I haven't looked at rusconv, but I suspect it is either using an application-specific encoding (in which case we're basically lucky it works) or abusing the Unicode encoding as a pure 8-bit transport (in which case we're even more lucky it works). The right answer is to convert Oscar packets to UTF-16, and that is being worked on...

> So my major question is about internal gaim structures. Does gaim use
> unicode internally? If no - why? If yes - what is the way? Would it be
> possible to create some generic encoding handling approach with the
> following principles:

The plan is to move CVS over to using UTF-8 exclusively internally. Basically, by the time a string leaves the protocol plugin, it should be converted to UTF-8. For the UTF-8 protocols (MSN, Jabber, we think Yahoo!), this is trivial. For Oscar it is less trivial due to some design flaws, but it is not horribly difficult.

The "local client" encoding you mention is no longer necessary with Gtk2, or will not be once everything is ported over to pango-using widgets. The reason for this is that the "local encoding", as far as a Gtk2 application is concerned, is always UTF-8.

Logging presents a slight problem here. My gut instinct is that logging should honor LC_CTYPE; however, since logging is currently done through gtkimhtml (which will be going all UTF-8), as best I can tell, this may require some work.

> 3. The "internal protocol" encoding is set on per-protocol and/or
> per-buddy basis (some per-protocol default would be nice here - and gaim
> config could allow to modify this parameter). So for MSN it would be
> unicode (UTF-8?), for ICQ it can be ISO8859-1 (or Windows CP1251 for
> Russian users).

The correct answer for ICQ is to just use the Oscar Unicode transport settings. Like I said, it's being worked on. :-) (Where "worked on" is a nebulous concept lately...)

> I really admire gaim and would love to see it perfect. I think with the
> library like iconv, my ideas are not very difficult to implement. I
> would be happy to help here with some code but I am afraid I don't have
> enough spare time:(

I've committed to helping make gaim perfect in the i18n department. :-) Just as soon as I have some free time to kick around, hopefully we'll start seeing real improvements in that area. As Luke said, the move to Gtk2/Pango is certainly going to help.

Thanks for your comments, and stick around for the changeover... I'll need plenty of bug-testers. :-)

Ethan

-- 
And if I claim to be a wise man / it surely means that I don't know.
        -- Kansas, "Carry on Wayward Son"
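
The "UTF-8 everywhere past the protocol plugin" plan described in this message might look something like the sketch below. It is an illustration in Python under stated assumptions, not gaim's actual design: `WIRE_ENCODINGS`, `plugin_receive`, `plugin_send`, and `log_line` are hypothetical names, and the big-endian byte order chosen for Oscar's UTF-16 is a guess made only so the example runs.

```python
# Hypothetical wire encodings per protocol. The thread states that MSN and
# Jabber are UTF-8 and that Oscar should use UTF-16; the byte order is an
# assumption for this sketch.
WIRE_ENCODINGS = {
    "msn": "utf-8",
    "jabber": "utf-8",
    "oscar": "utf-16-be",
}


def plugin_receive(protocol: str, wire_bytes: bytes) -> bytes:
    """Convert incoming wire data to UTF-8 before it leaves the plugin."""
    return wire_bytes.decode(WIRE_ENCODINGS[protocol]).encode("utf-8")


def plugin_send(protocol: str, utf8_text: bytes) -> bytes:
    """Convert the core's UTF-8 text to whatever the protocol expects."""
    return utf8_text.decode("utf-8").encode(WIRE_ENCODINGS[protocol])


def log_line(utf8_text: bytes, log_encoding: str) -> bytes:
    """Re-encode a UTF-8 line for a log file kept in the locale's charset.

    Characters the locale charset cannot represent are written as "?".
    """
    return utf8_text.decode("utf-8").encode(log_encoding, errors="replace")


if __name__ == "__main__":
    internal = plugin_receive("oscar", "привет".encode("utf-16-be"))
    print(log_line(internal, "koi8_r") == "привет".encode("koi8_r"))
```

With this split, only the plugin knows the wire encoding and only the logger knows the locale charset; everything in between handles a single encoding.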
From: Sergey V. U. <ser...@cl...> - 2002-09-23 21:34:29

> Gaim 0.59.3 should work more or less properly for the UTF-8 protocols
> (as you say, MSN) if you have your locale set correctly. I use it
> regularly to converse in Japanese, which is much more difficult to
> convert than, say, KOI8-R.

:) OK. I'll try. The problem is whether 0.53 converts the internal Unicode of MSN correctly into the local encoding (for me it is koi8-r). It seems the answer is "no", but I'll check this once again...

> we'll start seeing real improvements in that area. As Luke said, the
> move to Gtk2/Pango is certainly going to help.

Glad to hear it. But at the moment Gtk1 is still alive (and GNOME 1 is more stable, I'd say). And it can do a lot of good things with the usual good old LANG, LC_, and CHARSET variables. So do not drop it yet :)

> Thanks for your comments, and stick around for the changeover... I'll
> need plenty of bug-testers. :-)

OK. Count me in.

Sergey
From: Luke S. <lsc...@gm...> - 2002-09-23 21:42:48

On Mon, Sep 23, 2002 at 10:30:50PM +0100, Sergey V. Udaltsov wrote:
> > Gaim 0.59.3 should work more or less properly for the UTF-8 protocols
> > (as you say, MSN) if you have your locale set correctly. I use it
> > regularly to converse in Japanese, which is much more difficult to
> > convert than, say, KOI8-R.
> :) OK. I'll try. The problem is whether 0.53 converts internal unicode
> of MSN correctly into the local encoding (for me it is koi8-r). It seems
> the answer is "no" but I'll check this once again...

Umm, note the version number difference there. We said 0.59.3; you said 0.53. You should not be using a gaim that old.

luke
From: Ethan B. <ebl...@cs...> - 2002-09-23 21:47:24

Sergey V. Udaltsov spake unto us the following wisdom:
> > Gaim 0.59.3 should work more or less properly for the UTF-8 protocols
> > (as you say, MSN) if you have your locale set correctly. I use it
> > regularly to converse in Japanese, which is much more difficult to
> > convert than, say, KOI8-R.
>
> :) OK. I'll try. The problem is whether 0.53 converts internal unicode
> of MSN correctly into the local encoding (for me it is koi8-r). It seems
> the answer is "no" but I'll check this once again...

If your LANG and/or LC_CTYPE are set to ru_RU (or ru_RU.koi8-r or whatever), it should work fine. AIM/ICQ will not (yet).

> > we'll start seeing real improvements in that area. As Luke said, the
> > move to Gtk2/Pango is certainly going to help.
>
> Glad to hear. But at the moment Gtk1 is still alive (and GNOME 1 is more
> stable, I'd say). And it can do a lot of good things usual good old
> LANG, LC_, CHARSET variables. So do not drop it yet:)

I agree ... unfortunately, internationalization in Gtk1 (especially given the current state of the gaim codebase, which was not written with i18n in mind at *all*) is going to be difficult at best. The last few releases should have been markedly better, but issues still remain.

> > Thanks for your comments, and stick around for the changeover... I'll
> > need plenty of bug-testers. :-)
>
> OK. Count me in.

Great! :-)

Ethan
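
For readers unfamiliar with how a locale setting such as ru_RU.KOI8-R names a character set, here is a small Python sketch. It is not how gaim or the C library actually resolves the locale; `codeset_from_env` is a hypothetical helper that only parses the string, using the usual POSIX precedence of LC_ALL over LC_CTYPE over LANG.

```python
def codeset_from_env(env):
    """Return the codeset named in the locale variables, or None.

    Locale values have the form language[_territory][.codeset][@modifier].
    A value without a ".codeset" part (for example "ru_RU") yields None:
    the real default then comes from the system's locale database, which
    this sketch does not consult.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # unset and empty variables are both skipped
            break
    else:
        return None

    value = value.split("@", 1)[0]  # drop an optional @modifier
    if "." not in value:
        return None
    return value.split(".", 1)[1]


if __name__ == "__main__":
    import os
    print(codeset_from_env(os.environ))
```

Passing the environment in as a dictionary keeps the function easy to try with different settings, such as `{"LANG": "ru_RU.KOI8-R"}`.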
From: Sergey V. U. <ser...@cl...> - 2002-09-24 08:10:37

> If your LANG and/or LC_CTYPE are set to ru_RU (or ru_RU.koi8-r or
> whatever), it should work fine. AIM/ICQ will not (yet).

My LANG is ru_RU.KOI8-R, and it does not work. When I send Russian text to my partner, he sees it as Russian, but in the wrong encoding. He uses the standard MSN client; I use gaim. Should I submit this as a bug report, or what? Could this be caused by my use of rusconv?

> I agree ... unfortunately internationalization in Gtk1 (especially
> given the current state of the gaim codebase, which was not written
> with i18n in mind at *all*) is going to be difficult at best. The

I believe you. But many programs still manage to be good in the i18n department, so I hope gaim can be too.

> last few releases should have been markedly better, but issues still
> remain.

I can see that myself :)

Regards,

Sergey
From: Ethan B. <ebl...@cs...> - 2002-09-24 15:57:58

Sergey V. Udaltsov spake unto us the following wisdom:
> My LANG is ru_RU.KOI8-R. And it does not work. When I send russian to my
> partner - it sees it as a russian. But in wrong encoding. He uses
> standard MSN client, I use gaim. Should I submit it as a bug report or
> what? Can this be caused by useage of rusconv?

Oh, yes! You should *not* be using rusconv with 0.59.3 over MSN... That would certainly cause it to be in the wrong encoding on the other end.

Ethan
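
One way the symptom described here ("Russian, but in the wrong encoding") can arise is a double conversion. The Python sketch below rests on two assumptions that the thread does not confirm: that rusconv recodes outgoing text from KOI8-R to CP1251, and that gaim then treats those bytes as locale-encoded KOI8-R when converting to UTF-8 for MSN. The function names are hypothetical.

```python
def rusconv_outgoing(koi8_bytes: bytes) -> bytes:
    """Assumed plugin behavior: recode KOI8-R text to CP1251."""
    return koi8_bytes.decode("koi8_r").encode("cp1251")


def locale_to_utf8(data: bytes, locale_encoding: str) -> bytes:
    """Assumed client behavior: treat bytes as locale-encoded, emit UTF-8."""
    return data.decode(locale_encoding).encode("utf-8")


if __name__ == "__main__":
    typed = "привет".encode("koi8_r")

    # Without the plugin, the recipient gets the intended text.
    good = locale_to_utf8(typed, "koi8_r").decode("utf-8")

    # With the plugin in the chain, CP1251 bytes are misread as KOI8-R.
    # The result is still Cyrillic, but the letters are the wrong ones.
    garbled = locale_to_utf8(rusconv_outgoing(typed), "koi8_r").decode("utf-8")

    print(good)
    print(garbled)
```

Under these assumptions the garbled text still consists of Cyrillic letters, which would match the report that the recipient recognizes it as Russian but cannot read it.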
From: Sergey V. U. <ser...@cl...> - 2002-09-24 20:42:27

> Oh, yes! You should *not* be using rusconv with 0.59.3 over MSN...
> That would certainly cause it to be in the wrong encoding on the other
> end.

OK. But what if I want to use rusconv with an AIM/ICQ account? I cannot say "use rusconv for ICQ, do not use it for MSN", can I? So what would be the solution for me (and other Russians)?

Cheers,

Sergey
From: Ethan B. <ebl...@cs...> - 2002-09-24 21:13:09

Sergey V. Udaltsov spake unto us the following wisdom:
> > Oh, yes! You should *not* be using rusconv with 0.59.3 over MSN...
> > That would certainly cause it to be in the wrong encoding on the other
> > end.
>
> OK. But if I want to use rusconv with AIM/ICQ account? I cannot say "use
> rusconv for ICQ, do not use for MSN", can I? So what would be solution
> for me (and other russians)?

The current solution (sorry) is to wait for gaim to be fixed properly. Hopefully that won't be *too* long now...

Ethan