From: Richard L. <rl...@wi...> - 2006-02-09 18:43:33
Sorry for the bad threading...

I marked "%x %X" and "%B %Y" as translatable "Just In Case" it was necessary for some language to have those in a different order or with other stuff around them... From the discussion here, it seems like that was a good idea. I did not mark %x, %X, or %c by themselves, as those should be handled properly by the C library. If they're not, I'd say it should be fixed there.

However, I've created the various gaim_{date,time} functions with the intention that signals could be added to them, similar to what we do with the message/logging timestamps. Then, plugins could override the handling of the date formatting. I'd like to see us honor the GNOME and KDE settings when running under one of those environments, for example.

On a related note... What charset does gettext() return strings in? The man page says:

    RETURN VALUE
        If a translation was found in one of the specified catalogs, it
        is converted to the locale’s codeset and returned.

However, we use _("Some string") in all sorts of places (e.g. GTK+ functions) that expect UTF-8.

I ask because we shouldn't be passing UTF-8 to strftime(). So, if gettext() returns UTF-8 in non-UTF-8 locales, then I need to convert it in gaim_utf8_strftime(). However, if gettext() does behave as the man page suggests, doesn't that cause problems with GTK+?

Richard
From: Ethan B. <ebl...@cs...> - 2006-02-09 20:08:13
Richard Laager spake unto us the following wisdom:
> I ask because we shouldn't be passing UTF-8 to strftime(). So, if
> gettext() returns UTF-8 in non-UTF-8 locales, then I need to convert it
> in gaim_utf8_strftime(). However, if gettext() does behave as the man
> page suggests, doesn't that cause problems with GTK+?

There is no reason not to pass UTF-8 to strftime. In UTF-8, % and all of the specifier letters have the same codepoint as ASCII.

Gettext returns locale-specific strings, which is why the character set of the translation strings must be specified in the .po file -- it has to know. I am not sure how Gtk+ handles this; perhaps it assumes that the incoming strings are in the locale charset if they don't validate as UTF-8? I know that Gtk+ _does_ work in non-UTF-8 locales with non-ASCII characters, so...

Ethan

-- 
The laws that forbid the carrying of arms are laws [that have no remedy for evils]. They disarm only those who are neither inclined nor determined to commit crimes.
-- Cesare Beccaria, "On Crimes and Punishments", 1764
From: Bjoern V. <bj...@cs...> - 2006-02-09 20:11:22
Attachments:
strftime-utf8-test.c
Richard Laager <rl...@wi...> wrote:
> On a related note... What charset does gettext() return strings in? The
> man page says:
>
>     RETURN VALUE
>         If a translation was found in one of the specified catalogs, it
>         is converted to the locale’s codeset and returned.

Yes, gettext always returns UTF-8 within Gaim. There is the following line

	bind_textdomain_codeset(PACKAGE, "UTF-8");

in src/gtkmain.c. bind_textdomain_codeset() specifies the output codeset for gettext.

> I ask because we shouldn't be passing UTF-8 to strftime(). So, if
> gettext() returns UTF-8 in non-UTF-8 locales, then I need to convert it
> in gaim_utf8_strftime(). However, if gettext() does behave as the man
> page suggests, doesn't that cause problems with GTK+?

Why not? If I understood Ambrose Li correctly, he needs some special UTF-8 characters in Chinese date/time strings.

I tested strftime() with UTF-8 characters in date/time format strings on Linux (SuSE Linux 10.x) and Solaris 5.9. I believe that other systems can also pass UTF-8 into strftime(). I attached a small test file for this.

I also do not see a solution in converting the UTF-8 date/time string before passing it into strftime(). Since the UTF-8 characters are needed (for instance in Chinese), you would have to convert the charset twice:

1) UTF-8 (gettext output) to ASCII (strftime input)
2) ASCII to UTF-8 (strftime output) to GTK+

Greetings,
Björn
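A minimal C sketch of the behaviour Björn describes (this is not his attached strftime-utf8-test.c, whose contents aren't reproduced here): strftime() copies ordinary characters in the format unchanged, so UTF-8 literal text such as 年/月/日 passes through, while numeric conversions like %Y, %m and %d expand to ASCII digits. The helper name format_fixed_date() and the fixed date are illustrative assumptions. Name conversions such as %A or %B are deliberately left out, because their expansions come from the C library in the locale charset.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: formats the fixed date 2006-02-11 22:27 with the
 * given strftime() format. Returns the number of bytes written, or 0 if
 * the buffer was too small. */
static size_t format_fixed_date(char *buf, size_t len, const char *fmt)
{
    struct tm tm;
    memset(&tm, 0, sizeof tm);
    tm.tm_year  = 2006 - 1900;  /* years since 1900 */
    tm.tm_mon   = 1;            /* February: months are 0-based */
    tm.tm_mday  = 11;
    tm.tm_hour  = 22;
    tm.tm_min   = 27;
    tm.tm_isdst = -1;           /* DST unknown; not used by these formats */

    /* Literal bytes (including UTF-8 sequences) are copied as-is;
     * only the %-conversions are expanded. */
    return strftime(buf, len, fmt, &tm);
}
```

On a glibc system, format_fixed_date(buf, sizeof buf, "%Y年%m月%d日") leaves "2006年02月11日" in buf, and the numeric-only German format "%d.%m.%Y %H:%M" gives "11.02.2006 22:27".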
From: Richard L. <rl...@wi...> - 2006-02-10 18:54:35
On Thu, 2006-02-09 at 21:09 +0100, Bjoern Voigt wrote:
> I also do not see a solution in converting the UTF-8 date/time string
> before passing it into strftime(). Since the UTF-8 characters are needed
> (for instance in Chinese), you would have to convert the charset twice:
> 1) UTF-8 (gettext output) to ASCII (strftime input)
> 2) ASCII to UTF-8 (strftime output) to GTK+

Not ASCII, but locale format.

The formatters passed to strftime() expand to strings in locale format. Thus, to use them in Gaim, we convert the output of strftime() to UTF-8.

Now, what happens if we pass in a UTF-8 string (including formatters) to strftime()? We get output that consists of a combination of locale and UTF-8 format. Obviously, this works if the locale format is UTF-8. But if the locale format is an 8-bit character set, it seems to me like things wouldn't work.

I imagine the correct approach is this:

1. Pass the format string from Gaim to gettext().
2. Convert the output from gettext() from UTF-8 to locale format.
3. Pass the translated, locale-converted string to strftime().
4. Convert the output of strftime() back to UTF-8.

Thoughts?

Richard
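The four steps above might look roughly like this in plain C, using POSIX iconv() and nl_langinfo(CODESET) rather than whatever GLib helpers Gaim itself would use. Step 1 (gettext()) is assumed to have happened already, so fmt_utf8 is the translated UTF-8 format. The names convert() and utf8_strftime_sketch(), the fixed 256-byte result buffer and the 4-bytes-per-input-byte output sizing are all assumptions made for brevity; a real implementation would grow its buffers on E2BIG. In a UTF-8 locale both conversions are plain copies.

```c
#define _XOPEN_SOURCE 700   /* for iconv() and nl_langinfo() */
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Convert NUL-terminated `in` from charset `from` to charset `to`.
 * Returns a malloc()ed string, or NULL if iconv() fails, e.g. because
 * a character has no equivalent in `to` (EILSEQ) or the output buffer,
 * sized at 4 bytes per input byte, is too small (E2BIG). */
static char *convert(const char *in, const char *to, const char *from)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t inleft = strlen(in);
    size_t cap = inleft * 4 + 1;          /* +1 for the terminating NUL */
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)in;               /* iconv() does not modify input */
    char *outp = out;
    size_t outleft = cap - 1;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)                 /* flush any pending shift state */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }
    *outp = '\0';
    return out;
}

/* Hypothetical helper mirroring steps 2-4; fmt_utf8 is the already
 * translated UTF-8 format from step 1. Returns a malloc()ed UTF-8
 * string, or NULL on any failure (including an empty result). */
static char *utf8_strftime_sketch(const char *fmt_utf8, const struct tm *tm)
{
    const char *codeset = nl_langinfo(CODESET);   /* e.g. "ISO-8859-1" */

    /* Step 2: UTF-8 -> locale charset; fails if a character is lost. */
    char *fmt_locale = convert(fmt_utf8, codeset, "UTF-8");
    if (fmt_locale == NULL)
        return NULL;

    /* Step 3: strftime() sees a format entirely in the locale charset. */
    char buf[256];
    size_t n = strftime(buf, sizeof buf, fmt_locale, tm);
    free(fmt_locale);
    if (n == 0)
        return NULL;

    /* Step 4: locale charset -> UTF-8 for GTK+. */
    return convert(buf, "UTF-8", codeset);
}
```

On glibc in the default "C" locale (codeset "ANSI_X3.4-1968"), a purely ASCII format such as "%d.%m.%Y %H:%M" round-trips fine, whereas a format containing "Erdöl" makes step 2 fail and the sketch returns NULL.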
From: Ethan B. <ebl...@cs...> - 2006-02-10 19:37:52
Richard Laager spake unto us the following wisdom:
> The formatters passed to strftime() expand to strings in locale format.
> Thus, to use them in Gaim, we convert the output of strftime() to UTF-8.

Ahh, I didn't understand your question the first time around. This is kind of ugly.

> I imagine the correct approach is this:
>
> 1. Pass the format string from Gaim to gettext().
> 2. Convert the output from gettext() from UTF-8 to locale format.
> 3. Pass the translated, locale-converted string to strftime().
> 4. Convert the output of strftime() back to UTF-8.

Yeah, this might be the best way to do it ... but it's disgusting. The Right Way to do it would be to set the locale to a UTF-8 locale within Gaim, but unfortunately locale names are not at all portable. I just poked through the locale functions, and didn't see a good way to bind the current charset to UTF-8 regardless of locale. Ugh.

Ethan
From: Richard L. <rl...@wi...> - 2006-02-10 20:41:58
On Fri, 2006-02-10 at 14:37 -0500, Ethan Blanton wrote:
> Yeah, this might be the best way to do it ... but it's disgusting.
> The Right Way to do it would be to set the locale to a UTF-8 locale
> within Gaim, but unfortunately locale names are not at all portable.
> I just poked through the locale functions, and didn't see a good way
> to bind the current charset to UTF-8 regardless of locale. Ugh.

Indeed. This will get better as people switch to UTF-8 as their charset. Then the conversion functions just become strdup()s. (And, as a side note, I'd like to someday optimize those out using Stringrefs, but that's a discussion for another day.)

Thanks for the sanity check on this. I'm still fairly new to internationalization.

Richard
From: Bjoern V. <bj...@cs...> - 2006-02-11 22:07:12
Attachments:
i18n93.patch
Richard Laager <rl...@wi...> wrote:
> I imagine the correct approach is this:
>
> 1. Pass the format string from Gaim to gettext().
> 2. Convert the output from gettext() from UTF-8 to locale format.
> 3. Pass the translated, locale-converted string to strftime().
> 4. Convert the output of strftime() back to UTF-8.
>
> Thoughts?

I see one problem with this approach. Step 2 may not be possible without losing information if the user sets an unsuitable locale charset. I mean that a conversion from UTF-8 to the locale charset and back to UTF-8 may change the message:

    message != conv(conv(message, UTF-8, localecharset), localecharset, UTF-8)

I do not know if this is an issue in many languages and locale setups. But it can already be an issue in German. Normally, the locale charset (LC_CTYPE) and the language selection (LC_MESSAGES or LANG) should have compatible values. For instance, the locale German/Germany (de_DE) has the following compatible charsets in (SuSE) Linux: ISO-8859-1, ISO-8859-15 and UTF-8. But it's also possible to use "C" or "POSIX" as the charset (LC_CTYPE) selection for German. Then a word like Erdöl (let's say in UTF-8) is converted to Erd"ol (in ASCII/POSIX). Unfortunately, converting Erd"ol to another charset (for instance UTF-8) always gives Erd"ol, not the original Erdöl. As a result, the user may see garbage in the date/time strings in some locale setups.

I see the following better solutions (which still have problems):

1) Set LC_CTYPE=UTF-8 within Gaim. The only problem here is to find the "UTF-8" name (LC_CTYPE value) for each system. On Linux it's "utf8", on Solaris and FreeBSD it's "UTF-8". I prefer this solution because of its simplicity. But where do we find the necessary UTF-8 locale names? Ethan Blanton wrote about this problem.

2) Use date/time strings which only have ASCII-valued placeholders (regardless of locale). This means that we cannot use names for days, months, etc. (%A, %B, ...) but only numbers (%d, %m, ...). These date/time strings for strftime() should still be marked as translatable for the reasons explained by Ambrose Li. Example:

    English (original):  _("%m/%d/%Y %H:%M")  -> 02/11/2006 22:27
    German (translated):  "%d.%m.%Y %H:%M"    -> 11.02.2006 22:27

With this solution we do not need a character conversion. The user can pass UTF-8 via gettext() into strftime(). strftime() fills the placeholders in the UTF-8 date/time string with ASCII characters (numbers).

The attached patch changes the two problematic date/time strings according to solution 2. Please test this patch and write your comments.

Regards,
Björn
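A rough sketch of solution 1, assuming the UTF-8 variant of the user's locale can be found by swapping the codeset suffix of the current LC_CTYPE name (e.g. "de_DE.ISO-8859-1" to "de_DE.UTF-8" or "de_DE.utf8"). The helper names and candidate suffixes are illustrative, and which locale names actually exist is system-specific. The caller is assumed to have already called setlocale(LC_ALL, "") so the user's settings are in effect. Note that strftime() takes month and day names from LC_TIME, so whether switching LC_CTYPE alone is enough probably differs between platforms and would need testing.

```c
#define _XOPEN_SOURCE 700   /* for nl_langinfo() */
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `cs` names UTF-8; platforms spell it differently. */
static int is_utf8_codeset(const char *cs)
{
    return strcmp(cs, "UTF-8") == 0 || strcmp(cs, "utf8") == 0
        || strcmp(cs, "utf-8") == 0 || strcmp(cs, "UTF8") == 0;
}

/* Hypothetical helper: tries to switch LC_CTYPE to a UTF-8 variant of
 * the current locale. Returns 1 on success; on failure restores the
 * original LC_CTYPE and returns 0 so the caller can fall back. */
static int try_utf8_ctype(void)
{
    char orig[128], base[128], cand[160];
    const char *cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL || strlen(cur) >= sizeof orig)
        return 0;
    strcpy(orig, cur);  /* later setlocale() calls may overwrite `cur` */

    if (is_utf8_codeset(nl_langinfo(CODESET)))
        return 1;       /* already UTF-8: nothing to do */

    /* Strip ".codeset" and "@modifier": "de_DE.ISO-8859-1" -> "de_DE". */
    strcpy(base, orig);
    base[strcspn(base, ".@")] = '\0';

    /* Candidate spellings; which ones exist depends on the system. */
    const char *suffixes[] = { ".UTF-8", ".utf8" };
    for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        snprintf(cand, sizeof cand, "%s%s", base, suffixes[i]);
        if (setlocale(LC_CTYPE, cand) != NULL
            && is_utf8_codeset(nl_langinfo(CODESET)))
            return 1;
    }

    setlocale(LC_CTYPE, orig);  /* no UTF-8 variant found: restore */
    return 0;
}
```

Returning 0 leaves LC_CTYPE untouched, so the caller can keep the current conversion-based behaviour.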
From: Richard L. <rl...@wi...> - 2006-02-13 02:47:38
On Sat, 2006-02-11 at 23:06 +0100, Bjoern Voigt wrote:
> I see one problem with this approach. Step 2 may not be possible
> without losing information if the user sets an unsuitable locale
> charset.

I'm aware this is a possibility.

> 1) Set LC_CTYPE=UTF-8 within Gaim. The only problem here is to find the
>    "UTF-8" name (LC_CTYPE value) for each system. On Linux it's "utf8",
>    on Solaris and FreeBSD it's "UTF-8". I prefer this solution because
>    of its simplicity. But where do we find the necessary UTF-8 locale
>    names? Ethan Blanton wrote about this problem.

If you find a semi-reliable way to do this, I'm all for it. We can always fall back to the behavior we have now if we can't find a UTF-8 locale. However, I'm too busy to try to figure this out. Patches are always wonderful. ;)

> 2) Use date/time strings which only have ASCII-valued placeholders
>    (regardless of locale). This means that we cannot use names for
>    days, months, etc. (%A, %B, ...) but only numbers (%d, %m, ...).
>    These date/time strings for strftime() should still be marked as
>    translatable for the reasons explained by Ambrose Li.
...
> The attached patch changes the two problematic date/time strings
> according to solution 2. Please test this patch and write your comments.

To be perfectly blunt, I'm not going to sacrifice nice dates for English speakers for the sake of non-English speakers in non-UTF-8 locales, who *might* have a problem, depending on how various translations are done. This is ONLY a problem if you stick words in the translation. For example, using your example of "Erdöl" (... note I have no idea what this word means in English...), if you translate "%x %X" to "%X Erdöl %x", then you will have problems if the user's LC_CTYPE is "C", for example. However, if you only translate it to "%X, %x", then you're fine.

If you or another translator is worried about this problem for speakers of their language, they are free to translate "%B %Y" to "%m/%Y", "%x %X" to "%m/%d/%Y %I:%M:%S %p", etc. as your patch suggests.

Richard
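Richard's criterion can be checked mechanically: only the literal text a translator adds around the conversions matters, because the conversions' expansions come from the C library in the locale charset and are converted to UTF-8 afterwards. Below is a hypothetical check (not part of Gaim or gettext) that a translator or build script might run, assuming the locale charset is ASCII-compatible so ASCII literals always convert losslessly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical translator check: returns true if every literal
 * (non-conversion) byte in a strftime() format is ASCII. Conversion
 * specifications such as %X or %Ey are skipped, since their expansions
 * come from the C library in the locale charset. */
static bool strftime_literals_are_ascii(const char *fmt)
{
    for (const unsigned char *p = (const unsigned char *)fmt; *p; p++) {
        if (*p == '%') {
            if (p[1] == '\0')
                break;           /* trailing lone '%': nothing left */
            p++;                 /* skip the conversion character ... */
            if ((*p == 'E' || *p == 'O') && p[1] != '\0')
                p++;             /* ... and a POSIX E/O modifier */
        } else if (*p >= 0x80) {
            return false;        /* non-ASCII literal, e.g. "ö" */
        }
    }
    return true;
}
```

With this check, "%X, %x" and "%m/%d/%Y %I:%M:%S %p" pass, while "%X Erdöl %x" fails.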