Thread: [Gaim-i18n] Re: Translatable strftime() format strings?

From: Richard L. <rl...@wi...> - 2006-02-09 18:43:33

Sorry for the bad threading...

I marked "%x %X" and "%B %Y" as translatable "Just In Case" it was
necessary for some language to have those in a different order or with
other stuff around them... From the discussion here, it seems like that
was a good idea.

I did not mark %x, %X, or %c by themselves, as those should be handled
properly by the C library. If they're not, I'd say it should be fixed
there. However, I've created the various gaim_{date,time} functions with
the intention that signals could be added to them, similar to what we do
with the message/logging timestamps. Then, plugins could override the
handling of the date formatting. I'd like to see us honor the GNOME and
KDE settings when running under one of those environments, for example.

On a related note... What charset does gettext() return strings in? The
man page says:

RETURN VALUE
       If a translation was found in one of the specified catalogs, it
       is converted to the locale’s codeset and returned.

However, we use _("Some string") in all sorts of places (e.g. GTK+
functions) that expect UTF-8.

I ask because we shouldn't be passing UTF-8 to strftime(). So, if
gettext() returns UTF-8 in non-UTF-8 locales, then I need to convert it
in gaim_utf8_strftime(). However, if gettext() does behave as the man
page suggests, doesn't that cause problems with GTK+?

Richard

Re: [Gaim-i18n] Re: Translatable strftime() format strings?

From: Ethan B. <ebl...@cs...> - 2006-02-09 20:08:13

Richard Laager spake unto us the following wisdom:
> I ask because we shouldn't be passing UTF-8 to strftime(). So, if
> gettext() returns UTF-8 in non-UTF-8 locales, then I need to convert it
> in gaim_utf8_strftime(). However, if gettext() does behave as the man
> page suggests, doesn't that cause problems with GTK+?

There is no reason not to pass UTF-8 to strftime.  In UTF-8, % and all
of the specifier letters have the same codepoint as ASCII.
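
A minimal sketch of this point (not the test file attached later in the thread): the hypothetical helper format_cjk_date() hands strftime() a format string that contains UTF-8 literals. It assumes only numeric specifiers are used, so the substituted text is plain ASCII in any locale, and the UTF-8 bytes are expected to be copied through unchanged because none of them equals '%' (0x25).

```c
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper, for illustration only.
 * The format is "%Y<nian>%m<yue>%d<ri>" with the CJK characters written
 * as UTF-8 bytes: U+5E74 = E5 B9 B4, U+6708 = E6 9C 88, U+65E5 = E6 97 A5.
 * Every byte of a multi-byte UTF-8 sequence is >= 0x80, so strftime()
 * cannot mistake any of them for '%' or for a specifier letter.
 */
size_t format_cjk_date(char *out, size_t outsz, const struct tm *tm)
{
    static const char format[] =
        "%Y\xe5\xb9\xb4%m\xe6\x9c\x88%d\xe6\x97\xa5";

    /* Returns the number of bytes written, or 0 if the buffer is too small. */
    return strftime(out, outsz, format, tm);
}
```

This only covers the literal text of the format string. Specifiers such as %B or %x expand to text in the locale's own charset, which is a separate question.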

Gettext returns locale-specific strings, which is why the character
set of the translation strings must be specified in the .po file -- it
has to know.  I am not sure how Gtk+ handles this; perhaps it assumes
that the incoming strings are in the locale charset if they don't
validate as UTF-8?  I know that Gtk+ _does_ work in non-UTF-8 locales
with non-ASCII characters, so...

Ethan

--
The laws that forbid the carrying of arms are laws [that have no remedy
for evils].  They disarm only those who are neither inclined nor
determined to commit crimes.
		-- Cesare Beccaria, "On Crimes and Punishments", 1764

Re: [Gaim-i18n] Re: Translatable strftime() format strings?

From: Bjoern V. <bj...@cs...> - 2006-02-09 20:11:22

Attachments: strftime-utf8-test.c

Richard Laager <rl...@wi...> wrote:

> On a related note... What charset does gettext() return strings in? The
> man page says :
>
> RETURN VALUE
>       If a translation was found in one of the specified catalogs, it
> is converted to the locale's codeset and returned.

Yes, gettext always returns UTF-8 within Gaim. There is the following
line

    bind_textdomain_codeset(PACKAGE, "UTF-8");

in src/gtkmain.c. bind_textdomain_codeset() specifies the output codeset
for gettext.
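
For illustration, a minimal sketch of that kind of start-up code. The function name init_utf8_gettext() and the text domain "strftime-demo" are made up here; Gaim's actual code in src/gtkmain.c may look different.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical text domain; a real program would use its own package name. */
#define PACKAGE "strftime-demo"

/*
 * Select the user's locale, make PACKAGE the default text domain, and ask
 * gettext() to hand back translations from that domain as UTF-8, whatever
 * the locale's own codeset is. Returns the codeset now bound to the domain,
 * or NULL on failure.
 */
const char *init_utf8_gettext(void)
{
    setlocale(LC_ALL, "");
    textdomain(PACKAGE);
    return bind_textdomain_codeset(PACKAGE, "UTF-8");
}
```

After this call, gettext("%x %X") returns either the translated format string in UTF-8 or, when no catalog is installed, the untranslated msgid.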

> I ask because we shouldn't be passing UTF-8 to strftime(). So, if
> gettext() returns UTF-8 in non-UTF-8 locales, then I need to convert it
> in gaim_utf8_strftime(). However, if gettext() does behave as the man
> page suggests, doesn't that cause problems with GTK+?

Why not? If I understood Ambrose Li right, he needs some special UTF-8
chars in Chinese date/time strings.

I tested strftime() with UTF-8 chars in date/time format strings on
Linux (SuSE Linux 10.x) and Solaris 5.9. I believe that other systems
can also pass UTF-8 into strftime(). I attached a small test file for
this.

I also do not see a solution in converting the UTF-8 date/time string
before passing it into strftime(). Since the UTF-8 chars are needed (for
instance in Chinese), you would have to convert the charset twice:
1) UTF-8 (gettext-output) to ASCII (strftime-input)
2) ASCII to UTF-8 (strftime-out) to GTK+

Greetings, Björn

Re: [Gaim-i18n] Re: Translatable strftime() format strings?

From: Richard L. <rl...@wi...> - 2006-02-10 18:54:35

On Thu, 2006-02-09 at 21:09 +0100, Bjoern Voigt wrote:
> I also do not see a solution in converting the UTF-8 date/time string
> before passing it into strftime(). Since the UTF-8 chars are needed (for
> instance in Chinese), you would have to convert the charset twice:
> 1) UTF-8 (gettext-output) to ASCII (strftime-input)
> 2) ASCII to UTF-8 (strftime-out) to GTK+

Not ASCII, but locale format.

The formatters passed to strftime() expand to strings in locale format.
Thus, to use them in Gaim, we convert the output of strftime() to UTF-8.

Now, what happens if we pass in a UTF-8 string (including formatters) to
strftime()? We get output that consists of a combination of locale and
UTF-8 format.

Obviously, this works if the locale format is UTF-8. But if the locale
format is an 8-bit character set, it seems to me that things wouldn't
work.

I imagine the correct approach is this:

1. Pass the format string from Gaim to gettext().
2. Convert the output from gettext() from UTF-8 to locale format.
3. Pass the translated, locale-converted string to strftime().
4. Convert the output of strftime() back to UTF-8.

Thoughts?

Richard
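
The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual gaim_utf8_strftime(): it uses iconv() and nl_langinfo(CODESET) instead of whatever conversion helpers Gaim uses, the names convert() and utf8_strftime_sketch() are invented, and step 1 (the gettext() call) is assumed to have happened already.

```c
#include <iconv.h>
#include <langinfo.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Convert a NUL-terminated string between codesets.
 * Returns 0 on success, -1 on any failure (unknown codeset, character that
 * cannot be converted, output buffer too small). */
static int convert(const char *to, const char *from,
                   const char *in, char *out, size_t outsz)
{
    iconv_t cd;
    char *inp = (char *)in;      /* iconv() takes char **, input is not modified */
    char *outp = out;
    size_t inleft = strlen(in);
    size_t outleft;
    int ok;

    if (outsz == 0)
        return -1;
    outleft = outsz - 1;         /* keep room for the terminating NUL */

    cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;
    ok = iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1
      && iconv(cd, NULL, NULL, &outp, &outleft) != (size_t)-1;
    iconv_close(cd);
    if (!ok)
        return -1;
    *outp = '\0';
    return 0;
}

/* utf8_format is what gettext() returned (step 1, done by the caller).
 * Returns 0 on success and -1 on failure. An empty result from strftime()
 * is treated as a failure too, since strftime() reports both as 0. */
int utf8_strftime_sketch(char *out, size_t outsz,
                         const char *utf8_format, const struct tm *tm)
{
    /* Codeset of the current LC_CTYPE locale; meaningful only if the
     * program called setlocale() at start-up. */
    const char *codeset = nl_langinfo(CODESET);
    char locale_format[256];
    char locale_result[256];

    /* Step 2: UTF-8 format string -> locale charset. */
    if (convert(codeset, "UTF-8", utf8_format,
                locale_format, sizeof locale_format) != 0)
        return -1;
    /* Step 3: let strftime() expand the specifiers in the locale charset. */
    if (strftime(locale_result, sizeof locale_result, locale_format, tm) == 0)
        return -1;
    /* Step 4: locale charset -> UTF-8 for the rest of the program. */
    return convert("UTF-8", codeset, locale_result, out, outsz);
}
```

Step 2 is the fragile one: it fails, or loses text, whenever the translated format string contains characters the locale charset cannot represent.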

Re: [Gaim-i18n] Re: Translatable strftime() format strings?

From: Ethan B. <ebl...@cs...> - 2006-02-10 19:37:52

Richard Laager spake unto us the following wisdom:
> The formatters passed to strftime() expand to strings in locale format.
> Thus, to use them in Gaim, we convert the output of strftime() to UTF-8.

Ahh, I didn't understand your question the first time around.  This is
kind of ugly.

> I imagine the correct approach is this:
>
> 1. Pass the format string from Gaim to gettext().
> 2. Convert the output from gettext() from UTF-8 to locale format.
> 3. Pass the translated, locale-converted string to strftime().
> 4. Convert the output of strftime() back to UTF-8.

Yeah, this might be the best way to do it ... but it's disgusting.
The Right Way to do it would be to set the locale to a UTF-8 locale
within Gaim, but unfortunately locale names are not at all portable.
I just poked through the locale functions, and didn't see a good way
to bind the current charset to UTF-8 regardless of locale.  Ugh.

Ethan

Re: [Gaim-i18n] Re: Translatable strftime() format strings?

From: Richard L. <rl...@wi...> - 2006-02-10 20:41:58

On Fri, 2006-02-10 at 14:37 -0500, Ethan Blanton wrote:
> Yeah, this might be the best way to do it ... but it's disgusting.
> The Right Way to do it would be to set the locale to a UTF-8 locale
> within Gaim, but unfortunately locale names are not at all portable.
> I just poked through the locale functions, and didn't see a good way
> to bind the current charset to UTF-8 regardless of locale.  Ugh.

Indeed. This will get better as people switch to UTF-8 as their charset.
Then the conversion functions just become strdup()s. (And, as a side
note, I'd like to someday optimize those out using Stringrefs, but
that's a discussion for another day.)

Thanks for the sanity check on this. I'm still fairly new to
internationalization.

Richard

Re: [Gaim-i18n] Re: Translatable strftime() format strings?

From: Bjoern V. <bj...@cs...> - 2006-02-11 22:07:12

Attachments: i18n93.patch

Richard Laager <rl...@wi...> wrote:

> I imagine the correct approach is this:
>
> 1. Pass the format string from Gaim to gettext().
> 2. Convert the output from gettext() from UTF-8 to locale format.
> 3. Pass the translated, locale-converted string to strftime().
> 4. Convert the output of strftime() back to UTF-8.
>
> Thoughts?

I see one problem with this approach. Step 2 may not be possible
without losing information if the user sets an unsuitable locale
charset.

I mean that a conversion from UTF-8 to the locale charset and back to
UTF-8 may change the message:

message != conv(conv(message, UTF-8, localecharset), localecharset, UTF-8)

I do not know whether this is an issue in many languages and locale
setups.

But it can already be an issue in German:

   Normally the locale charset (LC_CTYPE) and the language
   selection (LC_MESSAGES or LANG) should have compatible values. For
   instance, the locale German/Germany (de_DE) has the following
   compatible charsets in (SuSE) Linux: ISO-8859-1, ISO-8859-15 and
   UTF-8. But it's also possible to use "C" or "POSIX" as the charset
   (LC_CTYPE) selection for German. Then a word like Erdöl (let's say in
   UTF-8) is converted to Erd"ol (in ASCII/POSIX). Unfortunately the
   conversion from Erd"ol to another charset (for instance UTF-8) is
   always Erd"ol, not Erdöl (the original).

As a result the user may see the date/time strings with garbage in some
locale setups.
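
A rough way to check a string for this problem is sketched below. It assumes iconv() is available; the names convert() and survives_round_trip() are invented for the example. A conversion that fails outright is counted as a loss, the same as one that transliterates (Erd"ol) or substitutes a placeholder.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert a NUL-terminated string between codesets.
 * Returns 0 on success, -1 on any failure. */
static int convert(const char *to, const char *from,
                   const char *in, char *out, size_t outsz)
{
    iconv_t cd;
    char *inp = (char *)in;      /* iconv() takes char **, input is not modified */
    char *outp = out;
    size_t inleft = strlen(in);
    size_t outleft;
    int ok;

    if (outsz == 0)
        return -1;
    outleft = outsz - 1;         /* keep room for the terminating NUL */

    cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;
    ok = iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1
      && iconv(cd, NULL, NULL, &outp, &outleft) != (size_t)-1;
    iconv_close(cd);
    if (!ok)
        return -1;
    *outp = '\0';
    return 0;
}

/* Returns 1 if message == conv(conv(message, UTF-8, codeset), codeset, UTF-8),
 * and 0 if the round trip fails or changes the text. */
int survives_round_trip(const char *utf8, const char *codeset)
{
    char narrowed[256];
    char restored[256];

    if (convert(codeset, "UTF-8", utf8, narrowed, sizeof narrowed) != 0)
        return 0;
    if (convert("UTF-8", codeset, narrowed, restored, sizeof restored) != 0)
        return 0;
    return strcmp(utf8, restored) == 0;
}
```

With this sketch, survives_round_trip("Erd\xc3\xb6l", "ASCII") reports a loss, while the same word should survive a trip through ISO-8859-1 on systems whose iconv supports that charset.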

I see the following better solutions (which still have problems):

1) Set LC_CTYPE=UTF-8 within Gaim. The only problem here is to find the
    "UTF-8" name (LC_CTYPE value) for each system. On Linux it's "utf8",
    on Solaris and FreeBSD it's "UTF-8". I prefer this solution because
    of its simplicity. But where do we find the necessary UTF-8 locale
    names? Ethan Blanton wrote about this problem.

2) Usage of date/time strings which only have ASCII-value placeholders
    (regardless of locale). This means that we cannot use names for
    days, months etc. (%A, %B, ...) but only numbers (%d, %m, ...). These
    date/time strings for strftime() should still be marked as
    translatable for the reasons which were explained by Ambrose Li.

    Example:

    English (original):   _("%m/%d/%Y %H:%M") -> 02/11/2006 22:27
    German  (translated): "%d.%m.%Y %H:%M"    -> 11.02.2006 22:27

    With this solution we do not need a character conversion. The user
    can pass UTF-8 via gettext() into strftime(). strftime() fills the
    placeholders in the UTF-8 date/time string with ASCII characters
    (numbers).

The attached patch changes the two problematic date/time strings
according to solution 2. Please test this patch and write your comments.
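
As an aside, the rule behind solution 2 could be checked mechanically. The sketch below is not part of the attached patch; format_is_numeric_only() is a made-up name, and the list of specifiers treated as numeric is an assumption that is deliberately conservative (anything else, including %p, %x, %X and modified forms such as %Ey, is rejected).

```c
#include <string.h>

/*
 * Returns 1 if every strftime() specifier in `format` expands to digits
 * only (or is a literal "%%"), and 0 otherwise.
 * Scanning byte by byte is safe for UTF-8 input: '%' (0x25) never occurs
 * inside a multi-byte UTF-8 sequence.
 */
int format_is_numeric_only(const char *format)
{
    /* Assumed allow-list: day, month, years, hours, minutes, seconds,
     * day of year, and the literal percent sign. */
    static const char numeric[] = "dmyYHIMSj%";
    const char *p;

    for (p = format; *p != '\0'; p++) {
        if (*p != '%')
            continue;            /* ordinary text, copied through by strftime() */
        p++;                     /* look at the specifier letter */
        if (*p == '\0' || strchr(numeric, *p) == NULL)
            return 0;            /* dangling '%' or a locale-text specifier */
    }
    return 1;
}
```

Both example strings above, "%m/%d/%Y %H:%M" and "%d.%m.%Y %H:%M", pass this check; "%x %X" and "%B %Y" do not.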

Regards, Björn

Re: [Gaim-i18n] Re: Translatable strftime() format strings?

From: Richard L. <rl...@wi...> - 2006-02-13 02:47:38

On Sat, 2006-02-11 at 23:06 +0100, Bjoern Voigt wrote:
> I see one problem with this approach. Step 2 may not be possible
> without losing information if the user sets an unsuitable locale
> charset.

I'm aware this is a possibility.

> 1) Set LC_CTYPE=UTF-8 within Gaim. The only problem here is to find the
>     "UTF-8" name (LC_CTYPE value) for each system. On Linux it's "utf8",
>     on Solaris and FreeBSD it's "UTF-8". I prefer this solution because
>     of its simplicity. But where do we find the necessary UTF-8 locale
>     names? Ethan Blanton wrote about this problem.

If you find a semi-reliable way to do this, I'm all for it. We can
always fall back to the behavior we have now if we can't find a UTF-8
locale. However, I'm too busy to try to figure this out. Patches are
always wonderful. ;)
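
One semi-reliable approach might look like the sketch below. Everything in it is an assumption rather than tested Gaim code: the function names are invented, the candidate suffixes come from the names mentioned in this thread, and the last-resort names "C.UTF-8" and "en_US.UTF-8" are guesses that exist on some systems only. Each candidate is verified with nl_langinfo(CODESET), and the original LC_CTYPE is restored if nothing works, which gives the fallback to the current behavior.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* True if the current LC_CTYPE locale reports a UTF-8 codeset. */
static int ctype_is_utf8(void)
{
    const char *codeset = nl_langinfo(CODESET);

    return strcasecmp(codeset, "UTF-8") == 0
        || strcasecmp(codeset, "utf8") == 0;
}

/*
 * Try to switch LC_CTYPE to a UTF-8 variant of the current locale.
 * Returns 1 on success. Returns 0 and leaves LC_CTYPE as it was if no
 * candidate is accepted by setlocale().
 */
int try_utf8_ctype(void)
{
    static const char *const suffixes[] = { ".UTF-8", ".utf8" };
    static const char *const fallbacks[] = { "C.UTF-8", "en_US.UTF-8" };
    const char *current = setlocale(LC_CTYPE, NULL);
    char saved[128];
    char base[128];
    char candidate[160];
    size_t i;

    if (current == NULL || strlen(current) >= sizeof saved)
        return 0;
    strcpy(saved, current);

    /* "de_DE.ISO-8859-1" or "de_DE@euro" -> "de_DE" */
    strcpy(base, saved);
    base[strcspn(base, ".@")] = '\0';

    for (i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        snprintf(candidate, sizeof candidate, "%s%s", base, suffixes[i]);
        if (setlocale(LC_CTYPE, candidate) != NULL && ctype_is_utf8())
            return 1;
    }
    for (i = 0; i < sizeof fallbacks / sizeof fallbacks[0]; i++) {
        if (setlocale(LC_CTYPE, fallbacks[i]) != NULL && ctype_is_utf8())
            return 1;
    }

    /* Nothing worked: restore the original locale. */
    setlocale(LC_CTYPE, saved);
    return 0;
}
```

Only LC_CTYPE is touched, so the language and date conventions chosen through LC_MESSAGES and LC_TIME stay as the user set them.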

> 2) Usage of date/time strings which only have ASCII-value placeholders
>     (regardless of locale). This means that we cannot use names for
>     days, months etc. (%A, %B, ...) but only numbers (%d, %m, ...). These
>     date/time strings for strftime() should still be marked as
>     translatable for the reasons which were explained by Ambrose Li.
...
> The attached patch changes the two problematic date/time strings
> according to solution 2. Please test this patch and write your comments.

To be perfectly blunt, I'm not going to sacrifice nice dates for English
speakers for non-English speakers in non-UTF8 locales, who *might* have
a problem, depending on how various translations are done. This is ONLY
a problem if you stick words in the translation... For example, using
your example of "Erdöl" (... note I have no idea what this word means in
English...), if you translate "%x %X" to "%X Erdöl %x", then you will
have problems if the user's LC_CTYPE is "C", for example.
However, if you only translate it to "%X, %x", then you're fine.

If you or another translator is worried about this problem for speakers
of their language, they are free to translate "%B %Y" to "%m/%Y",
"%x %X" to "%m/%d/%Y %I:%M:%S %p", etc. as your patch suggests.

Richard