Re: I18N in OI2 (was: [Openinteract-help] project alive?)

>   - We're using Locale::Maketext for everything
>   - Every package can define a set of message files (normally in msg/);

I used to work on a project that had 20 translations. At first we used opaque 
IDs to identify the strings. We soon noticed that this made translating hard: 
once a string has been translated from English into some other language, you 
no longer see the original English string anywhere unless you put it in a 
comment or look it up in the original English translation file. My question 
about opaque IDs is: how do you easily improve a translation if the keys are 
not based on the base language (comparing the original to the current text, 
making modifications...)?

Another advantage is the self-documenting effect of having the base language 
string as the key in the code. Consider for example:

$lh->maketext( 'Hello [_1], today is [_2]', $username, $date );

vs:

$lh->maketext( 'desktop.welcome', $username, $date );

If you are reading the code for the first time, it is not always obvious what 
that piece of code will actually produce unless you search the translation 
file at the same time.

Now another point: many projects never have 100% complete translations, and 
translation files that lag behind the latest changes are often missing some 
required keys. For that case I would suggest the following logic:

1. Look for the requested key in the requested translation (e.g. Spanish)
2. If not found, look for the requested key in the default translation (English)
3. If not found, produce an error

This way you can still use translations that were 100% complete for some 
older version, although some strings might appear in English.
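
As a rough sketch of that lookup order (not how OI2 does it): with 
Locale::Maketext you get the fallback if the Spanish lexicon class inherits 
from the English one, because a key missing from one %Lexicon is looked up in 
the superclasses, and maketext() dies if no lexicon has it. The class names 
My::L10N, My::L10N::en and My::L10N::es are made up for this example.

```perl
use strict;
use warnings;
use Locale::Maketext;

# Hypothetical project base class; the name is only for illustration.
package My::L10N;
our @ISA = ('Locale::Maketext');

# Default (English) lexicon: every key used in the code must appear here.
package My::L10N::en;
our @ISA = ('My::L10N');
our %Lexicon = (
    'Hello [_1], today is [_2]' => 'Hello [_1], today is [_2]',
    'Log out'                   => 'Log out',
);

# Spanish inherits from English, so keys missing here fall back to English.
package My::L10N::es;
our @ISA = ('My::L10N::en');
our %Lexicon = (
    'Hello [_1], today is [_2]' => 'Hola [_1], hoy es [_2]',
    # 'Log out' has not been translated yet
);

package main;

my $lh = My::L10N->get_handle('es') or die "no language handle available";

# Step 1: key found in the requested (Spanish) translation.
print $lh->maketext( 'Hello [_1], today is [_2]', 'Ana', 'lunes' ), "\n";

# Step 2: key missing in Spanish, found in the default (English) lexicon.
print $lh->maketext('Log out'), "\n";

# Step 3: key missing everywhere, so maketext() dies.
eval { $lh->maketext('No such key') };
print "error: unknown key\n" if $@;
```

Note that this only produces the error in step 3 as long as the English 
lexicon does not set _AUTO; with _AUTO any unknown key is accepted as is.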

If you consider using the base language as keys, consider working around the 
global key problem (if I understood correctly, the keys have to be unique, 
even across package boundaries). For example, a Maketext object could be 
initialized for each package, so that every package has its own key 
namespace. In my opinion this is very useful: you never know what strings or 
keys some other project will use.
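
A small sketch of what I mean by one key namespace per package, assuming each 
package ships its own Locale::Maketext base class (the Demo::* class names and 
the two packages are invented for the example). The same English key 'Open' 
then gets a different Finnish translation depending on the package:

```perl
use strict;
use warnings;
use Locale::Maketext;

# Hypothetical package "files": 'Open' is a verb (open a file).
package Demo::Files::L10N;
our @ISA = ('Locale::Maketext');

package Demo::Files::L10N::fi;
our @ISA = ('Demo::Files::L10N');
our %Lexicon = ( 'Open' => 'Avaa' );

# Hypothetical package "tickets": 'Open' is a state (the ticket is open).
package Demo::Tickets::L10N;
our @ISA = ('Locale::Maketext');

package Demo::Tickets::L10N::fi;
our @ISA = ('Demo::Tickets::L10N');
our %Lexicon = ( 'Open' => 'Avoin' );

package main;

# One language handle per package, so identical keys never collide.
my %handle_for = (
    files   => Demo::Files::L10N->get_handle('fi'),
    tickets => Demo::Tickets::L10N->get_handle('fi'),
);

print $handle_for{files}->maketext('Open'),   "\n";
print $handle_for{tickets}->maketext('Open'), "\n";
```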

> these use a slightly modified Java message bundle syntax (easy to
> write!)

In my opinion you should use gettext (PO files), because there are 
easy-to-use translation and project management tools available for programs 
that use gettext. I for example find KBabel brilliant for that job: 
http://i18n.kde.org/tools/kbabel/

This tool hides the syntax details (editing by hand very often results in 
syntax errors, even if the file is easy to edit), so the files it writes are 
correctly formatted. It also shows statistics for each translation: how many 
strings are translated, the percentage translated, and how many fuzzy strings 
you have (the .po format has this nice "#, fuzzy" flag, which lets you mark a 
translation as "being under consideration"; imagine translating from English 
to Chinese, where the right translation is not always obvious).

Also, easy navigation like "next untranslated string" helps a lot once you get 
into a translating mood, because you don't have to look for untranslated or 
fuzzy strings by eye. Built-in spell checking (!) and a dictionary helper 
also help maintain the basic quality of the translation. There is syntax 
highlighting, etc.

Also, many translators are not techies. Even a simple file can be hard to 
modify by hand when you have to remember all the syntax. Believe me, even a 
small learning curve can affect their willingness to try the translation 
process at all.

There are also plenty of graphical translation tools available for Windows 
and Unix that support the gettext system.

> I'm in the middle of creating english localization files all the
> packages shipped with OI2. Talk about a boring job... :-)

Another reason to use base language strings as keys here: you could easily 
write a script that goes through your code, extracts the strings from calls 
like $lh->maketext( 'Hello [_1], today is [_2]', $username, $weekday ) and 
generates a translation file for you. This way you could just improve the 
English text in your code and then generate an up-to-date version of the 
translation file.
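
Such an extraction script could look roughly like this. It is only a sketch: 
the function names extract_keys and po_template are mine, it only understands 
plain single-quoted first arguments to ->maketext(...), and it writes a 
PO-style template with empty msgstr entries.

```perl
use strict;
use warnings;

# Return the unique single-quoted first arguments of ->maketext() calls,
# in order of first appearance. Other quoting styles are not handled.
sub extract_keys {
    my ($source) = @_;
    my ( %seen, @keys );
    while ( $source =~ /->maketext\(\s*'((?:[^'\\]|\\.)*)'/g ) {
        my $key = $1;
        $key =~ s/\\(['\\])/$1/g;    # undo \' and \\ escapes
        push @keys, $key unless $seen{$key}++;
    }
    return @keys;
}

# Format the keys as an empty PO-style template (msgid / msgstr pairs).
sub po_template {
    my (@keys) = @_;
    my $out = '';
    for my $key (@keys) {
        ( my $id = $key ) =~ s/(["\\])/\\$1/g;    # escape for PO strings
        $out .= qq{msgid "$id"\nmsgstr ""\n\n};
    }
    return $out;
}

# Sample input; a real script would read the source files instead.
my $code = <<'END';
print $lh->maketext( 'Hello [_1], today is [_2]', $username, $weekday );
print $lh->maketext('Log out');
print $lh->maketext( 'Hello [_1], today is [_2]', $other, $weekday );
END

print po_template( extract_keys($code) );
```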

In your FAQ you state:
"if you change the key in the base language even for punctuation you'll need 
to change all of them"

I solved this problem by writing a script that compared the old and new 
strings and replaced them in the translation files. No more doing it by hand. 
I also wrote a script that compared each foreign translation to the base 
translation, added the keys missing from the foreign translation files, and 
produced a warning if a foreign translation file contained keys not present 
in the base translation (very common if you make typos or remove old keys 
from the base translation).
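
I don't have those scripts at hand, but the idea is roughly the following, 
with the lexicons held as plain hashes (the function names rename_key and 
sync_lexicon and the sample data are made up for this sketch; real scripts 
would read and write the translation files):

```perl
use strict;
use warnings;

# Move a translation from an old base-language key to a new one.
sub rename_key {
    my ( $lexicon, $old, $new ) = @_;
    $lexicon->{$new} = delete $lexicon->{$old} if exists $lexicon->{$old};
    return $lexicon;
}

# Copy keys missing from the foreign lexicon out of the base lexicon and
# report foreign keys the base lexicon does not know (typos, removed keys).
sub sync_lexicon {
    my ( $base, $foreign ) = @_;
    my @added   = sort( grep { !exists $foreign->{$_} } keys %$base );
    my @orphans = sort( grep { !exists $base->{$_} } keys %$foreign );
    $foreign->{$_} = $base->{$_} for @added;    # placeholder until translated
    return ( \@added, \@orphans );
}

my %en = (
    'Hello [_1], today is [_2]!' => 'Hello [_1], today is [_2]!',
    'Log out'                    => 'Log out',
);
my %es = (
    'Hello [_1], today is [_2]' => 'Hola [_1], hoy es [_2]',    # old key
    'Log uot'                   => 'Salir',                     # typo in key
);

# The English key gained a "!", so move the Spanish entry to the new key.
rename_key( \%es, 'Hello [_1], today is [_2]', 'Hello [_1], today is [_2]!' );

my ( $added, $orphans ) = sync_lexicon( \%en, \%es );
print "added from base: $_\n" for @$added;
warn "not in base translation: $_\n" for @$orphans;
```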

The GNU gettext utilities also contain some really useful development tools. 
msgfmt is helpful for checking translation file syntax and producing some 
statistics, and msgcmp compares two .po files to find differences such as 
missing msgid strings.

Hope this helps,

Teemu Arina

Ionstream Oy / Dicole
Komeetankuja 4 A
02210 Espoo
FINLAND
Tel: +358-(0)50 - 555 7636
http://www.mimerdesk.org
http://www.dicole.fi/en