Marc Laporte wrote:
> do not use A-Z letters ex: Chinese/Japanese. Are there any technical
> challenges remaining? [...] What should we do to have Tiki available
> in more languages and for our translations to be more complete? I
> would like to hear from people who have translated or want to
> translate, etc Below are some open questions.
Greetings Marc, and others:
Marc Laporte asked me for a reply on translating Tiki, so here I try to
cook up an answer you might find useful. It got long, so please bear
with it.
As I've not written to tikiwiki-devel before, it seems in order to first
establish some credibility. Up to now I have been involved in several
translation projects, all l10n to the Serbian language. You can find a
short portfolio of the translations here:
http://cobalt.et.tudelft.nl/~filip/translations.html . I speak English,
Russian, Dutch and Serbian, and understand several related languages --
Macedonian, Croatian and Bosnian. My background is engineering, so you
should not come to me for advice on writing style, but I believe I
understand the technical issues of i18n and l10n well.
FYI, Serbian is a Slavic language (thus related to Russian, for
instance, and not to German and not to French) and is spoken mainly in
the Balkan peninsula, Europe, in several countries and several different
Now on to the main topic.
- i18n (internationalization): adapting a program so that it is
ready to support multiple languages.
- l10n (localization): using the features provided by i18n to adapt
the program to a single language, e.g. Serbian.
I18n is the task of the program developers, as i18n support is
usually embedded in the source code. L10n is the task of the
translators. (The actual division of work is not that clean-cut, but
let's disregard that for simplicity; and even in reality it's not all
There are many technical challenges in i18n and l10n. They are often
misunderstood and underestimated. I suspect this is because you can only
fully understand these problems if you speak or know about many languages.
The current TikiWiki approach to translation is too simplistic to serve
a language community as large as Marc would like to have. The Tiki
translation mechanism amounts to changing text labels in the source
language (call them msgid's) to labels in target language (call them
This mechanism may be enough for some Germanic and Romance languages but
does not extend to other languages in general.
To illustrate, I give you the example of the Serbian language. I chose
it first because I know it fairly well, and second because it has some
of the less common quirks that are crucial for correct l10n.
The Serbian language has:
- Two different language variants ('types'): Ekavian and Jekavian.
- Two different scripts: Serbian Cyrillic and Serbian Latin.
o Serbian Cyrillic uses several letters that are unique to this
language. It is contained in the ISO-8859-5 code page.
o Serbian Latin is covered by the Latin 2 (ISO-8859-2) code page.
- Three different plural types. Which type of the plural is used depe=
- Seven different cases. The cases
- Context-sensitive word ordering.
All these issues are essential to good l10n, and unfortunately they
also bubble up to the i18n level. This means that, in order to have good
l10n, the developers must set the i18n up properly, and to do that they
must know a lot about i18n.
Some issues are easy to solve. For instance, the dual script usage is
solved easily by resorting to a Unicode encoding. People mostly use
UTF-8. In my experience this works well, and any other code page
juggling is simply not worth the hassle and produces problems in the
long run.
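
To make the code page point concrete, here is a minimal sketch in
Python. The sample word and the helper name can_encode are my own
illustrative choices; the codec names are the ones Python's standard
library uses for ISO-8859-5, ISO-8859-2 (Latin 2) and UTF-8.

```python
# Illustrative sample: the same Serbian word in both scripts.
CYRILLIC = "Ђурђевдан"  # contains Ђ, a letter specific to Serbian Cyrillic
LATIN = "Đurđevdan"     # the same word in Serbian Latin


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Each legacy code page covers only one of the two scripts;
    # UTF-8 covers both, so no per-script juggling is needed.
    for codec in ("iso8859_5", "iso8859_2", "utf_8"):
        print(codec, can_encode(CYRILLIC, codec), can_encode(LATIN, codec))
```

Each legacy code page accepts only one of the two strings, while UTF-8
accepts both, which is why a single Unicode encoding removes the need to
switch code pages per script.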
Some issues are solved only in some i18n libraries. For instance, as far
as I know, the multiple plural types are handled only by GNU gettext,
through its ngettext function (note the 'n').
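
To show what handling multiple plural types means, here is a small
sketch in Python. It is not gettext itself: in gettext, a rule of this
kind lives in the Plural-Forms header of the PO file and ngettext picks
the form for a given count. The rule below is the one commonly used for
Serbian, written out by hand; the word "fajl" (file) and the function
names are only my illustration.

```python
# The three Serbian forms of an illustrative noun ("file").
FORMS = ("fajl", "fajla", "fajlova")


def serbian_plural_index(n: int) -> int:
    """Pick which of the three plural forms applies to the count n."""
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, ... (but not 11)
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1  # 2-4, 22-24, ... (but not 12-14)
    return 2      # everything else: 0, 5-20, 25-30, ...


def count_files(n: int) -> str:
    """Format a count with the grammatically correct noun form."""
    return f"{n} {FORMS[serbian_plural_index(n)]}"


if __name__ == "__main__":
    for n in (1, 2, 5, 11, 21, 22):
        print(count_files(n))
```

A translation scheme that maps one source label to one target label has
no place to store the second and third form, nor the rule that chooses
between them.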
Some issues are not solved by i18n libraries, such as the cases. Work is
in progress to handle them too. The solution will likely include
scripting on the l10n level. I will not go into details on this, as the
entire solution will probably completely fit into l10n, so developers
will never have to care about it.
The story doesn't end here. I gave an example of a quirky language. But
what about the wealth of other languages that may bring in their own
problems? I cannot say which problems they would introduce, and chances
are you cannot either.
So hopefully I have made my case: i18n is a complicated matter. Doing it
right takes significant knowledge; it cannot simply be hacked together,
because it involves knowing a great many languages, something even the
brainiest hackers cannot achieve in reasonable time.
You can always choose to ignore the more exotic languages. If you do,
you will not be alone. Many i18n libraries simply pretend that
everything's fine and provide only the basic i18n. This solution is fine
as long as you only really want to support a handful of languages. For
something on the order of 100 languages, it must be abandoned.
On the other hand, if you choose to have the best support possible, you
are facing a demanding task that is not easily engineered (the language
knowledge is fragmented, so Joe Hacker cannot simply sit at his machine
on a Friday evening and have the complete i18n ready by next Monday), is
not particularly rewarding to do (so Joe will find little interest in
pursuing the matter at all) and will keep you away for a long time from
what you really want: to make Tiki better (what Joe really wants is a
great TikiWiki). It is a big job with low payoff that's probably not
worth handling from scratch. Rather than thinking of how to make your
own tools, consider using tools that are already available.
An easy way out is to use an already available library that:
- Supports many languages well
- Offers tools for both i18n and l10n
- Is standard, so translators know how to go about it
- Makes it easy to migrate the existing translations to new ones.
- Allows marking the translations up to provide context, guidelines
for translations etc.
From my perspective (as a Serbian translator), the only library that's
even worth mentioning with respect to the above requirements is GNU
gettext, with its ngettext plural support. It has been around for a
while, has a great tool set, has an i18n model that is sufficient for a
large subset of languages, and is available for almost any mainstream
programming language. And finally, the gettext people will continue
working on it, so it's money for nothing for the Tiki team: whenever
gettext improves, Tiki i18n improves at no extra cost.
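
As an illustration of the markup requirement, here is a minimal Python
sketch that formats one entry of a gettext PO file with a note for the
translator and a disambiguating context. The "#." comment line and the
msgctxt, msgid and msgstr keywords are standard PO syntax; the helper
names and the sample strings are hypothetical.

```python
from typing import Optional


def po_quote(text: str) -> str:
    """Quote a string the way PO files expect (escape \\, " and newlines)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def po_entry(msgid: str, msgstr: str = "",
             note: Optional[str] = None,
             context: Optional[str] = None) -> str:
    """Build one PO entry, optionally with a translator note and a context."""
    lines = []
    if note:
        # "#." marks a comment meant to guide the translator.
        lines += [f"#. {line}" for line in note.splitlines()]
    if context:
        # msgctxt separates identical source strings used in different places.
        lines.append(f"msgctxt {po_quote(context)}")
    lines.append(f"msgid {po_quote(msgid)}")
    lines.append(f"msgstr {po_quote(msgstr)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(po_entry("Open", "Otvori",
                   note="Button label; imperative verb.",
                   context="toolbar"))
```

The note and the context travel with the string, so every PO editor can
show them to the translator next to the text being translated.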
> 1. What motivates you to translate Tiki?
Naturally, the need to have a Tiki that will support content in a
different language. For a Serbian website I prefer that the controls be
in Serbian too.
> 2. What demotivates you to translate Tiki?
- Non-standard translation model.
You need to hunt down strings in a PHP source file, and take care
that the syntax is right etc.
- Insufficient translation model.
Simple string replacement does not cover all the l10n needs of the
language I am interested in.
- No work distribution.
I cannot easily work together with someone on a Tiki translation,
because it is a single file. If I split the language file so that I can
share it with someone, I have to merge manually, and that sucks and is
error-prone.
- Insufficient tool support.
It is difficult to make out which strings have changed, where and to
It is difficult to track string changes and incrementally add strings.
It is difficult to re-use translations from other projects.
It is difficult to put and maintain the information about the
context, i.e. the translator notes that are very important
- Insufficient i18n.
For correct i18n, there are guidelines one must adhere to. See for
> 3. Is it easy to contribute a translation? If so why? If not, why not?
Unfortunately, no. The reasons are given above.
> Should we setup a site focused on internationalization? A community
> area for translators? Ex.: i18n.tikiwiki.org
> Is committing your translations via CVS a problem? Should we try to
> setup a web interface? Or something like Rosetta?
The most important thing is to realize that i18n is a tough nut to
crack, and hopefully to understand that it is smart to use the already
available tools instead of rolling your own. More important than having
a dedicated i18n.tw.o is to do the following:
- use a library for ngettext handling, for instance:
- use my tiki2po converter to migrate existing translations into PO
can use tiki2po in fact to convert language.php to language.po and back.
Doing this for all the language.php files gives you POs that make the
translation easier. But the i18n will remain at the language.php
level, and that's not sufficient, as I've argued above.
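
To give an idea of what such a migration produces, here is a rough
Python sketch that writes a set of source/target label pairs out as PO
text. This is not how tiki2po is implemented; it assumes the pairs have
already been read from the language file into a mapping, and the
function names and the sample pair are hypothetical.

```python
from typing import Mapping

# Minimal PO header entry declaring the file's character encoding.
PO_HEADER = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=UTF-8\\n"\n'
)


def po_quote(text: str) -> str:
    """Quote a string the way PO files expect (escape \\, " and newlines)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def pairs_to_po(pairs: Mapping[str, str]) -> str:
    """Turn {source label: translated label} pairs into PO file text."""
    entries = [PO_HEADER]
    for msgid, msgstr in pairs.items():
        entries.append(f"msgid {po_quote(msgid)}\nmsgstr {po_quote(msgstr)}\n")
    # Entries in a PO file are separated by a blank line.
    return "\n".join(entries)


if __name__ == "__main__":
    print(pairs_to_po({"Save": "Sačuvaj"}))
```

The result can be opened in any PO editor, but as said above, it only
changes the file format: the plural and context problems stay until the
source itself is marked up for gettext.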
- adjust tiki source to use ngettext instead of the current system. In
the first instance, applying a sed script to the source tree will
suffice, with some tinkering here and there. (subsequent minor
improvements of i18n are to a large extent handled automatically by the
gettext tools, with almost no intervention from the l10n teams). Perhaps
to learn some more about the whole process. Not sure how Tiki templates
would be handled. However, I do know that there are programs that have
the entire system set up for PO. Plone (http://www.plone.org) has a
utility that extracts strings from templates as well as source code,
presenting the translators with PO files only.
- let people use Rosetta (http://rosetta.launchpad.net), or Pootle
(http://pootle.wordforge.org) or Emacs, or POEdit
(http://www.poedit.org) or whatever other tool they want to fiddle with
the translation. In fact, once POs are produced, the whole translation
shebang is Not-Your-Problem-Anymore: the translators will simply know
what to do. We've translated GNOME (http://www.prevod.org), translated
KDE (http://www.kde.org.yu), so the machine is well-oiled and will
process this Yet-Another-Gettext-Translation without problems.
- let the translators handle the rest, and maybe indulge them a little
by opening i18n.tw.o where they can learn from each other.
- CVS is IMHO not a problem. Once you have the PO repositories, you can
designate someone or something to accept the POs via mail and install
them into the lang/ dir. More important is that groupware can support
collaborative translation, and with Rosetta or Pootle in place, the
translators are all set.
This is as far as it can go with the current toolset. It cannot get any