Re: [Translate-devel] pofilter comments (was: New team for Aragonese)


F Wolff <fri...@pu...>, Wed, 21 Apr 2010 00:06:02 +0200:

> On Tue, 2010-04-20 at 22:02 +0200, Petr Kovar wrote:

(...)

> > Since I use pofilter extensively when working on translations (and,
> > needless to say, I'm very grateful for such tool), I usually spot some
> > false positives that are pretty similar to each other in their nature.
> > Let me list examples & comments below (all strings come from GNOME
> > software while using current Translate Toolkit on Fedora).
> 
> Yes, there is always the issue of false positives. Ideally we should
> use two approaches to solving it:
>  - We keep improving the accuracy and relevancy of the tests
>  - We allow our users to mark false positives. For example, an upcoming
> version of Pootle might have support for actually marking these so that
> you won't need to review them again. I have an idea in mind for how to do
> that with pofilter as well.

That sounds really interesting, indeed!

> > # (pofilter) endpunc: checks whether punctuation at the end of the
> > # strings match
> > #: ../src/Clients/Classical/mono-addins-strings.xml.h:10
> > msgid "Browse..."
> > msgstr "Procházet…"
> > 
> > In this case, the translator has replaced the usual ASCII representation
> > of the ellipsis (...) with the more correct one (…), see also:
> > 
> > http://en.wikipedia.org/wiki/%E2%80%A6
> > 
> > I assume pofilter should understand this when checking not only Czech
> > and Slovak, but also, e.g., English variants and many other Germanic,
> > Romance, or Slavic languages where very similar rules apply.
> 
> It is probably easy to have such a rule if we agree that it is correct.
> The issue here is that the ellipsis character (…) might not enjoy very
> good font support, so I tend to use the style of the source text under
> the assumption that the programmer could have used the ellipsis if it was
> wanted (which might not be a good assumption, of course :-).

Well, in fact, I often do the same when translating. Nevertheless, given
the many typographic conventions across different languages and writing
systems, we cannot simply presume that the original English style is
mandatory, acceptable, or suitable in the target language. That has
nothing to do with what the programmer could know, understand, or even
anticipate when writing the original string with an ellipsis, be it "..."
or "…". It's up to the translator to decide what exactly is more suitable
in their localization.

Moreover, many languages simply (and, at least from a formal point of
view, incorrectly) adapt their conventions to the English pseudo-standard
for largely historical reasons: first we had typewriters, then terminals
with poor font and character support. In the Unicode era, however, I think
these issues are mostly obsolete. But the usage persists, yes.

> I know there is a drive in GNOME to use typographically correct
> punctuation, and that is a good thing. Mozilla moved there at the time
> of Firefox 3 if I remember correctly. I'm just not sure that we should
> mandate it in all localisation.

Well, first of all, there's definitely a strong drive in many l10n
practices, including Czech, to use better typographic characters. That's
why I brought this up in the first place: I run into the problem with
non-ASCII special characters very often when reviewing the work of other
translators who have adopted the more correct conventions, and as a
reviewer, I have to deal with it somehow. Given the number of false
positives, it would of course be enough, in a sense, to just disable that
check, but I'd rather look for other, more reasonable solutions. :-)

> An alternative would be to accept either "..." or "…" for a translation
> of "..." at the expense of a bit of processing time. This might devalue
> the test for anyone who wants to keep closely to the source text. I'm
> not sure what is best.

Personally, I'd prefer to have "…" accepted. Other translators'
experience may differ, no doubt. Then again, what about a configurable
option, giving users a way to choose which special characters are
acceptable according to their l10n needs?
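A configurable option like this could be sketched roughly as follows. This is a minimal, hypothetical illustration, not the Translate Toolkit's actual endpunc implementation: the function names and the `accept_unicode_ellipsis` flag are assumptions. It compares the trailing punctuation of source and target and, only when the flag is set, treats "…" and "..." as equivalent.

```python
# Hypothetical sketch of a configurable "endpunc"-style check; the names and the
# accept_unicode_ellipsis flag are illustrative assumptions, not Toolkit APIs.

ASCII_ELLIPSIS = "..."
UNICODE_ELLIPSIS = "\u2026"  # the single ellipsis character "…"


def trailing_punctuation(text: str) -> str:
    """Return the run of non-alphanumeric, non-space characters ending text."""
    stripped = text.rstrip()
    start = len(stripped)
    while start > 0 and not (stripped[start - 1].isalnum() or stripped[start - 1].isspace()):
        start -= 1
    return stripped[start:]


def normalise(punct: str, accept_unicode_ellipsis: bool) -> str:
    """Optionally fold "…" into "..." so either form compares equal."""
    if accept_unicode_ellipsis:
        return punct.replace(UNICODE_ELLIPSIS, ASCII_ELLIPSIS)
    return punct


def endpunc_ok(source: str, target: str, accept_unicode_ellipsis: bool = False) -> bool:
    """True if source and target end with matching punctuation."""
    src = normalise(trailing_punctuation(source), accept_unicode_ellipsis)
    tgt = normalise(trailing_punctuation(target), accept_unicode_ellipsis)
    return src == tgt


print(endpunc_ok("Browse...", "Procházet…"))  # strict: False
print(endpunc_ok("Browse...", "Procházet…", accept_unicode_ellipsis=True))  # True
```

With the flag off, "Procházet…" for "Browse..." is still reported, which keeps the check strict for localisers who follow the source style; with it on, either form passes, and a missing ellipsis is still caught.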

> > # (pofilter) startpunc: checks whether punctuation at the beginning of
> > # the strings match
> > #: ../kupfer/commandexec.py:217
> > #, python-format
> > msgid "\"%s\" produced a result"
> > msgstr "„%s“ vyprodukoval výsledek"
> > 
> > This is somewhat related to the first example, in that the translator has
> > changed the ASCII representation for quotation marks in English to those
> > that are more correct in Czech, Slovak and some other languages ("
> > changed to „ at the beginning of the quotation, and to “ at the end,
> > respectively). See the Wikipedia once again for a nice overview:
> > 
> > http://en.wikipedia.org/wiki/Quotation_mark,_non-English_usage
> > 
> > Sometimes, developers use non-ASCII quotation marks in the source, so
> > then the source representation would be “quote”, and ‘single quote’,
> > respectively.
> 
> If the use of these quotation marks is universally accepted in the
> language, it is reasonably easy to ensure that pofilter expects them.
> That way copying from source to target in Virtaal will also do this
> substitution for you.  For French we already copy your example as
> « %s » produced a result
> to the target and expect this punctuation style in pofilter. In other
> words, using the English style would be marked as an error.
> 
> This is implemented for French and Vietnamese. We can do something
> similar for other languages in cases where this style is an official
> language policy, rather than a stylistic choice of some localisers.

Good, though I'm not sure what to do in a situation where, contrary to the
official language policy, many translators prefer the old ASCII-only style
for a number of more or less personal reasons...

But speaking of „quotation marks“ in Czech or Slovak: yes, using them is
definitely the official language policy. Text processors like Word or
Writer even auto-correct quotation marks to this style by default, though
the ASCII quoting style is widely used in some contexts (email, messaging,
etc.).

Cf. the Wikipedia article I already linked. I can also provide you with
links to more authoritative sources if desired; however, those are written
in the local language.
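The per-language substitution Friedel describes for French could look roughly like this. It is a hypothetical sketch, not Virtaal's or pofilter's code: `QUOTE_STYLES`, `localise_quotes` and `startpunc_ok` are illustrative names, and the table holds only the Czech/Slovak pair and the French style from the example in this thread. The source's ASCII quotes are converted to the target language's style before the leading punctuation is compared, so the English style in a Czech target would be flagged.

```python
import re

# Hypothetical per-language quotation styles as (opening, closing) pairs:
# Czech and Slovak use „…“; the French entry follows the "« %s »" example above.
QUOTE_STYLES = {
    "cs": ("\u201e", "\u201c"),
    "sk": ("\u201e", "\u201c"),
    "fr": ("\u00ab ", " \u00bb"),
}

# A pair of straight double quotes with no straight quote in between.
ASCII_QUOTED = re.compile(r'"([^"]*)"')


def localise_quotes(text: str, lang: str) -> str:
    """Replace each "..." pair with the quotation marks of lang, if known."""
    style = QUOTE_STYLES.get(lang)
    if style is None:
        return text  # unknown language: keep the source style
    opening, closing = style
    return ASCII_QUOTED.sub(lambda m: opening + m.group(1) + closing, text)


def leading_punctuation(text: str) -> str:
    """Return the run of non-alphanumeric, non-space characters starting text."""
    stripped = text.lstrip()
    end = 0
    while end < len(stripped) and not (stripped[end].isalnum() or stripped[end].isspace()):
        end += 1
    return stripped[:end]


def startpunc_ok(source: str, target: str, lang: str) -> bool:
    """Compare leading punctuation after converting the source's quotes."""
    expected = leading_punctuation(localise_quotes(source, lang))
    return expected == leading_punctuation(target)


source = '"%s" produced a result'
print(localise_quotes(source, "cs"))  # „%s“ produced a result
print(startpunc_ok(source, "„%s“ vyprodukoval výsledek", "cs"))  # True
print(startpunc_ok(source, '"%s" vyprodukoval výsledek', "cs"))  # False
```

Whether a language gets an entry in such a table would be a policy decision, which is exactly the open question above about translators who prefer the ASCII style.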

(...)

> > # (pofilter) variables: (u'translation contains variables not in original: % d',), (u'translation contains variables not in original: % d',)
> > # (pofilter) printf: checks whether printf format strings match
> > #: ../gedit/gedit-statusbar.c:431
> > #, c-format
> > msgid "There is a tab with errors"
> > msgid_plural "There are %d tabs with errors"
> > msgstr[0] "V kartě je chyba"
> > msgstr[1] "Ve %d kartách jsou chyby"
> > msgstr[2] "V %d kartách jsou chyby"
> > 
> > OK, this one is somewhat different: yet another example of the printf
> > (and variables) check.
> 
> Yes. Plural units sometimes give more false positives. The question we
> need a better answer to is "What source should each target be compared
> to?"  At the moment we compare each plural form to the first source
> string (msgid, not msgid_plural).  So you can see why it complains if
> the msgid and msgid_plural don't have the same variables.
> 
> I don't know of a universal way for all languages to know what to do
> here. For your case it seems we want:
> msgid        -> msgstr[0]
> msgid_plural -> msgstr[1]
> msgid_plural -> msgstr[2]
> 
> Is that correct? How universal is this, even for languages with only 3
> forms?

That's correct. I'd assume this holds in pretty much every situation you
can encounter when working with languages that have these 3 plural forms;
at least as far as I can tell from my l10n experience, it's safe to make
that assumption. I doubt there is a case in which msgid should be compared
with msgstr[x] for x above 0. That is, for a count of 1 we compare msgid
with msgstr[0], and for counts above 1, in languages that distinguish
plural from singular, we always compare msgid_plural with msgstr[x].
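The mapping discussed above (msgid to msgstr[0], msgid_plural to every other form) can be sketched as a printf comparison. This is a hypothetical illustration: the simplified specifier regex and the function names are assumptions, not pofilter's code, and `always_msgid=True` only mimics the current behaviour described above (every form compared with msgid). As Friedel asks, the mapping is not universal; in Russian, for example, form 0 is also used for 21, 31, and so on, so msgstr[0] there often needs the %d that the singular msgid lacks.

```python
import re

# Simplified printf conversion pattern (assumption, not pofilter's regex):
# optional argument index, flags, width, precision, length, conversion.
PRINTF = re.compile(r"%(?:\d+\$)?[-+ #0]*\d*(?:\.\d+)?[hlLqjzt]*[diouxXeEfFgGaAcspn%]")


def printf_specifiers(text: str) -> list[str]:
    """Sorted printf specifiers in text, ignoring the literal '%%'."""
    return sorted(spec for spec in PRINTF.findall(text) if spec != "%%")


def source_for_form(index: int, msgid: str, msgid_plural: str) -> str:
    """Mapping proposed in this thread: form 0 -> msgid, other forms -> msgid_plural."""
    return msgid if index == 0 else msgid_plural


def check_plural_printf(msgid, msgid_plural, msgstrs, always_msgid=False):
    """Return indices of plural forms whose specifiers differ from their source.

    always_msgid=True compares every form with msgid, like the current behaviour.
    """
    mismatches = []
    for index, target in enumerate(msgstrs):
        source = msgid if always_msgid else source_for_form(index, msgid, msgid_plural)
        if printf_specifiers(target) != printf_specifiers(source):
            mismatches.append(index)
    return mismatches


msgid = "There is a tab with errors"
msgid_plural = "There are %d tabs with errors"
msgstrs = ["V kartě je chyba", "Ve %d kartách jsou chyby", "V %d kartách jsou chyby"]
print(check_plural_printf(msgid, msgid_plural, msgstrs))  # []
print(check_plural_printf(msgid, msgid_plural, msgstrs, always_msgid=True))  # [1, 2]
```

For the gedit unit above, the proposed mapping reports nothing, while comparing every form with msgid reproduces the two false positives.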

> Alexander made some improvements to the printf test to handle plural
> issues better. I think some of those ideas might be useful here.
> 
> > # (pofilter) doublewords: The word 'g' is repeated
> > #: ../src/gnumch-equality-activity/gnumch.py:783
> > msgid ""
> > "T\n"
> > "R\n"
> > "O\n"
> > "G\n"
> > "G\n"
> > "L\n"
> > "E"
> > msgstr ""
> > "T\n"
> > "R\n"
> > "O\n"
> > "G\n"
> > "G\n"
> > "L\n"
> > "E"
> > 
> > Here, well, it doesn't make much sense to me to raise the doublewords
> > warning when the source repeats the same character, word, fragment, or
> > whatever, exactly as the target does...
> 
> I agree in principle, and it should be doable. How frequently do you get
> something like this?

Well, far too often, mainly with simple strings like the one above, so it
seemed worth mentioning here. :-)
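Suppressing the warning when the source repeats the same word could be sketched like this. It is a hypothetical illustration, not pofilter's doublewords implementation: words are assumed to be runs of `\w` characters compared case-insensitively, and only repetitions that the source does not also contain are reported.

```python
import re

WORD = re.compile(r"\w+")  # assumption: a "word" is a run of word characters


def repeated_words(text: str) -> set[str]:
    """Lower-cased words that occur twice in a row in text."""
    words = [w.lower() for w in WORD.findall(text)]
    return {a for a, b in zip(words, words[1:]) if a == b}


def doublewords_problems(source: str, target: str) -> set[str]:
    """Repeated words in the target that the source does not repeat too."""
    return repeated_words(target) - repeated_words(source)


troggle = "T\nR\nO\nG\nG\nL\nE"
print(repeated_words(troggle))  # {'g'}
print(doublewords_problems(troggle, troggle))  # set()
print(doublewords_problems("It is done", "Je je hotovo"))  # {'je'}
```

For the TROGGLE string above, source and target both repeat "g", so nothing is reported, while an accidental repetition in a target whose source has none would still be flagged.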

> > So I hope this might be of some use to the Translate Toolkit developers,
> > and please keep up the good work, I really appreciate it!
> > 
> > On a side note: sometime soon, I'll be finishing a series of articles
> > on Czech l10n guidelines for a local technology website, and I certainly
> > intend to stress how important technical QA is in the l10n process, and
> > hence why people working on translations should use tools like the
> > Translate Toolkit much more often.
> 
> Thanks. That will be great!  Please share a link if you can.  Do you

I will. :-)

> think you will be able to localise Virtaal for the occasion, maybe?
> Depending on when this is published, Virtaal might already expose the
> quality checks :-)

Frankly, the Virtaal l10n has been on my ever-growing to-do list for some
time. Too bad I can't say now when exactly I'll be able to get down to
it... :-(

> Pootle already exposes the quality checks, so that might also be a way
> of visualising it.
> 
> I can add Virtaal for you here if you are interested:
> http://pootle.locamotion.org/cs/

I myself prefer working with more traditional, desktop-based tools, though
I fully respect and appreciate the need for web-based solutions. For what
it's worth, I always have multiple PO editors installed on my system, for
research purposes if nothing else. :-)

But I see Pootle itself is being localized there on pootle.locamotion.org,
right? So I could at least review the quality of the Czech Pootle
translation right there, especially since the Czech translators there seem
rather unorganized. Completely off-topic, but since I co-maintain l10n.cz,
a common place for the Czech FLOSS l10n community, organizing l10n work is
a big topic for me. :-)

Thanks,
Petr Kovar
