Thread: [Tuxpaint-i18n] Aa, qx, QX, qy, QY and other strings
From: Bill K. <nb...@so...> - 2009-06-01 17:16:53
NOTE: I'm not an expert on this. Hopefully Albert can point out any mistakes. ;)

For a while now, Tux Paint has had a feature whereby it tries to remove or re-order fonts such that the most useful ones (to users of the current locale) are presented first.

In other words, if a font doesn't have the characters necessary to type in the current locale's language, we remove (or at least deemphasize) that font.

This is done using some special translatable strings:

  qx
  QX
  qy
  QY
  oO
  `\%_@$~#{}<>^&*
  ,.?!
  017
  O0
  1Il|

It seems many translators were unaware of this, particularly because, for a very long time, nothing about it was mentioned in the PO files themselves (i.e., in the comments that appear above the strings; the GUI editors I've seen, like poEdit, show these comments). It seems, however, that many locales still don't have these strings translated, or simply repeat the original string (e.g., "qx").

The tests that we do, in order to 'score' a font, are as follows:

* Does it have both uppercase and lowercase letters (if that makes sense in the locale)?

  We test this in English by seeing if "qx" and "QX", and "qy" and "QY" both render. In your locale, it'd be helpful to translate these to uppercase and lowercase characters common to your locale.

* If the locale cannot support ASCII characters, both the "qx"/"QX" and "qy"/"QY" pairs need to be translated to something in the local language. (That way, fonts that don't support your locale will be filtered out.)

* If the locale does support ASCII, then only translate the "qx"/"QX" lines. LEAVE THE "qy"/"QY" ones untranslated (or simply enter "qy" and "QY" for them as the translations). (This way, fonts that don't support your locale, but are still useful because your locale supports ASCII, will remain.)

* We gather a score for a font based on whether it supports a variety of strings. For your locale, translate the following into whatever makes sense:

  oO - Test whether uppercase and lowercase characters work (it's ok if it does not, but is scored lower).

  `\%_@$~#{}<>^&* - "Uncommon" punctuation. (In European locales, you might want to check for the Euro symbol too, for example.)

  ,.?! - Common punctuation. (In Spanish locales, for example, you'd want the upside-down ? and ! )

  017 - Digits. (Honestly, I'm not sure how one would localize this.)

  O0 - Distinct circle-like characters. (I admit I don't understand how scoring actually applies to this test. Albert?)

  1Il| - Distinct line-like characters. (Ditto)

Additionally, Albert gives a score bonus to fonts that include the multiply and divide symbols, when Tux Paint is built for the OLPC XO-1 laptop (it includes a separate key for those characters, unlike US keyboards). (That string is not translatable.)

So, I encourage translators to go into their PO file, look for the strings above, and make sure they're translated to something suitable for your locale. ... and to ask questions. :)

A quick way to get an idea of how these things have been translated in other locales is to use the following chain of 'grep' commands. (It helps if you have a POSIX-like environment, like Linux, Mac OS X, or Cygwin on Windows.)

  grep -C 2 "qx" po/*.po | grep msgstr | grep -v "qx" | grep -v \"\"
  grep -C 2 "QX" po/*.po | grep msgstr | grep -v "QX" | grep -v \"\"
  # a number of locales translate, but only a fraction of all locales

  grep -C 2 "qy" po/*.po | grep msgstr | grep -v "qy" | grep -v \"\"
  grep -C 2 "QY" po/*.po | grep msgstr | grep -v "QY" | grep -v \"\"
  # Norwegian locales translate this -- not sure if that's appropriate...?

  grep -C 2 "oO" po/*.po | grep msgstr | grep -v "oO" | grep -v \"\"
  # Swedish checks for a variety of accented chars.
  # Korean checks for a pair of Korean chars.

  grep -C 2 "%_" po/*.po | grep msgstr | grep -v "%_" | grep -v \"\"
  # no translations

  grep -C 2 ",\.?\!" po/*.po | grep msgstr | grep -v \"\" | grep -v ",\.?\!"
  # only Arabic checks (for right-to-left variations of these chars)
  # I think Spanish should check for upside-down ? and !
  # I think many should check for their quote characters (e.g., French)

  grep -C 2 "017" po/*.po | grep msgstr | grep -v \"\" | grep -v "017"
  # Gujarati checks (has its own set of digits)

... and similar for the other strings. (But, again, I don't _quite_ understand their use.)

I'd like to figure this all out in a way that's easy to explain to translators, so we can document it more clearly. In the meantime, I think I've covered a bunch, and it should help a bit.

Thanks!

--
-bill!
Sent from my computer
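To make the two mechanisms described in this message easier to picture, here is a minimal sketch in C. It is not Tux Paint's code: font_covers() is a hypothetical stand-in for the real rendering test, a "font" is modelled as nothing more than the string of characters it can draw, and the one-point-per-string weighting is an assumption. In the real program each test string would first pass through gettext(), which is how a translation changes the test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the real rendering test: a "font" is just
   the string of characters it can draw, and a test string "works" if
   every one of its characters is in that string. */
static bool font_covers(const char *font_chars, const char *test)
{
  for (const char *p = test; *p != '\0'; p++)
    if (strchr(font_chars, *p) == NULL)
      return false;
  return true;
}

/* Filtering step: keep the font if the qx/QX pair works, or if the
   qy/QY pair works. Translating only one pair leaves the other as an
   ASCII fallback; translating both removes that fallback. */
static bool font_is_usable(const char *font_chars)
{
  return (font_covers(font_chars, "qx") && font_covers(font_chars, "QX")) ||
         (font_covers(font_chars, "qy") && font_covers(font_chars, "QY"));
}

/* Scoring step: one point per test string the font covers.
   Equal weights are an assumption made for this sketch. */
static int font_score(const char *font_chars)
{
  static const char *const tests[] = {
    "oO", "`\\%_@$~#{}<>^&*", ",.?!", "017", "O0", "1Il|"
  };
  int score = 0;

  for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++)
    if (font_covers(font_chars, tests[i]))
      score++;
  return score;
}
```

A font modelled as "qxQX,.?!" would pass the filter and score one point (common punctuation only), so it would be listed below a font that also covers digits and the uncommon punctuation.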
From: Mark K. K. <mkk...@gm...> - 2009-06-01 21:30:37
On Mon, Jun 01, 2009 at 10:16:35AM -0700, Bill Kendrick wrote:
> For a while now, Tux Paint has had a feature whereby it tries to
> remove or re-order fonts such that the most useful ones (to users of the
> current locale) are presented first.

Is this working now? It used to be broken. (The code was executed before the locale was loaded, and this code needed to execute before the fonts were loaded, but the font loading couldn't be moved to after the locale was loaded, for reasons I cannot recall.)

-Mark
From: Bill K. <nb...@so...> - 2009-06-01 21:49:24
On Mon, Jun 01, 2009 at 02:30:34PM -0700, Mark K. Kim wrote:
> Is this working now? It used to be broken. (The code was executed
> before the locale was loaded, and this code needed to execute before the
> fonts were loaded, but there was a reason why the font loading couldn't
> be moved to after the locale was loaded for reasons I cannot recall.)

That was fixed, yes. I'm not sure if it works 100% properly, but better, at least. ;)

-bill!
From: Albert C. <aca...@gm...> - 2009-06-02 07:58:09
On Mon, Jun 1, 2009 at 5:30 PM, Mark K. Kim <mkk...@gm...> wrote:
> On Mon, Jun 01, 2009 at 10:16:35AM -0700, Bill Kendrick wrote:
>
>> For a while now, Tux Paint has had a feature whereby it tries to
>> remove or re-order fonts such that the most useful ones (to users of the
>> current locale) are presented first.
>
> Is this working now? It used to be broken. (The code was executed
> before the locale was loaded, and this code needed to execute before the
> fonts were loaded, but there was a reason why the font loading couldn't
> be moved to after the locale was loaded for reasons I cannot recall.)

To reduce start-up latency and memory usage, the font process needs to fork off very early. The earlier it happens, the better. To avoid SDL problems, it must happen before SDL_Init.

The locale can't be loaded until the options are parsed. Because of this, and because of the --nosysfonts option, option parsing must be done before the font process does any work. The font process could still fork off at the beginning of main(), but then the locale would need to be communicated to it.
From: Bill K. <nb...@so...> - 2009-06-02 20:42:52
On Tue, Jun 02, 2009 at 03:58:02AM -0400, Albert Cahalan wrote:
> To reduce start-up latency and memory usage, the font process
> needs to fork off very early. The earlier it happens, the better.
> To avoid SDL problems, it must happen before SDL_Init.
>
> The locale can't be loaded until the options are parsed. Because of
> this, and because of the --nosysfonts option, option parsing must be
> done before the font process does any work. The font process could
> still fork off at the beginning of main(), but then the locale would need
> to be communicated to it.

Currently looks to work like this:

* set default options
* get default savedir and data directories
* load config file options
* parse command-line options
* i18n stuff
* input method stuff
* SDL_Pango_Init()
* run_font_scanner()
* SDL_Init()
* Mix_OpenAudio()
* load color palette
* open window (SDL_SetVideoMode())
* load splash screen images
* load default font
* display splash screen
* #ifdef FORKED_FONTS
    reliable_write(font_socket_fd, &no_system_fonts, sizeof no_system_fonts);
  #else
    font_thread = SDL_CreateThread(load_user_fonts_stub, NULL);
  #endif
* load cursor shapes
* ... etc. etc.

Looks reasonable, based on the needs you outlined above. Am I wrong?

--
-bill!
Sent from my computer
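The idea of handing a setting to an already-forked font process over a descriptor can be sketched as below. This is only an illustration under assumptions: write_all() and read_all() are hypothetical names, and nothing here is taken from Tux Paint's actual reliable_write(). The sketch shows the usual POSIX idiom of looping until every byte has been transferred and retrying when a call is interrupted.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all bytes are sent.
   Retries on EINTR. Returns 0 on success, -1 on any other error. */
static int write_all(int fd, const void *buf, size_t len)
{
  const char *p = buf;

  while (len > 0) {
    ssize_t n = write(fd, p, len);
    if (n < 0) {
      if (errno == EINTR)
        continue;               /* interrupted by a signal: try again */
      return -1;
    }
    p += n;
    len -= (size_t)n;
  }
  return 0;
}

/* Hypothetical mirror image for the receiving process.
   Returns 0 on success, -1 on error or if the peer closed early. */
static int read_all(int fd, void *buf, size_t len)
{
  char *p = buf;

  while (len > 0) {
    ssize_t n = read(fd, p, len);
    if (n < 0) {
      if (errno == EINTR)
        continue;
      return -1;
    }
    if (n == 0)
      return -1;                /* end of file before all bytes arrived */
    p += n;
    len -= (size_t)n;
  }
  return 0;
}
```

With helpers like these, a child forked before SDL_Init() could block in read_all() until the parent has parsed its options, and the parent could later send further values (a locale name, say) the same way.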
From: Albert C. <aca...@gm...> - 2009-06-02 09:14:07
On Mon, Jun 1, 2009 at 1:16 PM, Bill Kendrick <nb...@so...> wrote: > For a while now, Tux Paint has had a feature whereby it tries to > remove or re-order fonts such that the most useful ones (to users of the > current locale) are presented first. > > In other words, if a font doesn't have the characters necessary to type > in the current locale's language, we remove (or at least deemphasize) that > font. This is especially important when scrolling is disabled. In that case, only the topmost fonts may be used. > * Does it have both uppercase and lowecase letters > (if that makes sense in the locale). > > We test this in English by seeing if "qx" and "QX", and "qy" and "QY" > both render. In your locale, it'd be helpful to translate these to > uppercase and lowercase characters common to your locale. > > * If the locale cannot support ASCII characters, both the "qx"/"QX" > and "qy"/"QY" pairs need to be translated to something in the local langage. > (That way, fonts that don't support your locale will be filtered out.) > > * If the locale does support ASCII, then only translate the "qx"/"QX" lines. > LEAVE THE "qy"/"QY" ones untranslated (or simply enter "qy" and "QY" > for them as the translations). > (This way, fonts that don't support your locale, but are still useful > because your locale supports ASCII, will remain.) Yes. This should work OK. Perhaps it would be better to split the ASCII and non-ASCII apart, then flag languages according to how much they value ASCII. Oh well; the current code seems to do a decent job. > * We gather a score for a font based on whether it supports a variety of > strings. For your locale, translate the following into whatever makes > sense: > > oO - Test whether uppercase and lowercase characters work > (it's ok if it does not, but is scored lower). > > `\%_@$~#{}<>^&* - "Uncommon" punctuation. (In European locales, > you might want to check for the Euro symbol too, > for example.) 
This is stuff you could live without in a novelty font. It's commonly missing. > ,.?! - Common punctuation. (In Spanish locales, for example, > you'd want the upside-down ? and ! ) This is really critical for using the font. > 017 - Digits. (Honestly, I'm not sure how one would localize this.) Some particularly lame novelty fonts lack the digits. Some languages do not use ASCII digits. BTW, in case digits show up somewhere in the UI, glibc can translate them if you use the "I" (upper case eye) modifier. Like this: "%Id" > O0 - Distinct circle-like characters. (I admit I don't understand how > scoring actually applies to this test. Albert?) > > 1Il| - Distinct line-like characters. (Ditto) This is to prefer fonts with distinct characters. It's confusing if you can't tell the difference. It's not so easy to explain why the computer has all these symbols if they all look the same. In general, indistinct characters is a sign of a poor font. > grep -C 2 "qx" po/*.po | grep msgstr | grep -v "qx" | grep -v \"\" > grep -C 2 "QX" po/*.po | grep msgstr | grep -v "QX" | grep -v \"\" > # a number of locales translate, but only a fraction of all locales > > grep -C 2 "qy" po/*.po | grep msgstr | grep -v "qy" | grep -v \"\" > grep -C 2 "QY" po/*.po | grep msgstr | grep -v "QY" | grep -v \"\" > # norwegian locales translate this -- not sure if that's appropriate...? I think it's an error, because plain ASCII is slightly useful. Norwegian probably should translate one pair of these only. Translate both whenever plain ASCII fonts are of zero value. > grep -C 2 "oO" po/*.po | grep msgstr | grep -v "oO" | grep -v \"\" > # swedish checks for a variety of accented chars. > # korean checks for a pair of korean chars. Probably this isn't the best. The code might be improved by having distinct non-translatable test strings for ASCII, and a way to indicate the importance of ASCII. Swedish loses the ability to prefer ASCII-only fonts with case distinction over ASCII-only fonts that lack it. 
It gains the ability to distinguish between fonts that lack case distinction for Swedish accented letters. It's pretty unlikely that a font would have case distinction for ASCII but not also for any accented characters. In other words, testing ASCII is highly likely to take care of accented characters as well. I have no idea what the Korean translation is doing. In my xterm those characters look like spaces. It'd be reasonable to not use gettext on this string. Translation is only really useful if all of these apply: 1. the language does not use the Latin alphabet 2. the alphabet has a case distinction 3. a font fails to provide the case distinction 4. a font fails to provide ASCII (and was unblacklisted) For example, suppose that there are two Cyrillic fonts which completely lack the ASCII letters. If one of those fonts is also lacking case distinction, then it should be scored lower than the other. If this situation really does exist, then translation could be useful. The same goes for Greek. Note that this situation can not exist unless the translator also disables the blacklisting of fonts that lack ASCII letters. (by translating "qx", "QX", "qy", and "QY") > grep -C 2 "%_" po/*.po | grep msgstr | grep -v "%_" | grep -v \"\" > # no translations > > grep -C 2 ",\.?\!" po/*.po | grep msgstr | grep -v \"\" | grep -v ",\.?\!" > # only arabic checks (for right-to-left variations of these chars) > # I think spanish should check for upside-down ? and ! > # I think many should check for their quote characters (e.g., French) I think that Arabic is doing the right thing. Other languages using non-Latin end-of-sentence punctuation ought to do this. Given what little I know about Spanish, I agree with you. If the upside-down '?' is really critical for tolerable sentences, then it should be checked. Quote characters are not critical. If the French translator is having score problems related to them, then they might best be added to the low-priority punctuation string. 
Quote characters are far less important than things like the period and question mark. > grep -C 2 "017" po/*.po | grep msgstr | grep -v \"\" | grep -v "017" > # gujarati checks (has its own set of digits) Farsi and Arabic do too, AFAIK. > ... and similar for the other strings. (But, again, I don't _quite_ understand > their use.) The O0 string appears to be wrongly translated in all cases except po/gu.po-msgstr, which simply suppresses the unused ASCII digit. None of the other translations should exist. I think some East Asian languages have an end-of-sentence character that might need to be added on to the end of "O0". Any sort of large circle character should be in the string. (and likewise for the vertical line characters) |
From: Bill K. <nb...@so...> - 2009-06-02 20:36:12
On Tue, Jun 02, 2009 at 05:14:06AM -0400, Albert Cahalan wrote: > This is especially important when scrolling is disabled. > In that case, only the topmost fonts may be used. I don't believe we have any option to disable scrolling of the selector toolbar. <snip> > This should work OK. Perhaps it would be better to split the > ASCII and non-ASCII apart, then flag languages according to > how much they value ASCII. Oh well; the current code seems > to do a decent job. Though extremely underutilized, from what I can tell. But that's part of the reason I'm bringing this topic up. :) > > `\%_@$~#{}<>^&* - "Uncommon" punctuation. (In European locales, <snip> > This is stuff you could live without in a novelty font. > It's commonly missing. > > > ,.?! - Common punctuation. (In Spanish locales, for example, <snip> > This is really critical for using the font. It seems both are scored the same (one point). Should we increase the 'weight' of certain tests? (e.g., if it lacks "\" or "%" that's much less important than if it lacks "." or "?".) <snip> > Some languages do not use ASCII digits. BTW, in case digits > show up somewhere in the UI, glibc can translate them if you > use the "I" (upper case eye) modifier. Like this: "%Id" Not from what I've seen doing a grep of the POT file. But good to know!!! > > O0 - Distinct circle-like characters. (I admit I don't understand how > > scoring actually applies to this test. Albert?) > > > > 1Il| - Distinct line-like characters. (Ditto) > > This is to prefer fonts with distinct characters. It's confusing > if you can't tell the difference. It's not so easy to explain why > the computer has all these symbols if they all look the same. > In general, indistinct characters is a sign of a poor font. So out of curiosity, how does it actually _test_ this? Compare the bitmaps generated by each chracter...??? 
> > grep -C 2 "qx" po/*.po | grep msgstr | grep -v "qx" | grep -v \"\" > > grep -C 2 "QX" po/*.po | grep msgstr | grep -v "QX" | grep -v \"\" > > # a number of locales translate, but only a fraction of all locales > > > > grep -C 2 "qy" po/*.po | grep msgstr | grep -v "qy" | grep -v \"\" > > grep -C 2 "QY" po/*.po | grep msgstr | grep -v "QY" | grep -v \"\" > > # norwegian locales translate this -- not sure if that's appropriate...? > > I think it's an error, because plain ASCII is slightly useful. > Norwegian probably should translate one pair of these only. > > Translate both whenever plain ASCII fonts are of zero value In terms of 'completeness' of translations (i.e., so that a particular locale doesn't sit at "99% finished" forever), we should just 'translate' the string to the same string. i.e., in the nn.po and nb.po, the translations of "qy" and "QY" would be "qy" and "QY", respectively. :) > > grep -C 2 "oO" po/*.po | grep msgstr | grep -v "oO" | grep -v \"\" > > # swedish checks for a variety of accented chars. > > # korean checks for a pair of korean chars. > > Probably this isn't the best. The code might be improved > by having distinct non-translatable test strings for ASCII, > and a way to indicate the importance of ASCII. Not 100% understanding, but I can see why you'd want to score a Korean font with distinct upper/lowercase higher than a Korean font with only uppercase Korean characters. No? > Swedish loses the ability to prefer ASCII-only fonts > with case distinction over ASCII-only fonts that lack it. > It gains the ability to distinguish between fonts that > lack case distinction for Swedish accented letters. > > It's pretty unlikely that a font would have case distinction > for ASCII but not also for any accented characters. > In other words, testing ASCII is highly likely to take care > of accented characters as well. > > I have no idea what the Korean translation is doing. > In my xterm those characters look like spaces. 
> > It'd be reasonable to not use gettext on this string. > Translation is only really useful if all of these apply: > > 1. the language does not use the Latin alphabet > 2. the alphabet has a case distinction > 3. a font fails to provide the case distinction > 4. a font fails to provide ASCII (and was unblacklisted) > > For example, suppose that there are two Cyrillic fonts > which completely lack the ASCII letters. If one of those > fonts is also lacking case distinction, then it should be > scored lower than the other. If this situation really does > exist, then translation could be useful. The same goes > for Greek. Note that this situation can not exist unless > the translator also disables the blacklisting of fonts that > lack ASCII letters. (by translating "qx", "QX", "qy", and "QY") > > > grep -C 2 "%_" po/*.po | grep msgstr | grep -v "%_" | grep -v \"\" > > # no translations > > > > grep -C 2 ",\.?\!" po/*.po | grep msgstr | grep -v \"\" | grep -v ",\.?\!" > > # only arabic checks (for right-to-left variations of these chars) > > # I think spanish should check for upside-down ? and ! > > # I think many should check for their quote characters (e.g., French) > > I think that Arabic is doing the right thing. Other languages > using non-Latin end-of-sentence punctuation ought to do this. > > Given what little I know about Spanish, I agree with you. > If the upside-down '?' is really critical for tolerable sentences, > then it should be checked. > > Quote characters are not critical. If the French translator > is having score problems related to them, then they might > best be added to the low-priority punctuation string. > Quote characters are far less important than things like > the period and question mark. > > > grep -C 2 "017" po/*.po | grep msgstr | grep -v \"\" | grep -v "017" > > # gujarati checks (has its own set of digits) > > Farsi and Arabic do too, AFAIK. > > > ... and similar for the other strings. 
(But, again, I don't _quite_ understand > > their use.) > > The O0 string appears to be wrongly translated in all cases > except po/gu.po-msgstr, which simply suppresses the unused > ASCII digit. None of the other translations should exist. Totally confused here. Can you elucidate? > I think some East Asian languages have an end-of-sentence > character that might need to be added on to the end of "O0". > Any sort of large circle character should be in the string. > (and likewise for the vertical line characters) Thanks for the tips. Expect more questions from myself and translators, I'm sure. ;) -- -bill! Sent from my computer |
From: Albert C. <aca...@gm...> - 2009-06-03 07:41:53
On Tue, Jun 2, 2009 at 4:36 PM, Bill Kendrick <nb...@so...> wrote:
> On Tue, Jun 02, 2009 at 05:14:06AM -0400, Albert Cahalan wrote:

>> > `\%_@$~#{}<>^&* - "Uncommon" punctuation. (In European locales,
> <snip>
>> This is stuff you could live without in a novelty font.
>> It's commonly missing.
>>
>> > ,.?! - Common punctuation. (In Spanish locales, for example,
> <snip>
>> This is really critical for using the font.
>
> It seems both are scored the same (one point). Should we increase the
> 'weight' of certain tests? (e.g., if it lacks "\" or "%" that's much
> less important than if it lacks "." or "?".)

I assume that a font with all the uncommon punctuation will have the common punctuation. Thus, in practice, there are three possible score increases:

  0: does not have basic punctuation
  1: has basic punctuation only, nothing fancy
  2: has full punctuation

In theory, there could be a font with the uncommon punctuation but lacking the common punctuation. I've never seen such a font. It would get an increase of 1.

>> > O0 - Distinct circle-like characters. (I admit I don't understand how
>> > scoring actually applies to this test. Albert?)
>> >
>> > 1Il| - Distinct line-like characters. (Ditto)
>>
>> This is to prefer fonts with distinct characters. It's confusing
>> if you can't tell the difference. It's not so easy to explain why
>> the computer has all these symbols if they all look the same.
>> In general, indistinct characters is a sign of a poor font.
>
> So out of curiosity, how does it actually _test_ this? Compare the
> bitmaps generated by each character...???

Yes.

  charset_works calls qsort
  qsort calls surfcmp
  surfcmp calls do_surfcmp
  do_surfcmp calls memcmp

>> > grep -C 2 "qx" po/*.po | grep msgstr | grep -v "qx" | grep -v \"\"
>> > grep -C 2 "QX" po/*.po | grep msgstr | grep -v "QX" | grep -v \"\"
>> > # a number of locales translate, but only a fraction of all locales
>> >
>> > grep -C 2 "qy" po/*.po | grep msgstr | grep -v "qy" | grep -v \"\"
>> > grep -C 2 "QY" po/*.po | grep msgstr | grep -v "QY" | grep -v \"\"
>> > # norwegian locales translate this -- not sure if that's appropriate...?
>>
>> I think it's an error, because plain ASCII is slightly useful.
>> Norwegian probably should translate one pair of these only.
>>
>> Translate both whenever plain ASCII fonts are of zero value

Sorry: translate both when fonts w/o ASCII are OK.

These aren't scoring; they are blacklisting. Fonts that make "Q" look the same as "X" are too broken to keep.

Translate only "QY" and "qy" (or only "QX" and "qx") if you want to **require** some characters to work OK.

>> > grep -C 2 "oO" po/*.po | grep msgstr | grep -v "oO" | grep -v \"\"
>> > # swedish checks for a variety of accented chars.
>> > # korean checks for a pair of korean chars.
>>
>> Probably this isn't the best. The code might be improved
>> by having distinct non-translatable test strings for ASCII,
>> and a way to indicate the importance of ASCII.
>
> Not 100% understanding, but I can see why you'd want to score a
> Korean font with distinct upper/lowercase higher than a Korean font with
> only uppercase Korean characters. No?

AFAIK, there is no concept of uppercase in Korean. (nor in Chinese, nor in Japanese, nor in Arabic...)

>> The O0 string appears to be wrongly translated in all cases
>> except po/gu.po-msgstr, which simply suppresses the unused
>> ASCII digit. None of the other translations should exist.
>
> Totally confused here. Can you elucidate?

All useful circle-shaped characters should appear in this. Perhaps this need not be translatable.
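The sort-then-compare approach named in the call chain above can be illustrated with a small, self-contained sketch. Everything in it is assumed for illustration: the Glyph type, the fixed 8-byte bitmap size, and the function names are hypothetical and do not come from Tux Paint, where the comparison is done on rendered SDL surfaces. The technique is the same, though: sort the rendered glyphs with qsort() using memcmp() as the ordering, after which any two identical glyphs must sit next to each other.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define GLYPH_BYTES 8   /* hypothetical: one 8x8 monochrome bitmap per glyph */

typedef struct {
  unsigned char rows[GLYPH_BYTES];
} Glyph;

/* qsort() comparator: order glyphs by their raw bitmap bytes. */
static int glyph_cmp(const void *a, const void *b)
{
  return memcmp(((const Glyph *)a)->rows, ((const Glyph *)b)->rows,
                GLYPH_BYTES);
}

/* Sorts the array in place, then checks each neighbouring pair.
   Returns true only if no two glyphs are byte-for-byte identical. */
static bool glyphs_distinct(Glyph *glyphs, size_t count)
{
  qsort(glyphs, count, sizeof glyphs[0], glyph_cmp);
  for (size_t i = 1; i < count; i++)
    if (glyph_cmp(&glyphs[i - 1], &glyphs[i]) == 0)
      return false;
  return true;
}
```

Sorting first means a string of n characters needs only n - 1 comparisons after the sort, rather than comparing every glyph against every other one. A font that draws "O" and "0" with the same bitmap, or that draws every missing character as the same empty box, fails this check.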
From: Pere P. i C. <pe...@fo...> - 2009-06-02 22:14:28
El dt 02 de 06 de 2009 a les 05:14 -0400, en/na Albert Cahalan va escriure: > On Mon, Jun 1, 2009 at 1:16 PM, Bill Kendrick <nb...@so...> wrote: > > > For a while now, Tux Paint has had a feature whereby it tries to > > remove or re-order fonts such that the most useful ones (to users of the > > current locale) are presented first. > > > > In other words, if a font doesn't have the characters necessary to type > > in the current locale's language, we remove (or at least deemphasize) that > > font. > > This is especially important when scrolling is disabled. > In that case, only the topmost fonts may be used. > > > * Does it have both uppercase and lowecase letters > > (if that makes sense in the locale). > > > > We test this in English by seeing if "qx" and "QX", and "qy" and "QY" > > both render. In your locale, it'd be helpful to translate these to > > uppercase and lowercase characters common to your locale. > > In catalan we need the ç (ccedill) and some accented letters. Should I should put all of them here or a selected subset will be enouth? Also, we should be able to type spanish (ntilde, exclamup...) and french (æ ae, oe) letters, as they are not so important for writing catalan, they have to go to uncommon punctuation below, right? > > * If the locale cannot support ASCII characters, both the "qx"/"QX" > > and "qy"/"QY" pairs need to be translated to something in the local langage. > > (That way, fonts that don't support your locale will be filtered out.) > > > > * If the locale does support ASCII, then only translate the "qx"/"QX" lines. > > LEAVE THE "qy"/"QY" ones untranslated (or simply enter "qy" and "QY" > > for them as the translations). > > (This way, fonts that don't support your locale, but are still useful > > because your locale supports ASCII, will remain.) > > Yes. > > This should work OK. Perhaps it would be better to split the > ASCII and non-ASCII apart, then flag languages according to > how much they value ASCII. 
Oh well; the current code seems > to do a decent job. > > > * We gather a score for a font based on whether it supports a variety of > > strings. For your locale, translate the following into whatever makes > > sense: > > > > oO - Test whether uppercase and lowercase characters work > > (it's ok if it does not, but is scored lower). > > I am lost here, if we have yet checked for QX (uppercase) and qx (lowercase), what we are suposed to put here? Oh wait, I see below in your comment about Sweddish. > > `\%_@$~#{}<>^&* - "Uncommon" punctuation. (In European locales, > > you might want to check for the Euro symbol too, > > for example.) > This is stuff you could live without in a novelty font. > It's commonly missing. > Here have to go the spanish and french letters and symbols plus euro symbol plus some other uncommon punctuation, so this applies to 'letters, punctuation and symbols', not only punctuation, right? Also the same question as above, is enouth a subset of them? > > > ,.?! - Common punctuation. (In Spanish locales, for example, > > you'd want the upside-down ? and ! ) > > This is really critical for using the font. > So in catalan I have to add "·" middot, "'" apostroph and "-" (minus?) as they are really needed. > > 017 - Digits. (Honestly, I'm not sure how one would localize this.) > > Some particularly lame novelty fonts lack the digits. > > Some languages do not use ASCII digits. BTW, in case digits > show up somewhere in the UI, glibc can translate them if you > use the "I" (upper case eye) modifier. Like this: "%Id" > > > O0 - Distinct circle-like characters. (I admit I don't understand how > > scoring actually applies to this test. Albert?) > > > > 1Il| - Distinct line-like characters. (Ditto) > > This is to prefer fonts with distinct characters. It's confusing > if you can't tell the difference. It's not so easy to explain why > the computer has all these symbols if they all look the same. 
> In general, indistinct characters is a sign of a poor font. > 1Il| Should I add here "i¡" (lower aye and exclamup)? > > grep -C 2 "qx" po/*.po | grep msgstr | grep -v "qx" | grep -v \"\" > > grep -C 2 "QX" po/*.po | grep msgstr | grep -v "QX" | grep -v \"\" > > # a number of locales translate, but only a fraction of all locales > > > > grep -C 2 "qy" po/*.po | grep msgstr | grep -v "qy" | grep -v \"\" > > grep -C 2 "QY" po/*.po | grep msgstr | grep -v "QY" | grep -v \"\" > > # norwegian locales translate this -- not sure if that's appropriate...? > > I think it's an error, because plain ASCII is slightly useful. > Norwegian probably should translate one pair of these only. > > Translate both whenever plain ASCII fonts are of zero value. > > > grep -C 2 "oO" po/*.po | grep msgstr | grep -v "oO" | grep -v \"\" > > # swedish checks for a variety of accented chars. > > # korean checks for a pair of korean chars. > > Probably this isn't the best. The code might be improved > by having distinct non-translatable test strings for ASCII, > and a way to indicate the importance of ASCII. > > Swedish loses the ability to prefer ASCII-only fonts > with case distinction over ASCII-only fonts that lack it. > It gains the ability to distinguish between fonts that > lack case distinction for Swedish accented letters. > > It's pretty unlikely that a font would have case distinction > for ASCII but not also for any accented characters. > In other words, testing ASCII is highly likely to take care > of accented characters as well. > I've seen some fonts that have the same gliph for both accented and not accented letters. Also, I've seen many fonts that have different gliphs for lowercase accented/not accented letters, but don't diferentiate capital accented letters from its corresponding capital not accented one, so maybe the Swedish choice is not so bad. Thanks for any comments Pere |
From: Albert C. <al...@us...> - 2009-06-03 08:07:51
On Tue, Jun 2, 2009 at 6:13 PM, Pere Pujal i Carabantes <pe...@fo...> wrote:
> On Tue, 02 Jun 2009 at 05:14 -0400, Albert Cahalan wrote:
>> On Mon, Jun 1, 2009 at 1:16 PM, Bill Kendrick <nb...@so...> wrote:

>> > We test this in English by seeing if "qx" and "QX", and "qy" and "QY"
>> > both render. In your locale, it'd be helpful to translate these to
>> > uppercase and lowercase characters common to your locale.
>> >
> In Catalan we need the ç (ccedil) and some accented letters.
> Should I put all of them here, or will a selected subset be
> enough?

If you really want to THROW AWAY any fonts that lack these accented letters, then translate some of these strings.

Use a selected subset for better performance. If all of the letters are in Latin-1, then they probably always appear as a group. Pick any two. In general, you should pick the ones most likely to be a problem.

> Also, we should be able to type Spanish (ntilde, exclamup...) and
> French (æ ae, oe) letters, as they are not so important for writing
> Catalan, they have to go to uncommon punctuation below, right?

They could. I'm starting to think that you need a few spare translatable strings for scoring; I gave you 4 strings for blacklisting but you probably only need one. (call it "Qx", and you append a pair of your extra characters)

>> > `\%_@$~#{}<>^&* - "Uncommon" punctuation. (In European locales,
>> > you might want to check for the Euro symbol too,
>> > for example.)
>
>> This is stuff you could live without in a novelty font.
>> It's commonly missing.
>
> Here have to go the Spanish and French letters and symbols plus euro
> symbol plus some other uncommon punctuation, so this applies to
> 'letters, punctuation and symbols', not only punctuation, right?
> Also the same question as above, is a subset of them enough?

Use a subset, ensuring that it includes the least popular of things that are normally available/missing together. I didn't set a great example. "<" and ">" tend to go together, so really only one of those is needed. The same goes for "{" and "}".

>> > ,.?! - Common punctuation. (In Spanish locales, for example,
>> > you'd want the upside-down ? and ! )
>>
>> This is really critical for using the font.
>>
> So in Catalan I have to add "·" middot, "'" apostrophe and "-" (minus?)
> as they are really needed.

That seems about right.

>> > 1Il| - Distinct line-like characters. (Ditto)
>>
>> This is to prefer fonts with distinct characters. It's confusing
>> if you can't tell the difference. It's not so easy to explain why
>> the computer has all these symbols if they all look the same.
>> In general, indistinct characters is a sign of a poor font.
>>
> 1Il| Should I add here "i¡" (lower i and exclamup)?

If they are ever identical to each other, probably so. Otherwise, no. Testing fonts eats CPU time.
From: Bill K. <nb...@so...> - 2009-06-02 23:19:54
|
On Wed, Jun 03, 2009 at 12:13:54AM +0200, Pere Pujal i Carabantes wrote:
> In Catalan we need the ç (ccedill) and some accented letters.
> Should I put all of them here, or will a selected subset be
> enough?

I believe if you put all of the ones necessary for Catalan,
that will help fonts that FULLY support Catalan to bubble up
to the top of the list. Those that PARTIALLY support it would be
lower on the list.

> Also, we should be able to type Spanish (ntilde, exclamup...) and
> French (æ ae, oe) letters; as they are not so important for writing
> Catalan, they have to go in the uncommon punctuation below, right?

That sounds right. Albert?

Also, I'm waiting to hear from Albert as to whether providing weights
for some of these strings makes sense. (e.g., more 'points' for supporting
common punctuation than for supporting uncommon ones... imagine, for example,
a font that supports Catalan characters (but not French or Spanish), and
one that supports Spanish and French (but is missing Catalan). I think
right now, they'd get the same score for those two tests. An unlikely
example, but still...)

> > > oO - Test whether uppercase and lowercase characters work
> > > (it's ok if it does not, but is scored lower).
> > >
> >
> I am lost here: if we have already checked for QX (uppercase) and qx
> (lowercase), what are we supposed to put here?
>
> Oh wait, I see below in your comment about Swedish.

The "QX"/"qx" and "QY"/"qy" seem to be tested simultaneously. I don't
quite understand the logic, but code-wise, it works like this:

  if (... ((charset_works(font, gettext("qx")) &&   // (qx and
            charset_works(font, gettext("QX")))     //  QX)
           ||                                       // or
           (charset_works(font, gettext("qy")) &&   // (qy and
            charset_works(font, gettext("QY"))) ))  //  QY)

So I guess this means, if all four strings are translated, then the font
had better support everything you had translated it to.

(charset_works() is our own function, written by Albert, that checks
whether every character in the string gets blitted. Unfortunately, in
the case of Tibetan, the font didn't include A-Z characters; it included
Tibetan glyphs at those spots in the font. :^( Bad standard.)

Otherwise, one or the other needs to work. Though I'm not sure if there's
really a difference between "only translate Line X ("qx"/"QX")", which we
instruct people to do now, and "only translate one line, Line X or Line Y".

(Again, this is the case where we need to filter out a font that does not
support our language at all, and we do not grok ASCII.)

*boggle* Confusing. :)

<snip>

> This is where the Spanish and French letters and symbols go, plus the
> euro symbol plus some other uncommon punctuation, so this applies to
> 'letters, punctuation and symbols', not only punctuation, right?
> Also the same question as above: is a subset of them enough?

Depends on whether a subset is sufficient for typing in your locale. ;)

(And once again, we should decide whether weighting the score for this
lower than the following (below) is a good idea.)

<snip>

> So in Catalan I have to add "·" middot, "'" apostrophe and "-" (minus?)
> as they are really needed.

Sure! Do it! :)

<snip>

> 1Il| Should I add here "i¡" (lowercase i and exclamup)?

Albert?

(How, exactly, is this test done? Do we compare the blitted characters
somehow!? In other words, I _mostly_ understand why, but I'm unclear if
how we do the test makes sense... I haven't dissected 'charset_works()')

<snip>

> I've seen some fonts that have the same glyph for both accented and
> unaccented letters. Also, I've seen many fonts that have different glyphs
> for lowercase accented/unaccented letters, but don't differentiate
> capital accented letters from the corresponding unaccented capital,
> so maybe the Swedish choice is not so bad.

Albert, is it possible for us to even notice these problems, though?
(i.e., I don't think we compare "i" to "i acute", and see that they
look identical, and therefore score the font lower. Which leads me to
the previous question, what do we _do_ with "1Il|", other than check that
the characters exist in the font?)

-- 
-bill!
Sent from my computer
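
The either-pair logic in the snippet above can be tried out in isolation. The following is only a minimal sketch, not Tux Paint's code: `toy_font`, `toy_charset_works()` and `font_is_usable()` are hypothetical names, a "font" is reduced to the set of characters it can draw, and '#' and '$' stand in for a lowercase and an uppercase locale-specific letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a font: just the characters it can draw. */
typedef struct {
    const char *supported;
} toy_font;

/* Toy replacement for charset_works(): true if every character of s
 * is present in the font. (The real function renders glyphs.) */
static bool toy_charset_works(const toy_font *font, const char *s)
{
    assert(font != NULL && s != NULL);
    for (; *s != '\0'; s++) {
        if (strchr(font->supported, *s) == NULL)
            return false;
    }
    return true;
}

/* Same shape as the condition quoted above: the font is kept if the
 * X pair renders completely OR the Y pair renders completely. */
static bool font_is_usable(const toy_font *font,
                           const char *lower_x, const char *upper_x,
                           const char *lower_y, const char *upper_y)
{
    return (toy_charset_works(font, lower_x) &&
            toy_charset_works(font, upper_x))
        || (toy_charset_works(font, lower_y) &&
            toy_charset_works(font, upper_y));
}
```

Under these assumptions, an ASCII-only font survives when only the X pair is translated (the untranslated "qy"/"QY" pair still passes), and is dropped once both pairs are translated to '#'/'$'.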
From: Albert C. <aca...@gm...> - 2009-06-03 08:20:27
|
On Tue, Jun 2, 2009 at 7:19 PM, Bill Kendrick <nb...@so...> wrote:
> On Wed, Jun 03, 2009 at 12:13:54AM +0200, Pere Pujal i Carabantes wrote:

> Also, I'm waiting to hear from Albert as to whether providing weights
> for some of these strings makes sense. (e.g., more 'points' for supporting
> common punctuation than for supporting uncommon ones... imagine, for example,
> a font that supports Catalan characters (but not French or Spanish), and
> one that supports Spanish and French (but is missing Catalan). I think
> right now, they'd get the same score for those two tests. An unlikely
> example, but still...)

The scores already add together. I've yet to see a font
that had all the uncommon punctuation but was missing some
of the critical punctuation.

> The "QX"/"qx" and "QY"/"qy" seem to be tested simultaneously. I don't
> quite understand the logic, but code-wise, it works like this:
>
>   if (... ((charset_works(font, gettext("qx")) &&   // (qx and
>             charset_works(font, gettext("QX")))     //  QX)
>            ||                                       // or
>            (charset_works(font, gettext("qy")) &&   // (qy and
>             charset_works(font, gettext("QY"))) ))  //  QY)
>
> So I guess this means, if all four strings are translated, then the font
> had better support everything you had translated it to.

Yep.

> (charset_works() is our own function, written by Albert, that checks
> whether every character in the string gets blitted.

It detects:

1. total refusal to render
2. any pair of characters rendering the same way

> Otherwise, one or the other needs to work. Though I'm not sure if there's
> really a difference between "only translate Line X ("qx"/"QX")", which we
> instruct people to do now, and "only translate one line, Line X or Line Y".
>
> (Again, this is the case where we need to filter out a font that does not
> support our language at all, and we do not grok ASCII.)
>
> *boggle* Confusing. :)

Yeah, that wasn't well done. Probably just Line X should exist,
and translators should append to it or replace it as appropriate.

> (How, exactly, is this test done? Do we compare the blitted characters
> somehow!? In other words, I _mostly_ understand why, but I'm unclear if
> how we do the test makes sense... I haven't dissected 'charset_works()')

Yes, ultimately with memcmp.

>> I've seen some fonts that have the same glyph for both accented and
>> unaccented letters. Also, I've seen many fonts that have different glyphs
>> for lowercase accented/unaccented letters, but don't differentiate
>> capital accented letters from the corresponding unaccented capital,
>> so maybe the Swedish choice is not so bad.
>
> Albert, is it possible for us to even notice these problems, though?

Yes.

> (i.e., I don't think we compare "i" to "i acute", and see that they
> look identical, and therefore score the font lower.

Translators need to be doing this. If they need another string
to translate, then let's add it to the code.

> Which leads me to
> the previous question, what do we _do_ with "1Il|", other than check that
> the characters exist in the font?)

If any of those characters render identically, then the score
does not get incremented for that test.
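
To make the two checks and the additive score concrete, here is a rough sketch. It is not the actual charset_works(): `toy_glyph`, `toy_test`, `glyphs_all_distinct()` and `score_font()` are made-up names, and each rendered character is assumed to be a tiny fixed-size bitmap rather than a real rendered surface. Only the use of memcmp to spot identical renderings comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define GLYPH_BYTES 8  /* assumed toy bitmap: 8 rows, one byte per row */

/* Hypothetical result of rendering one character. */
typedef struct {
    bool rendered;                     /* false: font refused to render */
    unsigned char pixels[GLYPH_BYTES]; /* bitmap of the rendered glyph */
} toy_glyph;

/* One test string, already rendered character by character. */
typedef struct {
    const toy_glyph *glyphs;
    size_t count;
} toy_test;

/* True only if every glyph rendered and no two glyphs are identical. */
static bool glyphs_all_distinct(const toy_glyph *glyphs, size_t count)
{
    assert(glyphs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (!glyphs[i].rendered)
            return false;  /* check 1: total refusal to render */
        for (size_t j = 0; j < i; j++) {
            if (memcmp(glyphs[i].pixels, glyphs[j].pixels,
                       GLYPH_BYTES) == 0)
                return false;  /* check 2: two characters look the same */
        }
    }
    return true;
}

/* Scores add together: one point for each test string that passes. */
static int score_font(const toy_test *tests, size_t ntests)
{
    int score = 0;
    for (size_t i = 0; i < ntests; i++) {
        if (glyphs_all_distinct(tests[i].glyphs, tests[i].count))
            score++;
    }
    return score;
}
```

In this sketch a string such as "1Il|" earns its point only if all four bitmaps differ, which is why adding characters that never collide in practice costs CPU time without changing any score.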
From: Karl O. H. <ka...@hu...> - 2009-07-07 19:27:56
|
Tuesday 2 June 2009, Albert Cahalan wrote:

>> grep -C 2 "qy" po/*.po | grep msgstr | grep -v "qy" | grep -v \"\"
>> grep -C 2 "QY" po/*.po | grep msgstr | grep -v "QY" | grep -v \"\"
>> # norwegian locales translate this -- not sure if that's
>> appropriate...?
>
>I think it's an error, because plain ASCII is slightly useful.
>Norwegian probably should translate one pair of these only.

No, plain ASCII is *not* useful for Norwegian. That is, it is about
as useful as a font missing the letter E is for English.

-- 
Karl Ove Hufthammer
http://huftis.org/
Jabber: ka...@hu...