From: Jehan-Guillaume (i. de R. <io...@fr...> - 2010-12-31 15:30:59
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 31/12/2010 11:16, Alexey Baturin wrote:
> Hello Jehan,
>
> I forgot there were some recent changes to lang files, here is an updated patch.
>
> On Wed, Dec 29, 2010 at 11:38 PM, Jehan-Guillaume (ioguix) de Rorthais
> <io...@fr...> wrote:
>> About the comments stuff, it's not 100% clear in my mind. Before
>> considering moving everything to UTF8, we need to double check that it
>> will not actually break the text printed on the page.
>> We should pay attention to the encoding of *all* layers: the lang file,
>> the one send to the browser, the one send to the database, and the
>> encoding of the database itself.
>
> Let's begin with small changes - language files =).

Yeah, but all these layers are related :/

>>>>> 1. I think it's not so good to have two Russian translations - so I
>>>>> replaced old translation in KOI8-R with a new one in UTF-8:
>>>>> russian_utf8.patch.bz2
>>
>> So I think we had both to be able to have a complete homogeneous
>> encoding stack and avoid character breakage.
>
> What do you mean by "homogeneous encoding stack"? I've verified that
> UTF-8 works well with Russian translation for some time. If you're not
> sure it works well in some configuration - you could send me a
> screenshot - I'll verify it.

Ok, I need to have a clear view of what happens behind the scenes. So here
is my understanding of what PPA does:

#0 the encoding we send to the browser depends on $lang['appcharset'].
   See function in classes/Misc.php:376
#1 $lang['appcharset'] is set in every language file
#2 we overwrite $lang['appcharset'] depending on the encoding of the
   database we are connected to. See block in libraries/lib.inc.php:226

As far as I understand it, #1 shouldn't be necessary. We are using the
recoded files, which are pure ASCII files. We recode the files in lang/
into lang/recoded using the command "recode $encoding..xml", so the
resulting files use XML escape sequences based on character references,
i.e. &eacute; for 'é'.
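A minimal sketch of the recoding step described above, written in Python rather than with the `recode` tool. It assumes the language string is available as raw bytes and relies on Python's `xmlcharrefreplace` error handler, which emits numeric character references (such as `&#233;`), so the exact escapes may differ from what `recode $encoding..xml` writes. The function name `to_ascii_with_charrefs` is hypothetical.

```python
import html


def to_ascii_with_charrefs(raw: bytes, source_encoding: str) -> bytes:
    """Decode a language-file string and re-encode it as pure ASCII.

    Every non-ASCII character becomes a numeric XML character reference.
    """
    text = raw.decode(source_encoding)
    return text.encode("ascii", errors="xmlcharrefreplace")


# The same word stored in two different source encodings.
latin1_bytes = "créé".encode("latin-1")
utf8_bytes = "créé".encode("utf-8")

recoded_a = to_ascii_with_charrefs(latin1_bytes, "latin-1")
recoded_b = to_ascii_with_charrefs(utf8_bytes, "utf-8")

# Both results are the same ASCII bytes, whatever the source encoding was.
print(recoded_a)

# A browser resolves the references; html.unescape stands in for that here.
print(html.unescape(recoded_a.decode("ascii")))
```

Because the result is pure ASCII, its bytes read the same whether the page is declared as UTF-8 or as any other ASCII-compatible encoding, which is the property the argument relies on.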
So, as we are using plain 7-bit ASCII (which is a subset of UTF-8),
sending UTF-8 to browsers should be perfectly fine.

#2 sets the client encoding according to the database encoding, so
PostgreSQL doesn't convert the data. To be able to print data fetched
from PostgreSQL, we overwrite $lang['appcharset'] with this database
encoding and use it as the HTML page encoding (see #0).

According to this page, we can use UNICODE/UTF-8 as client_encoding for
all database encodings except MULE_INTERNAL:

http://www.postgresql.org/docs/7.4/static/multibyte.html

But anyway, I don't think we support MULE_INTERNAL correctly today.

In conclusion, if I didn't forget something on the way, then yes, we
could probably use UTF-8 everywhere, with some more investigation needed
for MULE_INTERNAL.

So I guess it's fairly safe to create a first patch and then test it as
much as we can.

Comments? Warnings? What did I forget?

>>>>> 2. Do you remember a bad screenshot on the PPA site? I've checked
>>>>> Ukrainian translation - it doesn't work as expected: you can compare
>>>>> attached screenshots (bad symbols are in red circles). It seems that
>>>>> the reason is KOI8-R instead of KOI8-U everywhere. I have not managed
>>>>> to fix it, something goes wrong with Recode:
>>>>> =====
>>>>> Recoding ukrainian...
>>>>> recode: Untranslatable input in step `KOI8-U..ISO-10646-UCS-2'
>>>>> =====
>>>>> However I managed to switch fast to UTF-8 with iconv - patch is attached.
>>
>> So we need to make a full check to be sure it doesn't break data pages I
>> guess...
>> Could we grant you as the ukranian and russian maintainer ?
>
> As a Russian - sure you can, as a Ukrainian - you can't, I don't know
> Ukrainian well =). But I can see this obvious bug of Ukrainian
> translation.
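The KOI8-R versus KOI8-U mix-up mentioned in the quoted report can be reproduced with a short sketch. It uses Python's built-in `koi8_r` and `koi8_u` codecs instead of `recode` or `iconv`, and the sample word is only an illustration, not a string taken from the Ukrainian translation file.

```python
sample = "Україна"  # contains 'ї', which exists in KOI8-U but not in KOI8-R

koi8u_bytes = sample.encode("koi8_u")

# Reading KOI8-U bytes as KOI8-R silently yields wrong characters
# for the Ukrainian-specific letters.
misread = koi8u_bytes.decode("koi8_r")
print(misread)

# KOI8-R cannot represent the Ukrainian-specific letters at all.
try:
    sample.encode("koi8_r")
    koi8r_can_encode = True
except UnicodeEncodeError:
    koi8r_can_encode = False
print(koi8r_can_encode)

# Converting with the correct source charset gives clean UTF-8,
# similar in spirit to an iconv-based conversion.
utf8_bytes = koi8u_bytes.decode("koi8_u").encode("utf-8")
print(utf8_bytes.decode("utf-8"))
```

Only the letters that differ between the two charsets are affected, which would be consistent with a screenshot where most text looks right and a few symbols are wrong.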
Great :)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk0d9yIACgkQxWGfaAgowiLwvwCggbYb84mz0mQfWZ9Sp+nS2UH+
WpAAnjRTV2r1RYv1GY4cNo5aeQcaxFxa
=pV9L
-----END PGP SIGNATURE-----