[Plone-i18n] Re: Fix for UnicodeError: ASCII decoding error: ordinal not in range(128)

Hi,

Thanks Florent for the report.

I'm not willing to "fix" something just to break something else.
I would prefer to dig deeper into the solution you explain at the end,
in "A final digression about ZPT". Even if ZPT does not do it by itself,
maybe Localizer could, with a dynamic patch.

Unfortunately I don't have time right now to research it by myself.
This means that either you or somebody else finds a solution that
works in both situations, or you wait for me to find time to
find one.

Just for the record, I attach a zexp file. It is a folder with two
page templates (case1 and case2): case1 works when the variable
LOCALIZER_USE_ZOPE_UNICODE is set, case2 works when it isn't. To
test it, use Localizer 1.0 and TranslationService 0.2. I want
something that works for both cases; nothing else is an option.

Best regards,
david

Florent Guillaume wrote:

>Hi Folks,
>
>Sorry for the crosspost but this really covers ZPT and Localizer, and is
>of great interest to the Plone i18n users. Please keep your answers to
>the lists where they are legitimate -- and I'd appreciate being kept as
>Cc.
>
>
>Ok, I got down to the reason for the infamous "UnicodeError: ASCII
>decoding error: ordinal not in range(128)". Thanks to all who cooperated
>in that matter.
>
>Readers wanting the quick solution without the rest of the discussion
>can skip to the part bracketed by #######.
>
>First a reminder of the problem for those not familiar with it.
>
>In many situations, in a multilingual Plone site using Localizer, people
>got the above error.
>
>This in fact happened in the following circumstances:
>
>- A page template like:
>        <h1 i18n:translate="edit_type_header">
>        Edit an object of type
>          <span i18n:name="type">
>            <span i18n:translate=""         
>                  tal:content="python:here.getTypeInfo().Title()" 
>                  tal:omit-tag="">Type</span>
>            </span> 
>        </h1>
>
>- A translation for edit_type_header of the form
>        Éditer un objet de type ${type}
>  where the translation contains non-ascii characters ("É" here),
>
>- A substituted string for ${type} that itself has non-ascii characters,
>  for instance "déjà".
>
>What happens behind the scenes during the template evaluation is complex,
>but at some point the <span i18n:translate> gets evaluated, the message
>catalog gets consulted and a u'déjà', as Unicode, is returned.
>
>At that point Localizer has a mechanism to convert all Unicode
>strings to their final browser encoding, as a plain string of bytes,
>so for instance using UTF-8 it would substitute 'd\xc3\xa9j\xc3\xa0'.
>
>The problem here is that this string is not destined to go to the
>browser yet, but will first be used further in the ZPT processing to be
>substituted for ${type}. So later in the processing, we have to
>substitute
>     u'Éditer un objet de type ${type}'
>using the mapping
>     {u'type': 'd\xc3\xa9j\xc3\xa0'}
>
>At that point, we have a mix of Unicode (which is legitimate) and some
>plain string encoded in the final output. This encoding came too soon!
>We would still like to have Unicode here... If we still had it it would
>work.
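
The failure described above can be reproduced outside Zope. This is a minimal sketch, written in Python 3 so that it runs today. The Python 2 of the Zope 2.6 era implicitly decoded byte strings as ASCII when they were mixed with Unicode; the hypothetical helper `py2_coerce` imitates that rule. `substitute` is likewise only a stand-in for the ${name} interpolation step, not the actual ZPT code.

```python
import re

def py2_coerce(value):
    """Imitate Python 2: a byte string mixed with Unicode is decoded as ASCII."""
    if isinstance(value, bytes):
        return value.decode("ascii")  # raises UnicodeDecodeError on non-ASCII bytes
    return value

def substitute(template, mapping):
    """Hypothetical stand-in for the ${name} interpolation done while rendering."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: py2_coerce(mapping[m.group(1)]),
                  template)

template = "Éditer un objet de type ${type}"

# The value is still Unicode: the substitution works.
ok = substitute(template, {"type": "déjà"})

# The value was already encoded to UTF-8 bytes: the substitution fails.
try:
    substitute(template, {"type": "déjà".encode("utf-8")})
    failed = False
except UnicodeDecodeError:
    failed = True
```

The second call fails on the first byte above 127, which is the "ordinal not in range(128)" part of the error message.
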
>
>Fortunately, I kind of foresaw this sort of problem a few months ago,
>and I included in Localizer a way to turn off its early conversion to
>browser output encoding.
>
>#######
>
>To do that, you have to launch Zope with the LOCALIZER_USE_ZOPE_UNICODE
>environment variable set to something not empty, for instance "yes".
>
>#######
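
How Localizer itself reads the variable is not shown in this thread. As an assumption only, a check for "set to something not empty" could look like the sketch below; the function name `use_zope_unicode` is hypothetical.

```python
import os

def use_zope_unicode(environ=os.environ):
    """True when LOCALIZER_USE_ZOPE_UNICODE is set to any non-empty value."""
    return bool(environ.get("LOCALIZER_USE_ZOPE_UNICODE"))
```

With such a check, an unset variable and a variable set to the empty string both leave the early conversion enabled.
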
>
>Now, why did Localizer choose to do early encoding by default? The
>problem is the following: during ZPT parsing, we're building something
>from the concatenation of a list of strings, some of which are Unicode if
>they come from a message catalog (or some TALES returning Unicode), some
>of which are plain strings like most of the page template itself.
>
>If all the plain strings are only ever pure ASCII, then there's no
>problem doing a join of all of them with something Unicode, and the
>result will be Unicode. That's what pure Zope 2.6 does by default. It
>then, in ZPublisher, proceeds to encode that resulting Unicode string in
>the preferred browser encoding and sends that. This mode is what you get
>if you define LOCALIZER_USE_ZOPE_UNICODE.
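
The standard mode described above can be sketched as follows. This is Python 3 imitating the Python 2 behaviour, under the assumption that byte strings stand for the plain strings of the template; `py2_join` is a hypothetical helper, not Zope code.

```python
def py2_join(chunks):
    """Join like Python 2 did: byte strings are decoded as ASCII
    when they are mixed with Unicode."""
    return "".join(c.decode("ascii") if isinstance(c, bytes) else c
                   for c in chunks)

# Pure-ASCII template text (bytes) around a Unicode catalog message.
chunks = [b"<h1>", "Éditer un objet de type déjà", b"</h1>"]

page = py2_join(chunks)      # a single Unicode string
body = page.encode("utf-8")  # encoded once, when the response is published
```

The join only succeeds because every byte chunk is pure ASCII; a single non-ASCII byte in the template would raise the decoding error.
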
>
>But when Localizer was introduced, it was to be used by people who had
>localized their page templates by hand and thus included a lot of
>non-ASCII characters in them, in their preferred encoding, say, UTF-8,
>together with a RESPONSE.setHeader('Content-Type') with that encoding.
>So because of those non-ASCII characters, the strategy of the previous
>paragraph wouldn't work. So Localizer decided to encode all Unicode
>strings to the preferred encoding (assumed to be the same as the browser
>encoding) as soon as it saw them inside the ZPT parsing.
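
The early-conversion strategy can be sketched the same way. This is an illustration in Python 3 under the assumption that the template is stored as UTF-8 bytes; `early_encode_join` is a hypothetical name and not Localizer's implementation.

```python
def early_encode_join(chunks, encoding="utf-8"):
    """Encode every Unicode chunk as soon as it is seen, then join bytes."""
    return b"".join(c.encode(encoding) if isinstance(c, str) else c
                    for c in chunks)

# A hand-localized template holding UTF-8 bytes, plus a catalog message.
chunks = ["<p>Créé le ".encode("utf-8"), "déjà", b"</p>"]
body = early_encode_join(chunks)
```

Everything ends up as bytes in one encoding, so non-ASCII template text is no longer a problem, but no Unicode value survives for a later substitution.
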
>
>Unfortunately, as we saw at the beginning, this can't work in the
>presence of i18n:name substitutions.
>
>As a conclusion, I recommend that Localizer use the standard Zope
>behavior by default, and only enable its early conversion when some new
>environment variable, for instance LOCALIZER_UNICODE_CONVERSION, is set.
>This will only be useful to people who have half-translated their site
>(some Unicode from the message catalog, and still some non-ASCII in the
>templates).
>
>
>
>A final digression about ZPT:
>
>I think the correct way to build the result of a ZPT would be to build a
>Unicode string as soon as TALInterpreter detects a non-ASCII string. It
>would then decode the non-ASCII string to Unicode using some kind of site-
>or page-default encoding. This would avoid most of our problems, and would
>anyway be more robust. It would simply mean replacing StringIO's
>(actually FasterStringIO's) getvalue method with an intelligent join
>that does the conversion I just outlined if needed.
>
>There remains the problem of deciding which is the default encoding to
>use...
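
The "intelligent join" suggested in this digression might look like the sketch below. It is written in Python 3, `smart_join` is a hypothetical name, and the default encoding is left as a parameter precisely because the choice is still open.

```python
def smart_join(chunks, default_encoding="utf-8"):
    """Return one Unicode string. Byte chunks that are not pure ASCII
    are decoded with default_encoding instead of raising an error."""
    out = []
    for chunk in chunks:
        if isinstance(chunk, bytes):
            try:
                chunk = chunk.decode("ascii")
            except UnicodeDecodeError:
                chunk = chunk.decode(default_encoding)
        out.append(chunk)
    return "".join(out)

# Non-ASCII template bytes and a Unicode catalog message now mix safely.
page = smart_join(["<p>Créé le ".encode("utf-8"), "déjà", b"</p>"])
```

Because the result stays Unicode until the very end, a later ${name} substitution would still see Unicode values, and the encoding to the browser charset happens only once.
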
>
>
>
>Thanks for any comments (and please watch where you send them!).
>
>
>Florent
>
>
>  
>

-- 
J. David Ibáñez, http://www.j-david.net
Software Engineer / Ingénieur Logiciel / Ingeniero de Software