Re: [Plone-i18n] [Plone-developers] PLIP suggestion : accents normalization in plone lexicon

Hi,

I'm sorry for the late response.

>> That's not a problem if 'œuf' is stored as 'oeuf' in lexicon, or 'économie' as 'economie'...
>> but I don't feel very comfortable with the fact that, for example, "テス" will be stored as "30c630b9", even if I have not seen any side effect on Plone behaviour related to this,
>> (especially considering that this normalization doesn't have any added value for those languages)

I think this is a big problem for East Asian languages (Chinese, Korean and Japanese). It may also be a problem for Arabic.

> I believe the bigram splitter activates itself for certain character ranges, so perhaps a solution along those lines would work?
> 
> If not, making this a configurable option like some of the other text index options (lexicon etc.) makes sense.
> 
> David
+1

The system should use character ranges to decide which languages need normalization.
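
To illustrate the character-range idea, here is a minimal sketch in plain Python. It is not the PLIP implementation and does not use I18NNormalizer or the bigram splitter; the names `LATIN_MAX`, `SPECIAL`, `normalize_term` and `hex_fallback` are hypothetical, and the cut-off range is only an assumption. Terms made of Latin characters get their accents stripped, and everything else is left untouched so the lexicon stays readable.

```python
import unicodedata

# Assumption: treat everything up to U+036F (Latin blocks plus
# combining diacritical marks) as "Latin-like". The cut-off is illustrative.
LATIN_MAX = 0x036F

# NFKD does not decompose these letters, so they are mapped by hand.
SPECIAL = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE", "ß": "ss"}


def is_latin(term):
    """Return True if every character falls inside the assumed Latin range."""
    return all(ord(ch) <= LATIN_MAX for ch in term)


def strip_accents(term):
    """Replace special letters, then drop combining marks after NFKD."""
    term = "".join(SPECIAL.get(ch, ch) for ch in term)
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_term(term):
    """Normalize only Latin terms; leave other scripts as they are."""
    if is_latin(term):
        return strip_accents(term)
    return term


def hex_fallback(term):
    """Hex code points of each character, reproducing the "30c630b9" example."""
    return "".join("%04x" % ord(ch) for ch in term)


print(normalize_term("œuf"))       # Latin term, accents/ligatures removed
print(normalize_term("économie"))  # Latin term, accents removed
print(normalize_term("テス"))       # outside the range, stored unchanged
print(hex_fallback("テス"))         # what an unconditional hex fallback gives
```

With a check like this, Japanese, Chinese, Korean or Arabic terms would never reach the hex fallback. Where exactly the range should end is something I am not sure about.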
-- 
===========================
Manabu TERADA (@terapyon)
ter...@ca...
===========================

On 2012/01/18, at 7:10, David Glick (GW) wrote:

> 
> On Jan 17, 2012, at 2:06 PM, thomas desvenain wrote:
> 
>> Hi,
>> 
>> I have implemented the PLIP, but I have a doubt.
>> 
>> As far as I know, my implementation works, is tested (I have tests in English, French and Japanese for now), and is backward compatible.
>> But I don't like that the values stored in plone_lexicon are not human-readable anymore for languages where translation into ASCII is not obvious (eastern languages).
>> 
>> That's not a problem if 'œuf' is stored as 'oeuf' in lexicon, or 'économie' as 'economie'...
>> but I don't feel very comfortable with the fact that, for example, "テス" will be stored as "30c630b9", even if I have not seen any side effect on Plone behaviour related to this,
>> (especially considering that this normalization doesn't have any added value for those languages)
>> 
>> I don't know how to check, according to the site language, whether normalization is relevant or not.
>> Anyway, checking the current language inside the splitter is not reasonable, and the API doesn't allow us to pass it as an argument.
>> 
>> So I wonder if it wouldn't be better to make the use of I18NNormalizer optional, through an optional profile for our Plone site?
>> 
> 
> I believe the bigram splitter activates itself for certain character ranges, so perhaps a solution along those lines would work?
> 
> If not, making this a configurable option like some of the other text index options (lexicon etc.) makes sense.
> 
> David
> 
> 
> ----------
> David Glick
> Web Developer
> dav...@gr...
> 206.286.1235x32
> 
> Groundwire Consulting is here.
> 
> http://groundwire.org/about/FAQ-gw-consulting
> 
>