From: Manabu T. <ter...@ca...> - 2012-01-20 07:13:35
Hi,

I'm sorry for the late response.

>> That's not a problem if 'œuf' is stored as 'oeuf' in lexicon, or 'économie' as 'economie'...
>> but I don't feel very comfortable with the fact that, for example, "テス" will be stored as "30c630b9", even if I have not seen any side effect on Plone behaviour related to this,
>> (especially considering that this normalization doesn't have any added value for those languages)

I think this is a big problem for East Asian languages (Chinese, Korean and Japanese).
It may also be a problem for Arabic.

> I believe the bigram splitter activates itself for certain character ranges, so perhaps a solution along those lines would work?
>
> If not, making this a configurable option like some of the other text index options (lexicon etc.) makes sense.
>
> David

+1
This system should use character ranges for the languages that need them.

--
===========================
Manabu TERADA (@terapyon)
ter...@ca...
===========================

On 2012/01/18, at 7:10, David Glick (GW) wrote:

>
> On Jan 17, 2012, at 2:06 PM, thomas desvenain wrote:
>
>> Hi,
>>
>> I have implemented the PLIP, but I have a doubt.
>>
>> As far as I know, my implementation works, is tested (I have tests in English, French and Japanese for now), and is backward compatible.
>> But I don't like that the values stored in plone_lexicon are not human-readable anymore for languages where translation into ASCII is not obvious (eastern languages)
>>
>> That's not a problem if 'œuf' is stored as 'oeuf' in lexicon, or 'économie' as 'economie'...
>> but I don't feel very comfortable with the fact that, for example, "テス" will be stored as "30c630b9", even if I have not seen any side effect on Plone behaviour related to this,
>> (especially considering that this normalization doesn't have any added value for those languages)
>>
>> I don't know how to check, according to site language, if normalization is relevant or not.
>> Anyway, testing inside the splitter which language is current is not reasonable, and the API doesn't allow us to pass it as an argument.
>>
>> So I wonder if it wouldn't be better to make use of I18NNormalizer as an option through an optional profile for our Plone site?
>>
>
> I believe the bigram splitter activates itself for certain character ranges, so perhaps a solution along those lines would work?
>
> If not, making this a configurable option like some of the other text index options (lexicon etc.) makes sense.
>
> David
>
>
> ----------
> David Glick
> Web Developer
> dav...@gr...
> 206.286.1235x32
>
> Groundwire Consulting is here.
>
> http://groundwire.org/about/FAQ-gw-consulting
>
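
For illustration only, the character-range idea discussed in this thread could look roughly like the sketch below. It is not the PLIP implementation, not I18NNormalizer, and not the bigram splitter: the function names, the ligature table and the list of ranges are all assumptions made for this example, and it assumes Python 3. Words that contain a character from one of the listed ranges are left untouched, other words are folded to ASCII, and hex_fallback only reproduces the "30c630b9" form mentioned above.

```python
import unicodedata

# Hypothetical ranges picked for this sketch; a real splitter may use others.
PROTECTED_RANGES = (
    (0x0600, 0x06FF),  # Arabic
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
)

# Ligatures have no Unicode decomposition, so they are mapped by hand.
LIGATURES = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"}


def in_protected_range(char):
    """Return True if char falls inside one of the ranges listed above."""
    code = ord(char)
    return any(low <= code <= high for low, high in PROTECTED_RANGES)


def hex_fallback(word):
    """Encode each code point as four hex digits (the unreadable form)."""
    return "".join("%04x" % ord(char) for char in word)


def normalize_word(word):
    """Fold Latin words to ASCII, but leave protected-range words as they are."""
    if any(in_protected_range(char) for char in word):
        return word
    expanded = "".join(LIGATURES.get(char, char) for char in word)
    decomposed = unicodedata.normalize("NFKD", expanded)
    # Drop the combining marks (accents) left over after decomposition.
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    for sample in ("œuf", "économie", "テス"):
        print(sample, normalize_word(sample), hex_fallback(sample))
```

With this approach the lexicon would keep 'oeuf' and 'economie' for the French words while "テス" stays human-readable, and no site-language lookup is needed inside the splitter. Whether the decision belongs in the splitter or in a configurable option, as David suggests, is left open here.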