Re: [Plone-developers] [Plone-i18n] PLIP suggestion : accents normalization in plone lexicon

On Sun, Jul 31, 2011 at 5:20 AM, Antonio Carrasco Valero on gmail
<car...@gm...> wrote:
> The biggest hassle is to put together the mappings of "similar" characters
> I.e., all the following unicodes could match for each other:

Not really. We already have and ship such a list and use it as part of
plone.i18n. plone.i18n depends on the Unidecode library
(http://pypi.python.org/pypi/Unidecode), which has a pretty comprehensive
list and maps about 46,000 characters from the entire Unicode range.

So all we need to do is:

from plone.i18n.normalizer.base import baseNormalize
ascii = baseNormalize('some text')

The baseNormalize function only applies the Unidecode mappings up to an
upper code point limit, as the phonetic mappings for Asian languages
aren't good enough. That fits this use case as well, since Asian
languages need different approaches for search anyway, like not doing
whitespace-delimited splitting.
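
To illustrate the idea of folding accents only below a cutoff, here is a
rough sketch using only the standard library. It is not the plone.i18n
code and does not use the Unidecode tables: it strips combining marks via
unicodedata instead, and both the function name fold_accents and the
UPPER_LIMIT value are made up for this example.

```python
import unicodedata

# Hypothetical cutoff chosen for illustration only; it is not taken from
# plone.i18n. Characters at or above it are left untouched.
UPPER_LIMIT = 0x0530


def fold_accents(text, limit=UPPER_LIMIT):
    """Strip accents from characters below `limit`, pass the rest through."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128 or code >= limit:
            # Plain ASCII needs no folding; anything past the cutoff
            # (e.g. CJK) is deliberately not transliterated.
            out.append(ch)
            continue
        # Decompose into base character + combining marks, drop the marks.
        decomposed = unicodedata.normalize('NFKD', ch)
        stripped = ''.join(
            c for c in decomposed if not unicodedata.combining(c)
        )
        # Keep the original character if nothing is left after stripping.
        out.append(stripped or ch)
    return ''.join(out)


folded = fold_accents('Ünïcödé café')
untouched = fold_accents('日本語')
```

Unlike Unidecode, this stand-in only handles characters that decompose
into a base letter plus combining marks, so something like 'ß' passes
through unchanged.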

But thanks for the pointer :)

Hanno