[Plone-developers] [Plone-i18n] PLIP suggestion : accents normalization in plone lexicon


Hi All,

I would also like to see this feature available out of the box in Plone 4.x.

Note that the problem of not properly handling characters outside the
ASCII set also affects word splitting:
the default splitter breaks words apart when it encounters characters
above code point 255.
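
To illustrate the splitting problem, here is a minimal sketch in Python 3. It is not the ZCTextIndex code: LATIN1_WORD is a hypothetical stand-in for a splitter whose idea of a "word character" stops at code point 255, compared with a plain Unicode-aware pattern.

```python
import re

# Hypothetical stand-in for a splitter limited to code points <= 255
# (ASCII letters/digits plus the Latin-1 letters); an assumption made
# for illustration, not the actual ZCTextIndex pattern.
LATIN1_WORD = re.compile(r"[0-9A-Za-z_\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+")

# In Python 3, \w on str patterns is Unicode-aware by default.
UNICODE_WORD = re.compile(r"\w+")


def split_latin1(text):
    """Split using only word characters up to code point 255."""
    return LATIN1_WORD.findall(text)


def split_unicode(text):
    """Split using the full Unicode notion of a word character."""
    return UNICODE_WORD.findall(text)


# "Dvořák": U+0159 (r with caron) lies above 255, U+00E1 (a acute) does not.
name = "Dvo\u0159\u00e1k"
print(split_latin1(name))   # the word is cut at U+0159
print(split_unicode(name))  # the word stays whole
```

With the restricted pattern the name is indexed as two fragments, so a search for the whole word cannot match it.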

The biggest hassle in resolving this was putting together
the mappings of "similar" characters.
For example, all the following Unicode code points could match each
other (letter O):
     0x004F: [0x006F, 0x00D2, 0x00D3, 0x00D4, 0x00D5, 0x00D6, 0x00F2,
              0x00F3, 0x00F4, 0x00F5, 0x00F6, 0x00D8, 0x00F8, 0x014C,
              0x014D, 0x014E, 0x014F, 0x0150, 0x0151, 0x0152, 0x0153],  # O

So if anybody implementing this wants to save some time,
I already collected some character mappings.
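
As a rough sketch of how such a mapping can be used (written for Python 3, while the code referenced in this message targets Plone 2.5), the table can be inverted into a translation table so that every variant folds to its canonical code point. The names SIMILAR, build_fold_table and fold are hypothetical and are not taken from the TRAUnicode modules.

```python
# Canonical code point -> variants that should match it.
# The list for "O" is copied from the message; other letters are omitted.
SIMILAR = {
    0x004F: [
        0x006F, 0x00D2, 0x00D3, 0x00D4, 0x00D5, 0x00D6, 0x00F2,
        0x00F3, 0x00F4, 0x00F5, 0x00F6, 0x00D8, 0x00F8, 0x014C,
        0x014D, 0x014E, 0x014F, 0x0150, 0x0151, 0x0152, 0x0153,
    ],  # O
}


def build_fold_table(similar):
    """Invert {canonical: [variants]} into {variant: canonical}."""
    table = {}
    for canonical, variants in similar.items():
        for code_point in variants:
            table[code_point] = canonical
    return table


FOLD_TABLE = build_fold_table(SIMILAR)


def fold(text):
    """Replace every known variant by its canonical character."""
    return text.translate(FOLD_TABLE)


# Both spellings fold to the same indexed form.
print(fold("G\u00f6del") == fold("Godel"))
```

Applying the same fold to indexed words and to search terms is what makes the variants match each other.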

I wrote an ISplitter implementation
that I apply to application-specific ZCatalog instances
indexing content that is not in English
(although it should also work for English).

Note that this is running on Plone 2.5.

This allows entering search criteria with characters from the Cyrillic,
Greek and Latin scripts,
and supports equivalences between letters that are plain, upper/lower case,
accented/diacritical, or otherwise similar.
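
The general shape of such a splitter can be sketched as follows. This is not TRASplitter.py: it is a standalone Python 3 class with no Zope imports, the class and helper names are hypothetical, and the process/processGlob method names are an assumption modelled on the ZCTextIndex splitter convention. It also swaps the hand-collected tables for Unicode decomposition from the standard library, which handles accents but not letters such as U+00F8 or U+0153 that have no decomposition.

```python
import re
import unicodedata


def strip_accents(text):
    """Lower-case text and drop combining marks after NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return kept.lower()


class FoldingSplitter:
    """Hypothetical splitter: fold first, then split on Unicode word chars."""

    word = re.compile(r"\w+")
    # For query terms, keep the * and ? wildcards attached to the word.
    word_glob = re.compile(r"\w+[\w*?]*")

    def process(self, texts):
        """Split a list of strings into folded terms for indexing."""
        return [w for t in texts for w in self.word.findall(strip_accents(t))]

    def processGlob(self, texts):
        """Same as process(), but preserve wildcard characters."""
        return [w for t in texts
                for w in self.word_glob.findall(strip_accents(t))]


splitter = FoldingSplitter()
print(splitter.process(["\u00c9conom\u00e9trie appliqu\u00e9e"]))
print(splitter.processGlob(["\u00e9cono*"]))
```

Folding before splitting means that text arriving in decomposed form (a base letter followed by a combining accent) is not cut at the accent.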

It has been used in production and tested only with narrow Unicode Python
builds (code points up to 65535).

Available in the OSOR.eu repository forge, project gvSIG-i18n

http://forge.osor.eu/projects/gvsig-i18n/

TRASplitter.py 
<http://forge.osor.eu/plugins/scmsvn/viewcvs.php/*checkout*/trunk/gvSIGi18n/TRASplitter.py?content-type=text%2Fplain&root=gvsig-i18n>
TRAUnicode.py 
<http://forge.osor.eu/plugins/scmsvn/viewcvs.php/*checkout*/trunk/gvSIGi18n/TRAUnicode.py?content-type=text%2Fplain&rev=189&root=gvsig-i18n>
TRAUnicode_Constants.py 
<http://forge.osor.eu/plugins/scmsvn/viewcvs.php/*checkout*/trunk/gvSIGi18n/TRAUnicode_Constants.py?content-type=text%2Fplain&root=gvsig-i18n>

Antonio Carrasco Valero
car...@gm...

Model Driven Development, sl
Valencia España (Spain)
www.ModelDD.org

On 20:59, Lachlan Musicman wrote:
> On Mon, Jun 6, 2011 at 23:08, thomas desvenain
> <tho...@gm...>  wrote:
>> Hi,
>>
>> Most users want the search to ignore accents
>>
>> where "économétrie"
>> finds "econometrie", "Econométrie", "Économétrie".
> I can understand this, if not use it. My translation students are
> capable, but they have to work on the University's locked down
> systems. As we are in Australia, the keyboard is set to US English,
> with no accents etc. Like I said, they are capable enough to discover
> work arounds, but they are also looking for workflow pace - and being
> able to search accent free would be a boon to their productivity
>
> cheers
> L.
>
>> What do you think about a PLIP to give Plone lexicon a casenormalizer
>> that would use plone.i18n stuff to normalize ZCTextIndex lexical
>> values ?
>>
>> (as lucene latin normalizer does)
>>
>> that would be the occasion to fix a bug, that "économétrie" doesn't
>> find "Économétrie" (plone.i18n stuff manages that, plone lexicon, not)
>>
>> (I can write the PLIP and implement it... but i know it is a major issue)
>>
>> Thanks
>>
>> Thomas
>>
>> --
>> Thomas Desvenain
>>
>> Téléphone : 09 51 37 35 18
>>
>>
>
>