Re: 'Primary normal form'? (pattern matching problem)


Hi Geert-Jan,

The next ICU release will include new APIs for searching for patterns within
Unicode strings.
The APIs make use of collation to perform the matches. Hence you can tailor
the rules and set the strength for matching.
For details, please refer to the link below.
http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/searchproposal.html
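
To make the idea of a primary-level form with recognizable character
boundaries concrete, here is a rough sketch in Python. It is not the ICU
API and does not use real collation data: it only approximates primary
strength for Latin text by case folding (which expands U+00DF to 'ss') and
dropping combining marks. The names primary_fold and find_primary and the
'?' wildcard are invented for this illustration.

```python
import unicodedata


def primary_fold(text):
    """Return (folded, spans).

    folded is a crude stand-in for a primary-level form of text.
    spans[i] is the (start, end) slice of text that produced folded[i],
    so character boundaries in the original remain recoverable.
    """
    folded = []
    spans = []
    for i, ch in enumerate(text):
        # casefold() maps U+00DF to "ss"; NFD exposes accents as combining marks.
        for c in unicodedata.normalize("NFD", ch.casefold()):
            if unicodedata.combining(c):
                continue  # assumption: accents are ignored at this level
            folded.append(c)
            spans.append((i, i + 1))
    return "".join(folded), spans


def find_primary(pattern, text, wildcard="?"):
    """Return the substrings of text matching pattern on the folded form.

    Each wildcard matches exactly one folded character, so two wildcards
    are needed to cover U+00DF, and the same pattern also covers 'ss'.
    """
    pat, _ = primary_fold(pattern)
    folded, spans = primary_fold(text)
    if not pat:
        return []
    hits = []
    for start in range(len(folded) - len(pat) + 1):
        end = start + len(pat)
        window = folded[start:end]
        if not all(p == wildcard or p == c for p, c in zip(pat, window)):
            continue
        # Reject matches that start or stop inside an expansion such as
        # the 'ss' produced from a single U+00DF.
        if start > 0 and spans[start - 1] == spans[start]:
            continue
        if end < len(folded) and spans[end] == spans[end - 1]:
            continue
        hits.append(text[spans[start][0]:spans[end - 1][1]])
    return hits
```

With this sketch, find_primary("stra??e", ...) finds both spellings of the
word in one pass, while a pattern that would begin or end in the middle of
the expanded U+00DF is rejected. A collation-based search would take the
mapping from the collator's rules rather than from case folding.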

Syn Wee Quek
IBM GCoC, Cupertino, CA, USA

----- Original Message -----
From: "Geert-Jan van Opdorp" <op...@pi...>
To: <ic...@os...>
Sent: Tuesday, July 31, 2001 7:45 AM
Subject: 'Primary normal form'? (pattern matching problem)

>
> Hi,
>
> I'm working on a pattern matcher that should be able to match using
> primary collation. One problem I encounter is that (e.g. in the en_US
> locale) 'ss' should match U+00DF LATIN SMALL LETTER SHARP S. This
> means that either one one-character-wildcard should match 'ss' or that
> two one-character-wildcards should match U+00DF LATIN SMALL LETTER
> SHARP S. Now I don't mind having to tell my users that they need two
> wildcards to find U+00DF LATIN SMALL LETTER SHARP S, but I do want one
> and the same search pattern to find 'ss' and U+00DF LATIN SMALL LETTER
> SHARP S.
>
> So it seems I need a collation-mode aware character-iterator, or,
> better yet, some kind of normalized primary form, i.e. something
> equivalent to the primary part of the collation key, but with
> recognizable character boundaries.
>
> It seems to me this must be a common problem - probably I am missing
> the obvious somewhere. Any hints as to how to solve this problem are very
> welcome.
>
> Thanks
> Geert-Jan
>
> Geert-Jan van Opdorp
> op...@pi...
>
> _______________________________________________
> icu mailing list
> ic...@os...
> http://oss.software.ibm.com/developerworks/opensource/mailman/listinfo/icu
>
