Re: [icu-design] ICU API Proposal: AlphabeticIndex constructor from a Collator

Apologies, we need to retract the modifications --

On Wed, Jan 30, 2013 at 10:09 AM, Markus Scherer <mar...@gm...> wrote:

> The ICU team has approved this API *with modifications*:
>
>    - Construct from a Collator, not a RuleBasedCollator.
>    - Java getCollator() casts, may throw an exception.
>    - C++ getCollator() returns some other object if the input is not an
>    RBC.
>

It turns out that we cannot use a Collator and also support an enhancement
that I proposed to CLDR and implemented this week: LDML has so far not
defined how to handle index characters with multiple primary weights, such
as "Æ" or "Sch". "Sch" and "St" are very common in German phone and address
books, and we have long wanted to support them.

To find out whether an index character like "Æ" or "Sch" has multiple
primary weights, I use a CollationElementIterator, which is not available
from the base Collator, and I believe there is no other public way to look
at primary weights. Internal API might work, but that would tie the
AlphabeticIndex even more closely to a specific Collator implementation.
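
For illustration only, here is a minimal sketch of that kind of check. It
uses the JDK's java.text classes rather than ICU4J, purely so that it runs
without extra libraries; the ICU classes of the same names are similar but
not identical, and this is not the actual AlphabeticIndex code. The class
name, the toy rule string and the countPrimaryWeights() helper are all
hypothetical. Note that in java.text, as in ICU, the iterator comes from
RuleBasedCollator, not from the base Collator.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

public class PrimaryWeightCount {
    // Hypothetical toy tailoring: upper and lower case differ only at the
    // tertiary level, so "S" and "s" share one primary weight.
    public static RuleBasedCollator demoCollator() {
        try {
            return new RuleBasedCollator("< a, A < c, C < h, H < s, S < t, T");
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts the collation elements of text that carry a non-zero primary weight.
    public static int countPrimaryWeights(RuleBasedCollator rbc, String text) {
        CollationElementIterator it = rbc.getCollationElementIterator(text);
        int count = 0;
        for (int ce = it.next(); ce != CollationElementIterator.NULLORDER; ce = it.next()) {
            if (CollationElementIterator.primaryOrder(ce) != 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        RuleBasedCollator rbc = demoCollator();
        for (String label : new String[] {"S", "St", "Sch"}) {
            // A count above 1 marks a label with multiple primary weights.
            System.out.println(label + ": " + countPrimaryWeights(rbc, label));
        }
    }
}
```

With this toy tailoring each letter maps to one collation element, so a
multi-letter label such as "Sch" is reported as having several primary
weights, which is the case that needs special handling in the index.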

The only other way out would be to detect if the collator is an RBC, and if
not, then disable the handling of "Æ" or "Sch" etc. That would be a
non-obvious variation in behavior, which seems bad.
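
To make the concern concrete, the rejected fallback would amount to a
runtime type check along these lines. This is again only a sketch using the
JDK's java.text classes as a stand-in for ICU4J; the class and method names
are hypothetical and not part of any proposed API.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

public class CollatorCapabilityCheck {
    // True only if collation elements (and so primary weights) can be inspected.
    public static boolean canInspectPrimaryWeights(Collator collator) {
        return collator instanceof RuleBasedCollator;
    }

    // A rule-based collator built from a hypothetical toy rule string.
    public static Collator ruleBased() {
        try {
            return new RuleBasedCollator("< a < b < c");
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    // A Collator subclass that is not rule-based and exposes no collation elements.
    public static Collator opaque() {
        return new Collator() {
            @Override public int compare(String a, String b) { return a.compareTo(b); }
            @Override public CollationKey getCollationKey(String s) { return null; } // unused in this sketch
            @Override public int hashCode() { return 0; }
        };
    }

    public static void main(String[] args) {
        // Same call site, two different behaviors: the caller gets no hint
        // that "Sch"-style labels are silently skipped in the second case.
        System.out.println(canInspectPrimaryWeights(ruleBased()));
        System.out.println(canInspectPrimaryWeights(opaque()));
    }
}
```

The caller passes a Collator in both cases and has no way to tell from the
signature which behavior it will get.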

I think we should go back to the original proposal and take a
RuleBasedCollator. This also simplifies the C++ code, which would otherwise
have to do something awkward in getCollator() when the collator is not an
RBC, because that function returns a const reference.

markus
