From: Markus S. <mar...@gm...> - 2013-01-31 21:41:30
Apologies, we need to retract the modifications.

On Wed, Jan 30, 2013 at 10:09 AM, Markus Scherer <mar...@gm...> wrote:

> The ICU team has approved this API *with modifications*:
>
> - Construct from a Collator, not a RuleBasedCollator.
> - Java getCollator() casts, may throw an exception.
> - C++ getCollator() returns some other object if the input is not an
>   RBC.

It turns out that we cannot use a Collator and also support an enhancement that I proposed to CLDR and implemented this week: LDML has so far not defined how to handle index characters with multiple primary weights, such as "Æ" or "Sch". "Sch" and "St" are very common in German phone and address books, and we have long wanted to support them.

Well, in order to find out if an index character like "Æ" or "Sch" has multiple primary weights I use a CollationElementIterator, which is not available from the base Collator, and I believe there is no other way to look at primary weights. Except possibly internal API, but that ties the AlphabeticIndex even more to a specific Collator implementation.

The only other way out would be to detect if the collator is an RBC, and if not, then disable the handling of "Æ" or "Sch" etc. That would be a non-obvious variation in behavior, which seems bad.

I think we should go back to the original proposal and take a RuleBasedCollator. This also simplifies the C++ code, which would have to do something interesting in getCollator() when the collator is not an RBC (it returns a const reference).

markus
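
For illustration only, the check described above (walking a string's collation elements and counting the non-zero primary weights) might look roughly like the sketch below. It is not the AlphabeticIndex code. To stay self-contained it uses the JDK's java.text classes as a stand-in; ICU4J's com.ibm.icu.text.RuleBasedCollator and CollationElementIterator offer the same-named calls. The class name, the helper names and the tiny "< a < b < c" rule string are all hypothetical.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical helper, not part of ICU or the JDK.
final class PrimaryWeightCount {

    // Counts the collation elements of s that carry a non-zero primary weight.
    // This needs a RuleBasedCollator: the base Collator class has no
    // getCollationElementIterator() method.
    static int countPrimaryWeights(RuleBasedCollator collator, String s) {
        CollationElementIterator it = collator.getCollationElementIterator(s);
        int count = 0;
        int ce;
        while ((ce = it.next()) != CollationElementIterator.NULLORDER) {
            if (CollationElementIterator.primaryOrder(ce) != 0) {
                count++;
            }
        }
        return count;
    }

    // Assumption: a toy tailoring used only to keep the demo deterministic.
    static RuleBasedCollator demoCollator() {
        try {
            return new RuleBasedCollator("< a < b < c");
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator collator = demoCollator();
        // A string whose count is greater than one has multiple primary weights.
        System.out.println(countPrimaryWeights(collator, "a"));
        System.out.println(countPrimaryWeights(collator, "ab"));
    }
}
```

With a real locale collator, an index label whose count comes out greater than one would be a candidate for the special handling discussed above; which labels those are depends on the collation data in use.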