From: Steven R. L. <sr...@ic...> - 2012-05-19 20:59:58
ICU collation tailorings come from CLDR, where a and ą are a secondary difference.
http://cldr.unicode.org

-s

On 05/19/2012 01:15 PM, Linas Valiukas wrote:
> Hello,
>
> I'm using ICU with Lithuanian (lt_LT) language. The alphabet for this
> language is the following: a ą b c č d e ę ė<...> v z ž
>
> However, when sorting, ICU's collator assumes that, for example, "a"
> and "ą" ("a" with ogonek) are equivalent, so a list of Lithuanian
> words get sorted as this:
>
> a, ą, ab, aba, abadas,<...>, b, ba,<...>
>
> When the expected result would be:
>
> a, ab, aba, abadas,<...>, ą,<...>, b, ba,<...>
>
> The same happens with other "accented" letters ("e" - "ę" - "ė", "z" -
> "ž", etc.)
>
> More specific test case: running "source/samples/coll/coll -locale
> lt_LT -source ą -target aa" decides that "source is less than target"
> when it's not the case.
>
> Is this behaviour expected? Is this a bug or a feature? ;-) If so, how
> can I prevent ICU's collator from aligning "similar" letters together?
>
> Thanks,
>
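
To illustrate what "secondary difference" means for the two orderings quoted above, here is a minimal sketch in plain Python. It is a toy model only, not ICU's or CLDR's actual algorithm: the helper names (split_marks, secondary_key, primary_key) are hypothetical, and the accent ordering is just Unicode code point order. With a secondary difference, accents are consulted only after all base letters compare equal; with a primary difference, the accented letter counts as a separate letter at its own position.

```python
import unicodedata

def split_marks(ch):
    # Split one character into (base letter, combining marks),
    # so that "ą" becomes ("a", <combining ogonek>).
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    marks = "".join(c for c in decomposed if unicodedata.combining(c))
    return base, marks

def secondary_key(word):
    # Toy model of "accent is a secondary difference":
    # compare all base letters first, use accents only to break ties.
    parts = [split_marks(ch) for ch in unicodedata.normalize("NFC", word)]
    return ([b for b, _ in parts], [m for _, m in parts])

def primary_key(word):
    # Toy model of "accented letter is a letter of its own":
    # the accent is compared together with its base letter, position by position.
    return [split_marks(ch) for ch in unicodedata.normalize("NFC", word)]

words = ["b", "abadas", "ą", "ba", "a", "aba", "ab"]

print(sorted(words, key=secondary_key))
# ['a', 'ą', 'ab', 'aba', 'abadas', 'b', 'ba']

print(sorted(words, key=primary_key))
# ['a', 'ab', 'aba', 'abadas', 'ą', 'b', 'ba']
```

Under the first key, "ą" sorts before "aa" because the base letters "a" and "aa" already decide the comparison, which matches what the coll sample reported. In ICU itself the analogous change would be a custom tailoring rather than a sort key like this; in ICU rule syntax a single "<" (as in "&a < ą") denotes a primary difference, but the full rule set for Lithuanian is not worked out here.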