From: Linas V. <shi...@gm...> - 2012-05-21 13:10:22
Hello,

I've found out that the existing collation rules are, in fact, correct and that my previous assumptions about how lists should be ordered were wrong. So things are fine as they are right now.

You can use excerpts from LST (short for Lithuanian STandard) 1285 [1] and a document titled "Lietuvių kalbos dalykų informacijos technologijoje norminimas" [2] in later discussions, if any. Those are in Lithuanian, but the essential parts can still be understood fine.

Thanks again,

[1]: http://stuff.pypt.lt/ltcoll/lst-1285.pdf
[2]: http://stuff.pypt.lt/ltcoll/lietuviu-kalbos-dalyku-informacijos-technologijoje-norminimas.pdf

On Sat, May 19, 2012 at 11:12 PM, Linas Valiukas <shi...@gm...> wrote:
> Hello,
>
> I'm using ICU with the Lithuanian (lt_LT) language. The alphabet for this language is the following: a ą b c č d e ę ė <...> v z ž
>
> However, when sorting, ICU's collator assumes that, for example, "a" and "ą" ("a" with ogonek) are equivalent, so a list of Lithuanian words gets sorted like this:
>
> a, ą, ab, aba, abadas, <...>, b, ba, <...>
>
> When the expected result would be:
>
> a, ab, aba, abadas, <...>, ą, <...>, b, ba, <...>
>
> The same happens with other "accented" letters ("e" - "ę" - "ė", "z" - "ž", etc.)
>

-- 
Linas Valiukas
tel: +370 687 65870
skype: shirshegsm
www: http://pypt.lt
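
For later readers, the two orderings discussed in this thread can be reproduced with a rough Python sketch. It does not call ICU and is not ICU's actual algorithm or the real Lithuanian tailoring; it only models the idea that "ą" differs from "a" at a weaker level than "a" differs from "b". The function names and the word list are made up for illustration, and ALPHABET_PREFIX holds only the letters spelled out in the quoted message, so the second key works only for words built from those letters.

```python
import unicodedata

# Only the letters listed explicitly in the quoted message; the rest of the
# alphabet was elided there and is deliberately not filled in here.
ALPHABET_PREFIX = "aąbcčdeęė"


def strip_accents(word):
    """Return the word with combining marks (ogonek, caron, dot) removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_as_tiebreak_key(word):
    """Simplified model of the ordering ICU was observed to produce:
    compare base letters first, use accents only to break ties."""
    return (strip_accents(word), unicodedata.normalize("NFC", word))


def separate_letter_key(word):
    """The ordering originally expected: every accented letter is a fully
    separate letter, ranked by its position in the alphabet."""
    composed = unicodedata.normalize("NFC", word)
    return [ALPHABET_PREFIX.index(ch) for ch in composed]


words = ["ba", "ą", "aba", "b", "a", "ab"]

print(sorted(words, key=accent_as_tiebreak_key))
# ['a', 'ą', 'ab', 'aba', 'b', 'ba']

print(sorted(words, key=separate_letter_key))
# ['a', 'ab', 'aba', 'ą', 'b', 'ba']
```

The first result matches the order reported in the quoted message, which per the follow-up above is the correct one; the second matches the order that was originally expected.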