Re: [icu-support] Collator assumes that "a" and "ą" are the same (lt_LT)

Hello,

I've found out that the existing collation rules are, in fact, correct
and that my previous assumptions about how lists should be ordered
were wrong.

So, things are fine as they are right now.
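
For anyone who finds this thread later, here is a rough sketch of why the
order quoted below ("a, ą, ab, ...") comes out the way it does. It is only a
simplified two-level model written in plain Python, not ICU's actual
implementation and not the real lt_LT tailoring; the helper names
two_level_key and sort_two_level are made up for illustration. The idea is
that the base letters are compared first, and the accent only breaks ties
between words whose base letters are identical.

```python
import unicodedata


def two_level_key(word):
    """Build a (primary, secondary) sort key for a word.

    Simplified illustration only: primary = base letters with combining
    marks removed, secondary = the marks attached to each base letter.
    """
    decomposed = unicodedata.normalize("NFD", word)
    base = []
    marks = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            # Attach the combining mark to the preceding base letter.
            if marks:
                marks[-1] += ch
        else:
            base.append(ch)
            marks.append("")
    return ("".join(base), tuple(marks))


def sort_two_level(words):
    # Tuples compare element by element, so the accents (secondary level)
    # only matter when the base letters (primary level) are all equal.
    return sorted(words, key=two_level_key)


if __name__ == "__main__":
    print(sort_two_level(["b", "ab", "ą", "aba", "a", "ba", "abadas"]))
```

With this model "ą" lands right after "a" and before "ab", because "ą" and
"a" share the same base letters and differ only at the secondary level.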

For any later discussions, the relevant references are LST (short for
Lithuanian STandard) 1285 [1] and a document titled "Lietuvių kalbos
dalykų informacijos technologijoje norminimas" [2]. Both are in
Lithuanian, but the essential parts should still be understandable.

Thanks again,

[1]: http://stuff.pypt.lt/ltcoll/lst-1285.pdf
[2]: http://stuff.pypt.lt/ltcoll/lietuviu-kalbos-dalyku-informacijos-technologijoje-norminimas.pdf

On Sat, May 19, 2012 at 11:12 PM, Linas Valiukas <shi...@gm...> wrote:
> Hello,
>
> I'm using ICU with Lithuanian (lt_LT) language. The alphabet for this language is the following: a ą b c č d e ę ė <...> v z ž
>
> However, when sorting, ICU's collator assumes that, for example, "a" and "ą" ("a" with ogonek) are equivalent, so a list of Lithuanian words gets sorted like this:
>
> a, ą, ab, aba, abadas, <...>, b, ba, <...>
>
> The expected result would be:
>
> a, ab, aba, abadas, <...>, ą, <...>, b, ba, <...>
>
> The same happens with other "accented" letters ("e" - "ę" - "ė", "z" - "ž", etc.)
>

-- 
Linas Valiukas
tel: +370 687 65870
skype: shirshegsm
www: http://pypt.lt
