Re: [icu-support] Script Reordering (ICU4C)


On Fri, 31 Aug 2012 10:14:36 -0700
Mark Davis ☕ <ma...@ma...> wrote:

> On Fri, Aug 31, 2012 at 1:18 AM,
> <Chr...@pa...> wrote:

> > Now I have two bonus questions :)
> > 1) Is it possible to do a mixed sort of Latin and Han characters,
> > so that the Han characters appear at the position corresponding to
> > their pinyin spelling?

> The pinyin sort does exactly this.

Not quite.  It compares Chinese characters according to their
transliteration, but when Latin is compared against Han, Latin still
sorts first.

You'd have to convert CLDR pinyin collation lines such as

<pc>欻歘</pc><!-- chuā -->
<pc>揣搋</pc><!-- chuāi -->

to something like

&chuā <<< 欻 <<< 歘
&chuāi <<< 揣 <<< 搋

(Using a primary difference would yield chuā < chuāi < 揣 < 搋 < 欻 <
歘 < d, and the concept would become nonsensical when the pinyin had
only a secondary difference.)
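
A rough sketch of that conversion, in case it helps.  It only does the
text transformation and makes no ICU calls.  The function names
(splitCodePoints, pcLineToTertiaryRule) are made up for illustration,
and it assumes well-formed UTF-8 input with exactly one <pc> element
and one trailing comment per line, as in the two lines quoted above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a UTF-8 string into one std::string per
// code point. Assumes the input is well-formed UTF-8.
static std::vector<std::string> splitCodePoints(const std::string& s) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        out.push_back(s.substr(i, len));
        i += len;
    }
    return out;
}

// Hypothetical converter: turn a line such as
//   <pc>欻歘</pc><!-- chuā -->
// into a tailoring rule such as
//   &chuā <<< 欻 <<< 歘
// Returns an empty string if the line does not have that shape.
std::string pcLineToTertiaryRule(const std::string& line) {
    const std::string openTag = "<pc>";
    const std::string closeTag = "</pc>";
    const std::string openComment = "<!--";
    const std::string closeComment = "-->";

    std::size_t a = line.find(openTag);
    if (a == std::string::npos) return "";
    std::size_t hanStart = a + openTag.size();
    std::size_t b = line.find(closeTag, hanStart);
    if (b == std::string::npos) return "";
    std::size_t c = line.find(openComment, b + closeTag.size());
    if (c == std::string::npos) return "";
    std::size_t pinyinStart = c + openComment.size();
    std::size_t d = line.find(closeComment, pinyinStart);
    if (d == std::string::npos) return "";

    std::string hanzi = line.substr(hanStart, b - hanStart);
    std::string pinyin = line.substr(pinyinStart, d - pinyinStart);

    // Trim the spaces around the pinyin inside the comment.
    std::size_t first = pinyin.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    std::size_t last = pinyin.find_last_not_of(" \t");
    pinyin = pinyin.substr(first, last - first + 1);

    // Reset to the pinyin string, then chain each Han character with a
    // tertiary difference.
    std::string rule = "&" + pinyin;
    for (const std::string& cp : splitCodePoints(hanzi)) {
        rule += " <<< " + cp;
    }
    return rule;
}
```

Feeding it the first line above should give back the first rule above.
The concatenated rules could then be handed to a RuleBasedCollator; I
have not tried that with a rule set of this size.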

I'm assuming ICU can handle thousands of collation elements only having
a tertiary difference from 'c'.

Using secondary differences with French accent ordering (backwards
secondary) turned on would probably also work.  Whether that is what
you want for longer text is another matter.

Richard.
