From: Richard W. <ric...@nt...> - 2012-08-31 19:01:54
On Fri, 31 Aug 2012 10:14:36 -0700
Mark Davis ☕ <ma...@ma...> wrote:

> On Fri, Aug 31, 2012 at 1:18 AM, <Chr...@pa...> wrote:

> > Now I have two bonus questions :)
> > 1) Is it possible to do a mixed sort of Latin and Han characters,
> > so that the Han characters appear at the position corresponding to
> > their pinyin spelling?

> The pinyin sort does exactly this.

Not quite. It compares Chinese characters according to their
transliteration, but for Latin v. Han it remains Latin first. You'd
have to convert CLDR pinyin collation lines such as

<pc>欻歘</pc><!-- chuā -->
<pc>揣搋</pc><!-- chuāi -->

to something like

&chuā <<< 欻 <<< 歘
&chuāi <<< 揣 <<< 搋

(Using a primary difference would yield

chuā < chuāi < 揣 < 搋 < 欻 < 歘 < d,

and the concept would become nonsensical when the pinyin had only a
secondary difference.)

I'm assuming ICU can handle thousands of collation elements that have
only a tertiary difference from 'c'. Using secondary differences with
French accents on would probably also work.

Whether that is what you want for longer text is another matter.

Richard.
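
P.S. A minimal sketch of that conversion in Python, assuming every CLDR line has exactly the form shown above (a <pc> element followed by a comment holding the pinyin). The function name to_tailoring_rules and its strength parameter are made up for illustration and are not part of CLDR or ICU. The output is only rule text; I have not fed it to ICU.

```python
import re

# Matches one CLDR pinyin collation line, e.g. <pc>欻歘</pc><!-- chuā -->
# (assumed format: characters inside <pc>, pinyin inside the trailing comment)
PC_LINE = re.compile(
    r"<pc>(?P<chars>[^<]+)</pc>\s*<!--\s*(?P<pinyin>.+?)\s*-->"
)


def to_tailoring_rules(cldr_lines, strength="<<<"):
    """Turn CLDR <pc> lines into rules that reset at the pinyin string.

    strength is the relation placed between the pinyin and each Han
    character; "<<<" (tertiary) follows the suggestion in the message.
    """
    rules = []
    for line in cldr_lines:
        match = PC_LINE.search(line)
        if match is None:
            continue  # skip lines that are not <pc> entries
        # Python iterates strings by code point, so each Han character
        # (including supplementary-plane ones) stays in one piece.
        chain = "".join(f" {strength} {ch}" for ch in match.group("chars"))
        rules.append(f"&{match.group('pinyin')}{chain}")
    return rules


if __name__ == "__main__":
    sample = [
        "<pc>欻歘</pc><!-- chuā -->",
        "<pc>揣搋</pc><!-- chuāi -->",
    ]
    for rule in to_tailoring_rules(sample):
        print(rule)
```

Passing strength="<<" instead would give the secondary-difference variant mentioned above.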