From: mehran m. <meh...@gm...> - 2006-09-25 05:56:00
Four levels of comparison is just one of the problems. By the way, could you please explain how I can assign more than three levels of comparison to a character with ICU's 32-bit collation elements? For instance, I need to assign a collation element like <x,y,0,1> to the character U+0627. Is it possible to extend the RuleBasedCollator syntax to support this kind of weight assignment?

Mehran

On 9/25/06, weiv <wei...@gm...> wrote:
>
> Mehran,
> it would probably be most beneficial if you could try to define the
> Persian collation. ICU does support collations that use more than three
> levels, such as Japanese JIS X 4061. Please provide more information on your
> requirements.
>
> Regards,
> v.
>
> On Sep 24, 2006, at 2:07 AM, mehran mehr wrote:
>
> Persian collation needs three features that are available in the glibc
> collation implementation:
>
> 1) At least four levels of comparison (five is even better)
> 2) A method to ignore base characters at the third level of comparison
> (this is supported in glibc using the 'IGNORE' keyword, which strictly
> assigns a zero weight)
> 3) A method to ignore base characters at the fourth level of comparison
> (this feature is supported in glibc using the 'position' keyword)
>
> but the ICU collation implementation doesn't support these features:
>
> 1) ICU has just three independent collation levels: a 32-bit collation
> element (2 bytes for the first level, 1 byte for the second level, and
> 1 byte for the third level).
> The case level, the tie-breaking level, and even the Hiragana case level
> depend on the character's code point and are not usable for other purposes.
> And as far as I know, shifted comparison for variables doesn't work in
> ICU.
> Is it possible to add more independent comparison levels to ICU in the
> near future?
>
> 2) As stated in UTS#10:
> ------------------------- excerpt from UTS#10 -------------------------
> 3.3 Well-Formed Collation Element Tables
>
> A well-formed Collation Element Table meets the following conditions:
> Except in special cases detailed in Section 6.2, Large Weight Values,
> no collation element can have a zero weight at Level N and a non-zero weight
> at Level N-1.
>
> -----------------------------------------------------------------------
> glibc collation tables may not be well-formed by this definition, but we
> actually need this feature in Persian collation.
> Is it possible for ICU to support this kind of zero weight?
>
> 3) It is possible to implement positioning by assigning a sufficiently
> small weight (for example, 1) to the base characters at the fourth level.
> But strict weight assignment is not supported in the RuleBasedCollator
> class.
> Is it possible to extend this class to support strict weight assignment?
>
> Mehran
>
> _______________________________________________
> icu-design mailing list
> icu...@li...
> To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-design
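To make the points in this thread concrete, here is a minimal Python sketch. It is not ICU code and does not use any ICU API. It only assumes the 16/8/8-bit layout described in the quoted message, the well-formedness condition as quoted from UTS#10 section 3.3, and a simplified level-by-level sort key in which zero weights are skipped and levels are separated by 0. The function names and the weights chosen for U+0627 are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def pack_ce32(primary: int, secondary: int, tertiary: int) -> int:
    """Pack three weights into 32 bits (16-bit primary, 8-bit secondary,
    8-bit tertiary), the layout described in the message above."""
    if not (0 <= primary <= 0xFFFF and 0 <= secondary <= 0xFF
            and 0 <= tertiary <= 0xFF):
        raise ValueError("weight out of range for the 16/8/8 layout")
    return (primary << 16) | (secondary << 8) | tertiary


def is_well_formed(ce: Sequence[int]) -> bool:
    """Check the quoted condition: no zero weight at level N together with
    a non-zero weight at level N-1."""
    return not any(ce[n] == 0 and ce[n - 1] != 0 for n in range(1, len(ce)))


def sort_key(ces: Iterable[Sequence[int]], levels: int = 4) -> Tuple[int, ...]:
    """Build a simplified multi-level key: for each level, append every
    non-zero weight at that level, then a 0 as level separator."""
    elements = list(ces)
    key = []
    for level in range(levels):
        key.extend(ce[level] for ce in elements if ce[level] != 0)
        key.append(0)  # level separator
    return tuple(key)


# Hypothetical weights for U+0627 in the requested form <x, y, 0, 1>.
ALEF = (0x20, 0x05, 0x00, 0x01)

print(hex(pack_ce32(0x20, 0x05, 0x02)))  # three weights fit in 32 bits
print(is_well_formed(ALEF))              # zero at level 3 after non-zero level 2
print(sort_key([ALEF]))                  # level 3 contributes nothing to the key
```

The sketch shows why <x,y,0,1> raises both questions at once: the fourth weight has no field in a 16/8/8 element, and the zero at the third level conflicts with the quoted well-formedness condition even though a level-by-level key handles it without difficulty.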