From: Mark D. <mar...@jt...> - 2005-05-27 14:17:00
With the first case, I believe what is happening is that one or more of the
characters are whitespace, so what the Collator sees is < < which is an
error. You probably have to double-slash them (that is, write \\u200d and
\\u200c).

For the second, we have a long-standing design (since 2002) for changing the
order of scripts (see
http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/icuhtml/design/collation/ICU_collation_design.htm
and search for "scriptOrder" and "Script Order"). But it has never been a
priority among our tasks, so you have to do it the hard way, by copying the
UCA rules for the script + whatever script tailorings there are.

This can be done programmatically, I think. E.g., to put Tamil in front of
Latin, do something like the following.

  Let X be the UCA rules.
  Let A be the position of the first character matching
    [[:script=tamil:][:letter:]]
  Let B be the position of the first character after A matching
    [:^script=tamil:]
  Let X1 be the substring of X from A up to but not including B.
  Let Y be the Tamil rules.
  Let Z be "& [before 1] a < " + X1 + Y
  Create a new collator from Z.

For multiple scripts, just add more pairs to Z.

This will probably need some tweaks beyond this, but you could start from
here.

Mark

----- Original Message -----
From: "Rajeev J Sebastian" <raj...@di...>
To: <icu...@li...>
Sent: Friday, May 27, 2005 10:09
Subject: [icu-support] Collation question

> Hi,
>
> I have a question regarding collator implementation for ml_IN locale in ICU
> 2.6
>
> I need to reassign the positions of u200d and u200c as shown below:
>
> CollationElements {
> Version { "1.0" }
> Sequence { "[normalization on]&[top]"
> "<\u200d<\u200c"
> ....
>
> However, this always results in the following error while building:
>
> ICU_DATA=../data/out/build
> LD_LIBRARY_PATH=../common:../i18n:../tools/toolutil:../layout:../layoutex:../extra/ustdio:../tools/ctestfw:../data/out:../data:../stubdata/:
> $LD_LIBRARY_PATH ../tools/genrb/genrb -k -q -p icudt26l -s ../data/locales
> -d ../data/out/build ml_IN.txt
> ../data/locales/ml_IN.txt:108: parse error. Stopped parsing with
> U_INVALID_FORMAT_ERROR
> couldn't parse the file ml_IN.txt. Error:U_INVALID_FORMAT_ERROR
> make[1]: *** [../data/out/build/icudt26l_ml_IN.res] Error 3
> make[1]: Leaving directory `/home/kane/font_maker/icu/icu/source/data'
> make: *** [all-recursive] Error 2
>
> Trying the rule in Locale Explorer works. Perhaps it is because 200c/200d are
> already assigned as equal to [last tertiary ignorable] ? Is there any way to
> reassign weights of these two codepoints in my locale ?
>
> Also, how do I move sections around ? for e.g., I would like to include in the
> ml_IN locale, the following order:
>
> malayalam, tamil, latin, devanagari ....
>
> So, how can I move say the Malayalam section to just before Tamil and so on ?
> Or do I have to copy the collation rules from the appropriate files into my
> ml_IN locale ?
>
> Rajeev J Sebastian
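
For illustration only, the X / A / B / X1 / Y / Z steps from the reply can be sketched in Python. This is a rough sketch under several assumptions, not ICU code: the function names (`extract_script_block`, `build_rules`) are hypothetical, the UCA rules are stood in for by a toy string, the script test is approximated by checking whether the Unicode character name starts with "TAMIL" (the Python standard library has no Script property), the pattern for A is read as "a Tamil letter", and whitespace and rule operators such as `<` are skipped when looking for B so that the separators between Tamil characters do not end the block early.

```python
import unicodedata

# Rule operators treated as neutral when scanning (an assumption of this sketch).
RULE_SYNTAX = "<&=,;"


def is_tamil(ch):
    # Rough stand-in for [:script=tamil:], based on the character name.
    return unicodedata.name(ch, "").startswith("TAMIL ")


def is_letter(ch):
    # Stand-in for [:letter:]: general category L*.
    return unicodedata.category(ch).startswith("L")


def is_neutral(ch):
    # Whitespace and rule operators belong to no script block.
    return ch.isspace() or ch in RULE_SYNTAX


def extract_script_block(x):
    """Return X1: the part of X from the first Tamil letter (A) up to,
    but not including, the first later non-Tamil character (B)."""
    a = next((i for i, ch in enumerate(x)
              if is_tamil(ch) and is_letter(ch)), None)
    if a is None:
        raise ValueError("no Tamil letter found in the rules")
    b = next((i for i in range(a + 1, len(x))
              if not is_tamil(x[i]) and not is_neutral(x[i])), len(x))
    # Drop the separator left dangling before position B.
    return x[a:b].rstrip(RULE_SYNTAX + " \t\r\n")


def build_rules(x, y, anchor="a"):
    """Return Z = "& [before 1] a < " + X1 + Y."""
    return "& [before 1] " + anchor + " < " + extract_script_block(x) + y


if __name__ == "__main__":
    # Toy stand-in for the UCA rules: Latin, Devanagari, Tamil, Malayalam.
    toy_uca = "a < b < \u0915 < \u0b85 < \u0b86 < \u0b95 < \u0d05"
    print(build_rules(toy_uca, ""))
```

The resulting string Z would then be handed to ICU to create a rule-based collator. As the reply notes, real UCA rules will likely need tweaks beyond this.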