Re: [Indic-computing-devel] Script specific features
From: Dr. U.B. P. <pav...@vi...> - 2002-03-06 08:32:32
From: Rajkumar S <s_...@my...>

> > Devanagari has "Akhand",
>
> The "Akhand" feature is confusing for me. I have been discussing it
> with various people here (I am a native Malayalam speaker) and with
> Apurva. So far I have never come across this feature in Malayalam
> grammar, but it seems that the two Akhand ligatures get priority over
> other ligatures when rendering. I tested this by asking the following
> question:
>
> Given a font that has a glyph for each of the conjuncts KaKa, Kssa,
> Dnya and Nnya, and given that the font contains a lookup with the
> following substitution rules:
>
>   Ka Halant Ka   -> KaKa
>   Ka Halant Ssa  -> Kssa
>   Ja Halant Nya  -> Dnya
>   Nya Halant Nya -> Nnya
>
> how will you render the theoretical sequences
> Ka Halant Ka Halant Ssa Halant Ma and
> Ja Halant Nya Halant Nya Halant Ma?
>
> All of them answered that they would give priority to Akhand. But
> when I later explained the concept of Akhand, they were surprised.
> Even now I don't know whether there is any linguistic basis for these
> clustering priorities.

In OpenType Layout (OTL) services, Akhand is applied right after Nukta, and all other features are applied later. Hence Akhand gets priority.

"Is there any linguistic basis for this?" I doubt that a positive answer to this question exists.

Devanagari has the Akhand feature for Kssa and JaNya. MS has implemented Akhand for these in their Tunga font for Kannada, even though Kannada actually has no Akhand feature.

> Does anyone have any idea about sorting Unicode Indic data, especially
> in the context of a database? Is there any Unicode-aware DB out there?

Sorting in Unicode follows the collation table described in Unicode Technical Report #10 (TR10). This is based on ISCII. ISCII has wrongly placed La and Lla together for Kannada (and all other languages); Hindi does not contain Lla. Hence the ISCII order is

  Ya, Ra, Rra (old), La, Lla, Va, Sha, Ssa, Sa, Ha

while the correct order for Kannada is

  Ya, Ra, Rra (old), La, Va, Sha, Ssa, Sa, Ha, Lla.
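Why feature order decides the first example can be shown with a simplified Python model. It is only a sketch under stated assumptions: glyphs are plain strings, each feature is one left-to-right ligature lookup, and the rule sets `AKHN` and `OTHER_CONJUNCTS` (like the helpers `apply_lookup` and `shape`) are hypothetical names built from the four rules in the quoted question. A real OpenType engine works on glyph IDs per syllable and applies further features between akhn and the other conjunct lookups.

```python
from typing import List, Sequence, Tuple

# A rule maps an input glyph sequence to one ligature glyph.
Rule = Tuple[Tuple[str, ...], str]

# Hypothetical rule sets taken from the four rules in the quoted question.
AKHN: List[Rule] = [
    (("Ka", "Halant", "Ssa"), "Kssa"),
    (("Ja", "Halant", "Nya"), "Dnya"),
]
OTHER_CONJUNCTS: List[Rule] = [
    (("Ka", "Halant", "Ka"), "KaKa"),
    (("Nya", "Halant", "Nya"), "Nnya"),
]


def apply_lookup(glyphs: Sequence[str], rules: Sequence[Rule]) -> List[str]:
    """One left-to-right pass of a ligature lookup over the glyph string."""
    out: List[str] = []
    i = 0
    while i < len(glyphs):
        for pattern, ligature in rules:
            n = len(pattern)
            if tuple(glyphs[i:i + n]) == pattern:
                out.append(ligature)  # replace the matched run with the ligature
                i += n
                break
        else:
            out.append(glyphs[i])  # no rule matched here; keep the glyph
            i += 1
    return out


def shape(glyphs: Sequence[str], features: Sequence[Sequence[Rule]]) -> List[str]:
    """Apply each feature's lookup in order; earlier features consume glyphs first."""
    result = list(glyphs)
    for rules in features:
        result = apply_lookup(result, rules)
    return result


seq = "Ka Halant Ka Halant Ssa Halant Ma".split()
print(shape(seq, [AKHN, OTHER_CONJUNCTS]))
# ['Ka', 'Halant', 'Kssa', 'Halant', 'Ma']
print(shape(seq, [AKHN + OTHER_CONJUNCTS]))
# ['KaKa', 'Halant', 'Ssa', 'Halant', 'Ma']

seq2 = "Ja Halant Nya Halant Nya Halant Ma".split()
print(shape(seq2, [AKHN, OTHER_CONJUNCTS]))
# ['Dnya', 'Halant', 'Nya', 'Halant', 'Ma']
```

With a single combined lookup the leftmost match wins, so the first sequence forms KaKa and Kssa is never reached; running akhn as an earlier feature reverses that. In the second sequence the leftmost match is already the Akhand ligature, so both orders agree.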
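The Kannada ordering above can be sketched in Python as a sort key. This is a simplified, code-point-based tailoring, not an implementation of the TR10 algorithm: it assumes plain code-point order stands in for the ISCII-based default (the Unicode Kannada block follows the ISCII layout), and the names `NAMES` and `kannada_key` are hypothetical.

```python
from typing import Tuple

# The consonants discussed above, keyed by their Unicode Kannada code points.
NAMES = {
    "\u0CAF": "Ya", "\u0CB0": "Ra", "\u0CB1": "Rra", "\u0CB2": "La",
    "\u0CB3": "Lla", "\u0CB5": "Va", "\u0CB6": "Sha", "\u0CB7": "Ssa",
    "\u0CB8": "Sa", "\u0CB9": "Ha",
}
LLA, HA = "\u0CB3", "\u0CB9"


def kannada_key(text: str) -> Tuple[int, ...]:
    """Per-character weights: code-point order, except Lla sorts right after Ha."""
    weights = []
    for ch in text:
        if ch == LLA:
            # Odd weight between Ha (2 * Ha) and the next code point (2 * (Ha + 1)).
            weights.append(2 * ord(HA) + 1)
        else:
            weights.append(2 * ord(ch))
    return tuple(weights)


letters = list(NAMES)[::-1]  # start from reversed order so sorting does real work
print([NAMES[c] for c in sorted(letters)])
# ['Ya', 'Ra', 'Rra', 'La', 'Lla', 'Va', 'Sha', 'Ssa', 'Sa', 'Ha']
print([NAMES[c] for c in sorted(letters, key=kannada_key)])
# ['Ya', 'Ra', 'Rra', 'La', 'Va', 'Sha', 'Ssa', 'Sa', 'Ha', 'Lla']
```

Doubling every weight leaves a free slot after each code point, so Lla can be moved without renumbering anything else. A database or ICU-based system would more likely express the same change as a collation tailoring rule of the form `&ಹ < ಳ`.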
The La/Lla ordering mistake has now been corrected in Unicode (March 01), but MS is yet to implement the correction in XP. Speakers of other languages should check the orderings for their own scripts.

The MS Access that ships with Office XP, on WinXP, has Unicode sorting for Indic. The same is true of MS SQL Server 2000.

-Pavanaja

> raj

-----------------------------------------------------
Dr. U.B. Pavanaja
Editor, Vishva Kannada
World's first Internet magazine in Kannada
http://www.vishvakannada.com/
Note: I don't worry about pselling mixtakes