Re: [Indic-computing-devel] Script specific features
From: Keyur S. <key...@ya...> - 2002-02-27 11:00:09
--- Rajkumar S <s_...@my...> wrote:
> On Mon, 25 Feb 2002, Keyur Shroff wrote:
>
> > For example, Malayalam has "Chillaksharam"
>
> My first impression with "Chillaksharam" was that it is a unique
> feature of Malayalam and it will require some modifications in
> Unicode stds to accommodate them.

Our Ministry has prepared a new proposal for the introduction of new
Indic characters in the next version of the Unicode Standard. For that,
the Ministry is gathering information on various scripts from the
Language Resource Centres in India responsible for those scripts.
Yesterday someone from Kerala (I don't remember his name) called me at
NCST and asked about this "Chillaksharam" problem. It is true that the
"Chillaksharam" forms can be produced using Zero-Width-Joiner (ZWJ) and
Zero-Width-NonJoiner (ZWNJ).

> > Devanagari has "Akhand",
>
> "Akhand" feature is confusing for me, I am having a dialogue with
> various people here (I am a native Malayalam speaker) and Apurva
> about this. So far I have never come across this feature in
> Malayalam grammar, but it seems that the two Akhand ligatures get
> priority over other ligatures when rendering. The way I tested this
> was by asking this question:
>
> Given a font that has a glyph each for the conjuncts: KaKa, Kssa.
> Given that the font contains a lookup with the following
> substitution rules:
>
> Ka Halant Ka -> KaKa
> Ka Halant Ssa -> Kssa
> Ja Halant Nya -> Dnya
> Nya Halant Nya -> Nnya
>
> Now given the theoretical sequences: Ka Halant Ka Halant Ssa Halant
> Ma and Ja Halant Nya Halant Nya Halant Ma, how will you render them?
>
> All of them answered that they will give priority to Akhand. But
> later when I explained the concept of Akhand they were surprised.
> But even now I don't know if there is any linguistic basis for
> clustering priorities.

I also discussed applying features with Apurva Joshi @ Microsoft. She
is the daughter of Prof. R.K. Joshi, who works here at NCST in the
area of font design.
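
To make the test question above concrete, here is a minimal Python sketch
of the two processing orders. It is only an illustration, not how any
real shaping engine is implemented. The glyph names are the ones used in
the question; treating Kssa and Dnya as the two Akhands, and the function
names, are assumptions made for this sketch.

```python
# Rules taken from the test question. Which two count as Akhands
# (Kssa and Dnya) is an assumption of this sketch.
AKHAND_RULES = {
    ("Ka", "Halant", "Ssa"): "Kssa",
    ("Ja", "Halant", "Nya"): "Dnya",
}
CONJUNCT_RULES = {
    ("Ka", "Halant", "Ka"): "KaKa",
    ("Nya", "Halant", "Nya"): "Nnya",
}


def apply_rules(glyphs, rules):
    """One left-to-right pass; each matching triple becomes its ligature."""
    out = []
    i = 0
    while i < len(glyphs):
        triple = tuple(glyphs[i:i + 3])
        if triple in rules:
            out.append(rules[triple])
            i += 3
        else:
            out.append(glyphs[i])
            i += 1
    return out


def shape_akhand_first(glyphs):
    """Apply the Akhand rules over the whole run, then the other conjuncts."""
    return apply_rules(apply_rules(glyphs, AKHAND_RULES), CONJUNCT_RULES)


def shape_single_pass(glyphs):
    """Apply all rules together in one pass, with no Akhand priority."""
    return apply_rules(glyphs, {**AKHAND_RULES, **CONJUNCT_RULES})


if __name__ == "__main__":
    for seq in (
        ["Ka", "Halant", "Ka", "Halant", "Ssa", "Halant", "Ma"],
        ["Ja", "Halant", "Nya", "Halant", "Nya", "Halant", "Ma"],
    ):
        print("akhand first:", " ".join(shape_akhand_first(seq)))
        print("single pass: ", " ".join(shape_single_pass(seq)))
```

For the first sequence the two orders differ: with Akhand priority the
result is Ka Halant Kssa Halant Ma, while the single pass gives
KaKa Halant Ssa Halant Ma. For the second sequence both orders give
Dnya Halant Nya Halant Ma, because the Akhand is also the leftmost match.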
The Raghu font was designed by Prof. R.K. Joshi. Here I am quoting
Apurva's message:

<quote>
All the akhands I have come across so far in Indic scripts are made up
of two consonants. They are thus essentially treated as conjuncts, not
consonants. And, more importantly, they have an additional status of
being processed first in any given input sequence.
</quote>

Thus, an "Akhand" is also a consonant conjunct, but it is "special" in
the sense that it is given priority over the others. Actually, like
many Latin-1 Supplement and Latin Extended-A characters in Unicode,
separate code points could have been assigned to all Akhands in Indic
scripts. But because of improper lobbying at the Unicode Consortium, we
could not get them to assign separate code points to Akhands. However,
I am sure that in the forthcoming proposal our Government has proposed
a separate code point in Unicode for every Akhand.

> Any one has any idea about sorting Unicode Indic data, esp in the
> context of any database? Any Unicode aware DB out there?

I'll try to gather some information on sorting order. Many databases,
including Oracle, now support the UTF-8 encoding of Unicode.

- Keyur
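
On the sorting question above, one thing can be shown without any
database: comparing UTF-8 data byte by byte gives the same order as
comparing Unicode code points. The Python sketch below checks this on a
few Devanagari strings. The sample words and function names are chosen
only for illustration, and whether code point order is the order a given
language expects is a separate question that this sketch does not answer.

```python
# Sample Devanagari strings, written as escapes so the code points are visible.
# The word list is illustrative only.
WORDS = [
    "\u0915\u094d\u0937",              # Ka + Halant + Ssa (the Kssa conjunct)
    "\u0916\u0917",                    # Kha + Ga
    "\u0915\u092e\u0932",              # Ka + Ma + La
    "\u091c\u094d\u091e\u093e\u0928",  # Ja + Halant + Nya + vowel sign Aa + Na
    "\u0906\u092e",                    # Aa + Ma
]


def sort_by_code_point(items):
    """Python compares str values code point by code point."""
    return sorted(items)


def sort_by_utf8_bytes(items):
    """Order strings by their UTF-8 encoded bytes, as a binary comparison would."""
    return sorted(items, key=lambda s: s.encode("utf-8"))


if __name__ == "__main__":
    by_code_point = sort_by_code_point(WORDS)
    by_bytes = sort_by_utf8_bytes(WORDS)
    print(by_code_point == by_bytes)
    for word in by_code_point:
        print(word, [hex(ord(ch)) for ch in word])
```

Note that the Kssa conjunct sorts among the strings that start with Ka,
since it is stored as Ka + Halant + Ssa and has no code point of its own.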