Re: [Indic-computing-devel] Script specific features
From: Rajkumar S <s_...@my...> - 2002-02-27 07:59:03
On Mon, 25 Feb 2002, Keyur Shroff wrote:
> For example, Malayalam has "Chillaksharam"

I have been experimenting with the IndiX patch for Malayalam support and have had some initial success with it. My first impression of "Chillaksharam" was that it is a feature unique to Malayalam and would require some modifications to the Unicode standard to accommodate. Later, however, Apurva Joshi <ap...@mi...>, who deals with Indic scripts at MS, clarified:

<quote>
I assume that "chillu form" above means the "chillaksharam form" that only a few consonants in Malayalam take. If so, this is how the form is currently implemented.

If the last consonant in a Malayalam word is capable of forming a chillaksharam, and it is followed by a Halant/Virama and then a word delimiter [in most Indic scripts this is the space], the consonant + Halant sequence is displayed as the chillaksharam. Thus:

  Kha Ka Halant is displayed as Kha Ka_chillaksharam.

In the above case, if you would like to convert the chillaksharam to its consonant + Halant form, you need to insert a ZWJ after the Halant, thus:

  Kha Ka Halant ZWJ

This will display as Kha Ka Halant.

For input sequences like the one below, where the consonant capable of forming a chillaksharam is not the last consonant in a syllable, the following is done:

  Kha Na Halant Kha; the final display will be Kha Na Halant Kha.

If the Na Halant in the above case, which does not occur at the end of the word, needs to be retained as the chillaksharam form, you need to insert a ZWNJ, thus:

  Kha Na Halant ZWNJ Kha; this will display as Kha Na_chillaksharam Kha.
</quote>

> Devanagari has "Akhand",

The "Akhand" feature is confusing for me. I am discussing it with various people here (I am a native Malayalam speaker) and with Apurva. So far I have never come across this feature in Malayalam grammar, but it seems that the two Akhand ligatures get priority over other ligatures when rendering.
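Going back to the chillaksharam rules in Apurva's explanation, they can be sketched as a small token-level function. This is only a minimal illustration of the rules as quoted, not Microsoft's implementation: it works on symbolic names (Kha, Ka, Halant, ZWJ, ZWNJ) rather than real code points, and the set of chillu-capable consonants (`CHILLU_CAPABLE`) and the delimiter set (`WORD_DELIMITERS`) are assumptions made for the example.

```python
# Assumption: consonants that can take a chillaksharam form (illustrative set).
CHILLU_CAPABLE = {"Nna", "Na", "Ra", "La", "Lla", "Ka"}
# Assumption: the space is the only word delimiter.
WORD_DELIMITERS = {" "}


def chillu_display(tokens):
    """Map a list of symbolic input tokens to display units, per the quoted rules."""
    out = []
    i = 0
    n = len(tokens)
    while i < n:
        tok = tokens[i]
        if tok in CHILLU_CAPABLE and i + 1 < n and tokens[i + 1] == "Halant":
            nxt = tokens[i + 2] if i + 2 < n else None
            if nxt == "ZWJ":
                # ZWJ after the Halant suppresses the chillaksharam: show C + Halant.
                out += [tok, "Halant"]
                i += 3
            elif nxt == "ZWNJ":
                # ZWNJ forces the chillaksharam even before another consonant.
                out.append(tok + "_chillaksharam")
                i += 3
            elif nxt is None or nxt in WORD_DELIMITERS:
                # Word-final C + Halant is shown as the chillaksharam.
                out.append(tok + "_chillaksharam")
                i += 2  # the delimiter, if any, is copied on the next iteration
            else:
                # Mid-word C + Halant + consonant: ordinary consonant + Halant.
                out += [tok, "Halant"]
                i += 2
        else:
            out.append(tok)
            i += 1
    return out
```

For example, `chillu_display(["Kha", "Na", "Halant", "ZWNJ", "Kha"])` gives `["Kha", "Na_chillaksharam", "Kha"]`, matching the last case in the quote. A real shaping engine would apply these rules to code points inside its font lookups. The token form is used here only to keep the rules readable.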
I tested this by asking the following question:

  Given a font that has a glyph for each of the conjuncts KaKa and Kssa, and a lookup with the following substitution rules:

    Ka Halant Ka   -> KaKa
    Ka Halant Ssa  -> Kssa
    Ja Halant Nya  -> Dnya
    Nya Halant Nya -> Nnya

  how would you render the theoretical sequences

    Ka Halant Ka Halant Ssa Halant Ma
    Ja Halant Nya Halant Nya Halant Ma

All of them answered that they would give priority to the Akhand. But later, when I explained the concept of Akhand, they were surprised. Even now I don't know whether there is any linguistic basis for clustering priorities.

> Tamil has "two-side split matra", etc. Detailed discussion of these
> features is required.

For Malayalam I take this to mean U+0D4A, U+0D4B and U+0D4C. From what I understand, these have to be split into their component marks, i.e. 0D15 0D4A is first split into 0D15 0D46 0D3E and then reordered to 0D46 0D15 0D3E for rendering. The Tamil section of the Unicode standard gives more information about this.

Does anyone have any idea about sorting Unicode Indic data, especially in the context of a database? Is there any Unicode-aware DB out there?

raj