Re: [Indic-computing-devel] Script specific features


--- Rajkumar S <s_...@my...> wrote:
> On Mon, 25 Feb 2002, Keyur Shroff wrote:
> 
> > For example, Malayalam has "Chillaksharam"

> My first impression of "Chillaksharam" was that it is a unique
> feature of Malayalam and that it will require some modifications
> to the Unicode standard to accommodate it.

Our Ministry has prepared a new proposal for the introduction of
new Indic characters in the next version of the Unicode standard.
To that end, the Ministry is gathering information on the various
scripts from the Language Resource Centres in India responsible
for them. Yesterday someone from Kerala (I don't remember his
name) called me at NCST and asked about this "Chillaksharam"
problem. It is true that the "Chillaksharam" form can be produced
using the Zero-Width Joiner (ZWJ) and Zero-Width Non-Joiner
(ZWNJ).
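
To make the ZWJ/ZWNJ point concrete, here is a minimal Python sketch of
the two code point sequences. It assumes the convention "consonant +
virama + ZWJ" for requesting a chillu form and "consonant + virama +
ZWNJ" for requesting a visible virama; the letter NA is only an
example, the names chillu_n, explicit_virama and describe are
illustrative, and whether the chillu glyph actually appears depends on
the font and the shaping engine.

```python
import unicodedata

NA = "\u0D28"      # MALAYALAM LETTER NA (example consonant)
VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

# Assumed convention: ZWJ after the virama asks for the chillu form,
# ZWNJ after the virama asks for the consonant with a visible virama.
chillu_n = NA + VIRAMA + ZWJ
explicit_virama = NA + VIRAMA + ZWNJ


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for label, seq in (("chillu request", chillu_n),
                       ("explicit virama", explicit_virama)):
        print(label)
        for line in describe(seq):
            print("   ", line)
```

Both sequences are three code points long and differ only in the last
one, which is why no new character is strictly needed to encode the
distinction.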

> > Devanagari has "Akhand",
> 
> The "Akhand" feature is confusing to me. I am having a dialogue
> about it with various people here (I am a native Malayalam
> speaker) and with Apurva. So far I have never come across this
> feature in Malayalam grammar, but it seems that the two Akhand
> ligatures get priority over other ligatures when rendering. I
> tested this by asking the following question:
> 
> Given a font that has a glyph for each of the conjuncts KaKa,
> Kssa, Dnya and Nnya, and given that the font contains a lookup
> with the following substitution rules:
> 
> Ka Halant Ka -> KaKa
> Ka Halant Ssa -> Kssa
> Ja Halant Nya -> Dnya
> Nya Halant Nya -> Nnya
> 
> Now, given the theoretical sequences Ka Halant Ka Halant Ssa
> Halant Ma and Ja Halant Nya Halant Nya Halant Ma, how would you
> render them?
> 
> All of them answered that they would give priority to the
> Akhand. But later, when I explained the concept of Akhand, they
> were surprised. Even now I don't know if there is any linguistic
> basis for clustering priorities.

I also discussed the application of features with Apurva Joshi at
Microsoft. She is the daughter of Prof. R.K. Joshi, who works here
at NCST in the area of font design. The Raghu font was designed by
Prof. R.K. Joshi.

Here I quote Apurva's message:

<quote>

All the akhands I have come across so far in Indic scripts
are made up of two consonants. They are thus essentially
treated as conjuncts, not consonants. And, more
importantly, they have an additional status of being
processed first in any given input sequence. 

</quote>

Thus, an "Akhand" is also a consonant conjunct, but it is
"special" in the sense that it is given priority over the others.
In fact, just as many precomposed characters in the Latin-1
Supplement and Latin Extended-A blocks have their own code points
in Unicode, separate code points could have been assigned to all
Akhands in the Indic scripts. But for lack of effective lobbying
at the Unicode Consortium, we could not get separate code points
assigned to the Akhands. However, I am sure that in the
forthcoming proposal our Government has proposed a separate code
point for every Akhand.
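
The "Akhand first" priority can be illustrated with a small Python
sketch that applies the four substitution rules from Rajkumar's
question to symbolic glyph names. This is not how any particular
rendering engine is implemented. It assumes that Kssa and Dnya are the
two Akhand ligatures, and the function names apply_rules,
shape_akhand_first and shape_single_pass are hypothetical.

```python
# Rules taken from the question quoted earlier in this message.
# Assumption: Kssa and Dnya are the two Akhand ligatures.
AKHAND_RULES = {
    ("Ka", "Halant", "Ssa"): "Kssa",
    ("Ja", "Halant", "Nya"): "Dnya",
}
OTHER_RULES = {
    ("Ka", "Halant", "Ka"): "KaKa",
    ("Nya", "Halant", "Nya"): "Nnya",
}


def apply_rules(tokens, rules):
    """Scan left to right, replacing each matching triple by its ligature."""
    out = []
    i = 0
    while i < len(tokens):
        triple = tuple(tokens[i:i + 3])
        if triple in rules:
            out.append(rules[triple])
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out


def shape_akhand_first(tokens):
    """Apply the Akhand rules over the whole sequence, then the other rules."""
    return apply_rules(apply_rules(tokens, AKHAND_RULES), OTHER_RULES)


def shape_single_pass(tokens):
    """Apply all rules together in one left-to-right pass (no priority)."""
    return apply_rules(tokens, {**AKHAND_RULES, **OTHER_RULES})


if __name__ == "__main__":
    seq = "Ka Halant Ka Halant Ssa Halant Ma".split()
    print(shape_akhand_first(seq))  # ['Ka', 'Halant', 'Kssa', 'Halant', 'Ma']
    print(shape_single_pass(seq))   # ['KaKa', 'Halant', 'Ssa', 'Halant', 'Ma']
```

For the first test sequence the two strategies disagree: with Akhand
priority the Kssa ligature is formed, and without it the earlier
KaKa ligature wins. For Ja Halant Nya Halant Nya Halant Ma both give
Dnya Halant Nya Halant Ma, because the Akhand is also the leftmost
match.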

> Any one has any idea about sorting Unicode Indic data,
> esp in the context
> of any database? Any Unicode aware DB out there?

I'll try to gather some information on sorting order. Many
databases, including Oracle, now support the UTF-8 encoding of
Unicode.

- Keyur
