Re: [Indic-computing-standards] Re: Malayalam Half-U: how
From: Dr. U.B. P. <pav...@vi...> - 2002-11-12 12:39:32
From these discussions I can infer one thing: we need a mechanism for
choosing one of the many possible display forms for a particular
combination. We have a similar requirement in Kannada for "arkavattu"
(reph) and "half ra". Both display forms are possible and both are
correct. I had mentioned this to the people responsible for the
OpenType specifications for Indic scripts. Currently they have no
plans to make this change.

Another point I would like to mention here: the sorting rules in
Unicode have nothing to do with the character code charts. They are
different. Unicode has two tables: the character chart and the
collation table. Details of collation are available at
www.unicode.org/tr10

Rgds,
-Pavanaja

>
> In Malayalam (ISO 639-2 language code: mal) there are 37
> 'vyanchanangal' (consonants). All these consonants are usually
> pronounced with the support of the 'swaram' (vowel) sound A [U0D05].
> The pure form of a consonant is written with a 'chandrakkala' (virama
> [U0D4D]) above the consonant. When the pure form of a consonant is
> pronounced, there should be a clear sound of the vowel U [U0D09]. Some
> consonants have another form, which is called 'chillu'. A 'chillu' is
> a consonant which does not require any vowel support to pronounce. It
> is written with a vowel sign U [U0D41] and a 'chandrakkala' (virama
> [U0D4D]) above that. In fact Malayalam has separate 'lipi' (script)
> for the 7 chillu forms of consonants that are widely used in
> Malayalam. Since we have separate scripts for most of the chillus, in
> the writing system we have almost stopped writing the chillu forms of
> other consonants (which rarely occur) as explained above. Even so, you
> can still see some texts written in this style.
> Antoine said this is the half form of u, that is, the
> 'samvrutokaram' of U [U0D09]. (In fact 'samvrutokaram' has a sound of
> A and U, so the 'virama', the 'vowel sign U' and the 'combination of
> these two' are used in different places and texts; some linguists say
> that 'samvrutokaram' has a vowel value.) Now many people write
> consonants with virama for the chillu forms of other consonants. One
> example is the one Antoine gave: U0D15 + U0D41 + U0D4D (ka, u,
> virama). So internally a chillu can be represented with a Unicode
> character sequence like this: <consonant> + <vowel sign U [U0D41]> +
> <virama [U0D4D]>. Then you can render the 7 chillu forms with the
> correct script. I will explain how to do this below. To make input
> very easy you can use the Inscript keyboard layout standardised by the
> Kerala govt. (They just added chillus to the original Inscript
> keyboard layout at appropriate positions, considering the frequency of
> occurrence of these chillu forms. I will explain the drawback of this
> keyboard layout below.)
>
> The proposal for inclusion of the scripts of the chillu forms of
> consonants as basic characters should not be accepted by the Unicode
> Consortium. (This is going to be submitted (or already submitted?) by
> the Ministry of Information Technology (Govt. of India), a member of
> the Unicode Consortium.) The proposal includes some other things; in
> my opinion those changes should be accepted.
>
> Now I will explain how to represent chillu forms of consonants as
> Unicode sequences. An important thing to notice is that two (or more)
> consonants may have the same script for their chillu forms, and the
> pronunciation is also the same. Even so, each should be represented
> with its correct Unicode sequence. The scripts for the chillu forms of
> RA [U0D30] and RRA [U0D31] are the same. Similarly, the scripts for
> the chillu forms of LLA [U0D33] and LLLA [U0D34] are the same. The
> other consonants, which have chillu forms with unique scripts, are NNA
> [U0D23], NA [U0D28] and LA [U0D32].
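
The scheme described in the quoted paragraphs above (seven consonants, five shared shapes, each chillu written as <consonant> + U+0D41 + U+0D4D) can be sketched in a few lines of Python. This is only an illustration of the proposal as stated in the message, not a description of how any renderer or Unicode version actually behaves. The function names and the shape labels such as "chillu-R" are hypothetical; only the code points come from the message.

```python
# Code points as cited in the message.
VOWEL_SIGN_U = "\u0d41"
VIRAMA = "\u0d4d"

# Seven consonants map to five shapes: RA and RRA share one shape,
# LLA and LLLA share another. Shape labels are made up for this sketch.
CHILLU_SHAPE = {
    "\u0d23": "chillu-NNA",  # NNA
    "\u0d28": "chillu-NA",   # NA
    "\u0d30": "chillu-R",    # RA
    "\u0d31": "chillu-R",    # RRA (same shape as RA)
    "\u0d32": "chillu-LA",   # LA
    "\u0d33": "chillu-LL",   # LLA
    "\u0d34": "chillu-LL",   # LLLA (same shape as LLA)
}


def encode_chillu(consonant: str) -> str:
    """Build the proposed sequence: consonant + vowel sign U + virama."""
    return consonant + VOWEL_SIGN_U + VIRAMA


def find_chillus(text: str) -> list:
    """Return (index, consonant, shape) for each proposed chillu sequence."""
    hits = []
    for i in range(len(text) - 2):
        c = text[i]
        if c in CHILLU_SHAPE and text[i + 1] == VOWEL_SIGN_U and text[i + 2] == VIRAMA:
            hits.append((i, c, CHILLU_SHAPE[c]))
    return hits
```

Under this sketch the RA and RRA sequences stay distinct in the stored text even though they select the same shape, which is the property the message argues for.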
>
> Why the 5 scripts of the 7 chillu forms of consonants should not be
> included in Unicode
> ----------------------------------------------------------------------
>
> * The basic reason is that those 5 'lipi' (scripts) are not part of
>   the Malayalam 'Aksharamala' (character set); they are chillus only.
>   (Note that a chillu is not a 'koottaksharam' (consonant conjunct).)
>
> Supporting reasons:
>
>   + As I explained above, two (or more) consonants use the same
>     script for their chillu forms. So if these 'simple shapes' become
>     part of Unicode, hard encoding of chillus will be impossible. If
>     someone inputs the correct Unicode sequence, the renderer should
>     still render those characters, and this will create more
>     problems.
>
>   + Sorting rules cannot be implemented effectively.
>
> Inscript keyboard layout problems
> ---------------------------------
>
> I think the drawback of the new Inscript keyboard layout standardised
> by the Kerala govt. will be clear from the above discussion. Even so,
> the layout can be accepted on practical grounds. Since we only use
> those scripts, we can assign any character sequence to the keys
> allocated to them. Here the choice is between the RA [U0D30] and RRA
> [U0D31] chillus, and between LLA [U0D33] and LLLA [U0D34]. Considering
> the pronunciation and the frequency of occurrence of these chillus,
> you can choose RRA [U0D31] and LLA [U0D33]. In fact this can only be
> decided by considering the words. For example, RA [U0D30] + vowel sign
> U [U0D41] + virama [U0D4D] is correct in the words: neer - neere
> (water), avar - avare (they), aar - aare (who) etc.
>
> and RRA [U0D31] + vowel sign U [U0D41] + virama [U0D4D] is correct in
> the words: car - caRe (car), kiNar - kiNaRe (well), sir - saRe (sir)
> etc.
>
> So if someone inputs the other correct sequences (without using those
> keys), they should render properly.
>
> P.S.: please reply to un...@un...
>
> Regards,
> Baiju M
> --
> http://baijum81.tripod.com
>
>
> --- In unicode@y..., Antoine LECA <Antoine10646@l...> wrote:
> > Hi folks,
> >
> > A problem was reported on the Microsoft VOLT mailing list (this list
> > should be dedicated to typography, but it appears that it deals
> > more with Indic scripts, because VOLT is the MS tool used to
> > encode OpenType information in a font, which in turn is required to
> > display Indic scripts on Windows.)
> >
> > The problem concerns the Malayalam half-u. A user reported as an
> > error the fact that Uniscribe displays a dotted circle in the middle
> > of a Malayalam half-u. He wrote
> >     U+0D15 U+0D41 U+0D4D (ka, u, virama)
> > and Uniscribe displayed (in reformed style) the ku syllable, then a
> > dotted circle, then a virama sign hanging alone.
> >
> > Of course, the problem is that Uniscribe expects virama to come only
> > after consonants, so it displayed it as an error. But I believe the
> > misunderstanding hides a real problem: how should the half-u be
> > displayed? Hence I am coming here to see what the gurus think about
> > this.
> >
> > To help you, I have done some research. Here is what I have found.
> >
> > First, the phonetic reality: the root of the issue is when a word
> > ends with halanta (virama); while in other languages this "kills"
> > the a-sound, in Malayalam it rather replaces it with the half-u
> > sound, particularly when the consonant is a conjunct. This is for
> > example described in the ISO 15919 standard, available with detailed
> > explanations at Dr Anthony P. Stone's site,
> > <URL:http://homepage.ntlworld.com/stone-catend/trind.htm>
> >
> > According to Varamozhi (a site well informed about Malayalam),
> > <URL:http://varamozhi.sourceforge.net/varamozhi-doc/varamozhi-6.html>
> > when it comes to representation, there exist differing writing
> > "styles" for this single phonetic reality; in North Kerala, usage is
> > to write the halanta sign in place, and Done!
> > Obviously, this is very much in line with the other scripts.
> >
> > However, in South Kerala, as Mr. Cibu said, usage is to write the
> > halanta sign as well as to show the matra for the u vowel. While it
> > is said that this latter usage occurs with the reformed style, I
> > have seen examples with the traditional style as well (although this
> > is from a book printed in Madras, so it might be wrong.) Obviously,
> > the user of Uniscribe intended to display this combination, which to
> > him is the normal way to display a word when it ends with halanta!
> >
> > Knowing that, we can now notice that Unicode has a note under
> > Malayalam virama (U+0D4D), saying it is the same as Malayalam
> > half-u. To me, this means that under Unicode, the half-u is supposed
> > to *not* be specifically encoded, and only the usage from North
> > Kerala is supposed to be followed.
> >
> > Other relevant information: ISCII-91 seems silent on the subject,
> > and the CDAC products (like iLeap) seem unable to render the half-u
> > in Malayalam (unless one "cheats" using the INV pseudo-consonant.)
> >
> > It is too late to discuss the pros and cons of the choice of
> > Unicode, back in 1992 (probably, they chose to ease as far as
> > possible the unification of encoding, in order to ease sorting and
> > similar tasks.) Now, the problem is, if someone wants to
> > specifically encode the showing of the u matra, in a context (as in
> > Uniscribe) where both usages from North and South Kerala could be
> > intended, how should it be done? It seems rather natural to use then
> > the combination
> >     U+0D41 U+0D4D,
> > following the precedent established in Unicode 3.1 (IIRC) for the
> > modern Bengali A and E initial vowels (from English borrowed words),
> > which are written as Bengali A or E, followed by virama then ya
> > (hence an exception to the rule that virama may only follow a
> > consonant.)
> >
> > Are the gurus here OK with this "solution"?
> >
> > Can it be "sanctified", for example with the inclusion of the
> > adequate words in some revision of Unicode?
> >
> >
> > If this is agreed, then when dealing with aspects other than
> > rendering, people should take this into account, and effectively
> > ignore the U+0D41 when followed by U+0D4D, when the task is about
> > searching, sorting, etc. While this is a nuisance, it does not
> > appear completely prohibitive to me. But I admit I have not thought
> > a lot about the consequences of allowing such "presentation
> > encoding."
> >
> >
> > Regards,
> > Antoine
> >
>
> _______________________________________________
> Indic-computing-standards mailing list
> http://indic-computing.sourceforge.net/
> Ind...@li...
> https://lists.sourceforge.net/lists/listinfo/indic-computing-standards
> [Other Indic-Computing mailing lists: -users, -devel, -announce]

-----------------------------------------------------
Dr. U.B. Pavanaja
Editor, Vishva Kannada
World's first Internet magazine in Kannada
http://www.vishvakannada.com/

Note: I don't worry about pselling mixtakes
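
Antoine's suggestion in the quoted message, to ignore U+0D41 when it is immediately followed by U+0D4D for searching and sorting, might look roughly like the Python sketch below. The function names are hypothetical. The sort key here compares folded strings by code point only; as noted at the top of this message, real collation is defined separately (see www.unicode.org/tr10), so this folding would be a preprocessing step at most.

```python
VOWEL_SIGN_U = "\u0d41"  # Malayalam vowel sign U
VIRAMA = "\u0d4d"        # Malayalam virama (chandrakkala)


def fold_half_u(text: str) -> str:
    """Drop U+0D41 wherever it is immediately followed by U+0D4D."""
    return text.replace(VOWEL_SIGN_U + VIRAMA, VIRAMA)


def same_for_search(a: str, b: str) -> bool:
    """Treat the two spellings of the half-u as equal when searching."""
    return fold_half_u(a) == fold_half_u(b)


south = "\u0d15\u0d41\u0d4d"  # ka, vowel sign u, virama
north = "\u0d15\u0d4d"        # ka, virama
plain_ku = "\u0d15\u0d41"     # ka, vowel sign u (no virama, left untouched)

# Sorting on the folded form keeps both spellings of the half-u together.
words = sorted([south, north, plain_ku], key=fold_half_u)
```

A vowel sign U that is not followed by virama is left alone, so ordinary "ku" syllables are unaffected by the folding.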