[Indic-computing-standards] Re: [Indic-computing-users] Why Unicode Won't Work on the Internet - Par
From: Keyur S. <key...@ya...> - 2003-01-21 06:37:39
--- "ma...@ch..." <ma...@ch...> wrote:
>
> Analysis
> The problems perceived with the Tamil Unicode segment are as follows.
> ---------------------------------------------------------------------
>
> Character Sequence
> The primary problem is that the sequence of characters does not match the
> traditional Tamil ordering method. This is the reason many Tamil scholars
> have requested a reorganization of the Tamil segment. However since some
> softwares including the popular Windows 2000 have already implemented
> Unicode in its current form, it will not be possible now to change the
> Tamil segment entirely.

This is one of the common misunderstandings about Unicode. In fact, the
order of characters in the Unicode code charts does NOT (and CANNOT) define
a sort order, because sort order is specific to a language and not to a
script. For example, the sort order of characters differs between Marathi
and Hindi even though both languages are written in the same script,
Devanagari. One must understand the difference between collation order and
encoding order.

It is not possible to change the order of characters in the Unicode
character chart. It is also not desirable: standards are not meant to
change very often. The Unicode standard reordered characters only in its
early days, when it was decided to align it with the ISO 10646 standard,
and at that time Unicode was still in a nascent stage. Changing the order
of any characters now would create problems for a great many applications,
not only on the MS platform.

For a detailed explanation, you are requested to read the following paper:
http://www.microsoft.com/middleeast/msdn/Indic_collation-DC.pdf

>
> The actual ordering of characters in unicode based databases happen by
> specifying a collation sequence. Unicode has detailed documents on writing
> these collation sequences. A document needs to be prepared in accordance
> with Unicode standards for the correct Tamil ordering sequence (tamiz
> neTunkaNakku).

If it is not ready, then the Tamil community should come forward and help
other people define the correct sorting order for the characters.

> -------------------------------------------------------------------------
>
> Missing Characters
> Some characters in Tamil script are missing in Unicode. These are not
> native to Tamil language, but are often used in Tamil documents.
> Significant among them are, SRI, KSHA and Tamil numeral ZERO.

If these characters are really needed in the Tamil script, then they can be
added. Inclusion of any character in the Unicode chart must go through the
standard procedure, which includes a proposal, justification, evidence, and
support from linguists and/or the state government. But if the character is
not a prime representative of the script, and a character with similar
semantics has already been defined elsewhere in Unicode, then it is
recommended to reuse that character instead of duplicating it.

However, there are some exceptions. The popularity of a particular encoding
is also sometimes considered. Unicode could have been designed in a better
way, with a unified approach to encoding, but in order to maintain
compatibility with all popular encodings and other existing standards, it
was decided to reuse those standards in Unicode.

>
> While zero is not used in the original Tamil number system, it is often
> used in present day documents while writing numbers in Tamil numerals
> using international number system. It is noted [1] that Mauritius
> currency uses Tamil numberals with zero in their currencies. This
> character should be added at U+0BE6.
>
> KSHA and SRI are grantha characters not native to Tamil language. However
> they are being used in Tamil script for over six centuries [2].
> Devanagari and many other indian languages consider these characters as
> ligatures (combination of several characters). These are considered
> separate characters in Tamil, not ligatures.
> Thus including these characters in
> Tamil Unicode range makes good sense and will eliminate complex ligature
> handling for Tamil. (Possible locations: KSHA = U+0BBA; SRI = U+0BF3).

With experience we have learnt that the complexity of ligatures can be
handled easily in either the application or the font. Giving a unique code
point to every ligature is not considered a good idea when alternatives are
available. If you encode each ligature separately, you in fact lose the
beautiful canonical structure that is a unique property of Indic scripts.

>
> Another character often requested in Tamil Unicode range is OM.
> Devanagari and Gujarati ranges in Unicode have a separate character for
> OM at U+0950 and U+0AD0. If there is a consensus on adding this
> character, it can be added at U+0BD0.
>
> There have been some suggestions for including Rupee sign, and Tamil
> fraction symbols in Unicode. These characters are not in popular use.
> These may need to be considered only when they come into use.

If there is consensus for inclusion of these characters in the Tamil
script, then a proposal can be submitted to Unicode.

> -----------------------------------------------------------------------
>
> Incorrect Characters
> Tamil Unicode range has a character for TAMIL SIGN ANUSVARA at U+0B82.
> This character does not exist in native Tamil script. Its existence in
> grantha script needs to be examined. This character is sometimes mistaken
> for Tamil oRRu which is at location U+0BCD (TAMIL SIGN VIRAMA). It would
> be better if this character is deprecated from Unicode standard.

Characters cannot be removed from the Unicode standard once they are
defined; their use can only be deprecated.

>
> The character TAMIL SIGN VISARGA (Aytam) at U+0B83 is incorrectly
> classified as a modifier in Unicode while it is considered as a character
> in Tamil script. This error needs to be corrected.
> The glyph that
> represents this character should be changed to remove the character place
> holder (dotted circle).

AFAIK this change has been made in Unicode version 3.1.

- Keyur
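
P.S. To make the difference between encoding order and collation order concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the rank table covers just the 18 traditional Tamil consonants, the names TRADITIONAL_CONSONANTS, RANK and collation_key are made up for this example, and a real Tamil collation would also have to handle vowels, vowel signs, the pulli and the grantha letters.

```python
# Traditional Tamil consonant order, spelled out as code points so the
# mismatch with encoding order is visible. Illustrative assumption only,
# not a complete Tamil collation.
TRADITIONAL_CONSONANTS = (
    "\u0B95\u0B99\u0B9A\u0B9E\u0B9F\u0BA3"  # KA NGA CA NYA TTA NNA
    "\u0BA4\u0BA8\u0BAA\u0BAE\u0BAF\u0BB0"  # TA NA PA MA YA RA
    "\u0BB2\u0BB5\u0BB4\u0BB3\u0BB1\u0BA9"  # LA VA LLLA LLA RRA NNNA
)
RANK = {ch: i for i, ch in enumerate(TRADITIONAL_CONSONANTS)}


def collation_key(text):
    """Hypothetical sort key: listed consonants sort by traditional rank,
    anything else sorts after them in code point order."""
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in text]


letters = ["\u0BA9", "\u0BAA", "\u0BB1", "\u0BB2"]  # NNNA, PA, RRA, LA

by_code_point = sorted(letters)                    # encoding order
by_collation = sorted(letters, key=collation_key)  # collation order

print([f"U+{ord(c):04X}" for c in by_code_point])
# ['U+0BA9', 'U+0BAA', 'U+0BB1', 'U+0BB2']
print([f"U+{ord(c):04X}" for c in by_collation])
# ['U+0BAA', 'U+0BB2', 'U+0BB1', 'U+0BA9']
```

The stored code points are identical in both cases; only the sort key differs. That is why the traditional order can be obtained without moving any character in the code chart.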